GA-CCRi Analytical Development Services

Detecting Anomalous Vessels at Maritime Ports

Every day, hundreds of thousands of vessels cross the oceans, traveling from country to country and port to port. Most of these trips are mundane, routine, and fairly uninteresting. These vessels are carrying cargo and have declared their port calls beforehand.

Vessels that do not declare a port visit are less common, and it can be valuable to know about them as soon as possible. An unusual arrival matters to port authorities and port operators when the vessel’s type, flag country, speed, size, or another attribute is abnormal for that port. Real-time alerting on all vessel arrivals, with particular attention to vessels that are anomalous from the receiving port’s perspective, can quickly cue port authorities to inspect a vessel further.

Defining an abnormal vessel visit can be tricky. When it comes to machine learning algorithms, finding training data for such events is even more challenging. GA-CCRi has created a port-based anomaly detection model and demonstrated how it can be used to score incoming vessels for potential anomalies.

One example of such potentially anomalous vessels was a group of Iranian oil tankers that sailed to Venezuela. This was an unusual journey that sparked global interest. The image below shows historical Automatic Identification System (AIS) vessel position data collected and provided by exactEarth.

Historical AIS data for Iranian tankers crossing the Atlantic en route to Venezuela

Vessel and Port Descriptions

A challenge in tracking port arrivals, departures, and current port activity is deciding which geographical boundaries to use as a bounding area. GA-CCRi recently launched a new data product that includes data-driven, detected port boundaries based on real-time and historical vessel positions. The image below illustrates distinct port boundaries created through this process. It also shows live AIS positions for slow-moving or stopped (moored) vessels and how they fall within these port boundaries. The port boundaries are the basis for the anomaly models described below.

Example port boundaries with live AIS

Detecting Abnormal Vessels

Training Data

Historical AIS within Venezuela’s Port El Palito shows three of the five tankers visiting this port. These three tankers, with their Maritime Mobile Service Identity (MMSI) numbers, are the Fortune (422303000), Petunia (422232600), and Clavel (422232300). Shown below is Port El Palito with its corresponding port boundary and the tracks of AIS-reported positions, colored by vessel, of the tankers arriving at the port.

Port boundary for Port El Palito along with AIS showing the arrival of the tankers

Using this port geometry, we collected historical AIS data over the time period of 1 May 2019 – 1 May 2020 for vessels that were either moving slowly or stopped. This resulted in a data set containing 54 unique vessels from 12 different countries covering 6 vessel types. To construct the training set, we used a subset of numerical and categorical fields that describe each vessel’s identity and physical characteristics: vessel type (categorical), flag country (categorical), length (numerical), and width (numerical).
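As a rough illustration of this collection step, the sketch below filters AIS messages to slow or stopped vessels inside a port polygon and keeps one feature row per vessel. The field names (`mmsi`, `sog`, `lon`, `lat`, and so on), the 1-knot speed threshold, the square toy boundary, and the placeholder MMSIs are all assumptions made for illustration; they are not taken from the actual pipeline or data product.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)
FEATURES = ("flag_country", "vessel_type", "length_m", "width_m")


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray running right from pt."""
    x, y = pt
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_training_rows(messages: List[Dict], boundary: List[Point],
                        max_speed_knots: float = 1.0) -> Dict[int, Dict]:
    """Keep one feature row per MMSI for slow or stopped vessels inside the boundary."""
    rows: Dict[int, Dict] = {}
    for msg in messages:
        if msg["sog"] > max_speed_knots:  # hypothetical speed-over-ground cutoff
            continue
        if not point_in_polygon((msg["lon"], msg["lat"]), boundary):
            continue
        rows.setdefault(msg["mmsi"], {name: msg[name] for name in FEATURES})
    return rows


def toy_message(mmsi, sog, lon, lat, flag, vtype, length, width):
    """Build a made-up AIS message; all values are placeholders."""
    return {"mmsi": mmsi, "sog": sog, "lon": lon, "lat": lat,
            "flag_country": flag, "vessel_type": vtype,
            "length_m": length, "width_m": width}


toy_boundary = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
toy_messages = [
    toy_message(100000001, 0.2, 0.5, 0.5, "A", "Tanker", 183, 32),
    toy_message(100000001, 0.0, 0.6, 0.5, "A", "Tanker", 183, 32),   # same vessel again
    toy_message(100000002, 12.0, 0.5, 0.5, "B", "Tanker", 228, 42),  # moving too fast
    toy_message(100000003, 0.1, 2.0, 0.5, "C", "Tug", 30, 6),        # outside boundary
    toy_message(100000004, 0.5, 0.2, 0.8, "A", "Tug", 28, 10),
]
training_rows = build_training_rows(toy_messages, toy_boundary)
print(training_rows)
```

Keying the result by MMSI means each vessel contributes a single row regardless of how many position reports it sends while in port.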

Model Training

Once we had created the training data, we trained an isolation forest model using flag country, vessel type, length, and width as input features. At first, we used the isolation forest model from the scikit-learn library, but we ran into issues caused by the inclusion of categorical features. We one-hot encoded the flag countries and vessel types so that the model could consume them, but the algorithm did not use the encoded features well, and the scores did not match what we expected. Further research into this issue led us to the H2O library from H2O.ai, which handles categorical features with no need to encode them beforehand. This indeed appeared to be the case, as the results aligned much better with what we expected.
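To make the encoding issue concrete, here is a minimal sketch of what one-hot encoding does to the feature space. With the 12 flag countries and 6 vessel types in this data set, the 2 numerical columns end up in a table of 20 columns, so a tree that picks a split column uniformly at random would look at length or width only about 10% of the time. The cause of the problem is not stated here; this dilution is offered only as one plausible contributor. The helper `one_hot_encode` and the toy rows are hypothetical.

```python
def one_hot_encode(rows, categorical_fields):
    """Replace each categorical field with one 0/1 column per observed category."""
    categories = {
        field: sorted({row[field] for row in rows}) for field in categorical_fields
    }
    encoded = []
    for row in rows:
        # Numerical fields are copied through unchanged
        out = {k: v for k, v in row.items() if k not in categorical_fields}
        for field, values in categories.items():
            for value in values:
                out[f"{field}={value}"] = 1 if row[field] == value else 0
        encoded.append(out)
    return encoded


# Toy rows with placeholder flag countries; not real vessel records
toy_rows = [
    {"flag_country": "A", "vessel_type": "Tanker", "length_m": 183, "width_m": 32},
    {"flag_country": "B", "vessel_type": "Tanker", "length_m": 228, "width_m": 42},
    {"flag_country": "C", "vessel_type": "Tug", "length_m": 28, "width_m": 10},
]
encoded_rows = one_hot_encode(toy_rows, ["flag_country", "vessel_type"])
n_columns = len(encoded_rows[0])  # 2 numerical + 3 countries + 2 types

# Same arithmetic with the category counts reported for this data set
numeric_share = 2 / (2 + 12 + 6)
print(n_columns, numeric_share)
```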

Model Results

The isolation forest is a tree-based model that explicitly isolates anomalies rather than first building a profile of “normal” instances. The algorithm takes advantage of two properties of anomalies: they are few in number, and their attribute values are very different from those of normal, more abundant instances.

Because it isolates anomalies directly, the isolation forest does not require anomaly-free training data, which is an advantage when training data is hard to come by. During model training, each tree is built by repeatedly selecting a random feature and a random split value, and partitioning continues until the instances are separated from one another. For each tree, the path length of an observation is the number of edges it passes from the root before reaching its terminal node. Random partitioning produces shorter paths for anomalous instances, so when the average path length over many trees is short, it is highly likely that the instance is anomalous. The anomaly score is inversely related to the average path length, so a larger score indicates a more anomalous observation.
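The procedure described above can be sketched in a few dozen lines. This is a simplified, numerical-only illustration of the general isolation forest idea, not the implementation of any particular library: the depth limit, the leaf-size adjustment `c(n)` (taken from the original isolation forest paper), the absence of subsampling, and the toy length/width values are all assumptions chosen for the example.

```python
import math
import random


def c(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively partition rows with a random feature and a random split value."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop instead of retrying another feature
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    if not left or not right:
        return {"size": len(rows)}
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, row, depth=0):
    """Edges traversed to reach a leaf, plus c(size) for rows left unseparated."""
    if "size" in tree:
        return depth + c(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def mean_path_lengths(rows, n_trees=200, seed=0):
    """Average path length of every row over a forest of random trees."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(rows)))
    trees = [build_tree(rows, 0, max_depth, rng) for _ in range(n_trees)]
    return [sum(path_length(t, r) for t in trees) / n_trees for r in rows]


# Toy (length, width) pairs in meters: a cluster of large vessels and one small one
toy_vessels = [(183, 32), (185, 32), (228, 32), (228, 42), (240, 36),
               (246, 42), (251, 46), (175, 31), (24, 6)]
lengths = mean_path_lengths(toy_vessels)
for vessel, length in zip(toy_vessels, lengths):
    print(vessel, round(length, 2))
```

In this toy set the small vessel is the extreme value in both dimensions, so a random split is likely to cut it off early and its average path length comes out shortest.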

The following table shows the unique vessels that are believed to have visited Port El Palito, along with the average path length and anomaly score assigned by the trained model. The Iranian tankers we are interested in rank among the higher anomaly scores, with the two Iranian vessels of identical size receiving the same score. Other vessels with higher scores include a Venezuelan diving vessel, which scores high due to its type. The most anomalous vessel was a Cuban tanker, likely because of its flag country and because it is physically smaller than a typical tanker.

| MMSI | Flag Country | Vessel Type | Length (Meters) | Width (Meters) | Isolation Forest Anomaly Score | Isolation Forest Path Length |
|---|---|---|---|---|---|---|
| 323147000 | Cuba | Tanker | 144 | 23 | 0.75 | 3.19 |
| 422303000 | Iran | Tanker | 175 | 31 | 0.69 | 3.38 |
| 636016905 | Liberia | Tanker | 240 | 36 | 0.63 | 3.58 |
| 775996002 | Venezuela | Diving | 24 | 6 | 0.62 | 3.64 |
| 422232300 | Iran | Tanker | 183 | 32 | 0.60 | 3.70 |
| 422232600 | Iran | Tanker | 183 | 32 | 0.60 | 3.70 |
| 620586000 | Comoros | Tanker | 185 | 32 | 0.58 | 3.75 |
| 775994450 | Venezuela | Tug | 28 | 10 | 0.58 | 3.75 |
| 355823000 | Panama | Tug | 30 | 6 | 0.48 | 4.08 |
| 371182000 | Panama | Tug | 29 | 10 | 0.45 | 4.18 |
| 775054000 | Venezuela | Tanker | 282 | 32 | 0.42 | 4.29 |
| 775090000 | Venezuela | Tanker | 228 | 42 | 0.26 | 4.80 |
| 775092000 | Venezuela | Tanker | 183 | 32 | 0.25 | 4.86 |
| 775048000 | Venezuela | Tanker | 183 | 32 | 0.25 | 4.86 |
| 352332000 | Panama | Tanker | 251 | 46 | 0.23 | 4.92 |
| 371382000 | Panama | Tanker | 246 | 42 | 0.12 | 5.26 |
| 370955000 | Panama | Tanker | 228 | 32 | 0.05 | 5.50 |
| 373527000 | Panama | Tanker | 183 | 32 | 0.00 | 5.67 |

Scored vessels for Port El Palito over the time period for which the Iranian tankers visited
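The two isolation forest columns in the table appear to be linearly related. The check below fits a line through the first and last rows and measures how closely it reproduces every other reported score, which would be consistent with a min-max style normalization of the path length. That interpretation is inferred from the table values only and is not stated in the text; the variable and function names are illustrative.

```python
# (anomaly score, path length) pairs copied from the table above
reported = [
    (0.75, 3.19), (0.69, 3.38), (0.63, 3.58), (0.62, 3.64), (0.60, 3.70),
    (0.60, 3.70), (0.58, 3.75), (0.58, 3.75), (0.48, 4.08), (0.45, 4.18),
    (0.42, 4.29), (0.26, 4.80), (0.25, 4.86), (0.25, 4.86), (0.23, 4.92),
    (0.12, 5.26), (0.05, 5.50), (0.00, 5.67),
]

# Fit a straight line through the highest-scoring and lowest-scoring rows
(top_score, top_length), (bottom_score, bottom_length) = reported[0], reported[-1]
slope = (top_score - bottom_score) / (bottom_length - top_length)


def score_from_length(length):
    """Linear map from path length to score; shorter paths give larger scores."""
    return bottom_score + slope * (bottom_length - length)


# Largest gap between the fitted line and the reported (rounded) scores
max_error = max(abs(score_from_length(length) - score) for score, length in reported)
print(round(max_error, 4))
```

A maximum error below the two-decimal rounding of the table suggests the score is a rescaled path length rather than an independent quantity.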

Port Anomaly Detection at Scale

This technique could be extended to any number of ports that have a port boundary available, where each model can be used to score real-time AIS observations as they are collected and transmitted. Real-time anomaly scores could be made available alongside the live AIS with a small latency, providing near real-time anomaly detection at a global scale. Early warnings could also be generated by scoring vessels that are inbound to ports using their forecasted routes, which could allow port operators, ship owners, and customs control to anticipate unusual vessel arrivals before they occur. Next steps for this work would be to model global ports and then use our streaming detection platform to score all incoming AIS messages against each model, creating real-time anomaly scores.
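A minimal sketch of how per-port scoring could be wired together is shown below. Everything here is hypothetical: the `PortModel` class, the bounding-box lookup (a stand-in for real port boundary polygons), the message fields, and the placeholder scoring function are assumptions for illustration, not a description of the streaming detection platform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class PortModel:
    name: str
    bbox: Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
    score: Callable[[Dict], float]           # maps a vessel message to an anomaly score

    def contains(self, lon: float, lat: float) -> bool:
        min_lon, min_lat, max_lon, max_lat = self.bbox
        return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def score_message(message: Dict, ports: List[PortModel]) -> Optional[Tuple[str, float]]:
    """Route a message to the first port whose area contains it and score it there."""
    for port in ports:
        if port.contains(message["lon"], message["lat"]):
            return port.name, port.score(message)
    return None  # vessel is not inside any modeled port


def toy_scorer(usual_types):
    """Placeholder for a trained per-port model: flags vessel types not seen before."""
    return lambda msg: 0.0 if msg["vessel_type"] in usual_types else 1.0


ports = [
    PortModel("port_a", (0.0, 0.0, 1.0, 1.0), toy_scorer({"Tanker", "Tug"})),
    PortModel("port_b", (5.0, 5.0, 6.0, 6.0), toy_scorer({"Cargo"})),
]
result_in_port = score_message({"lon": 0.5, "lat": 0.5, "vessel_type": "Diving"}, ports)
result_at_sea = score_message({"lon": 3.0, "lat": 3.0, "vessel_type": "Tanker"}, ports)
print(result_in_port, result_at_sea)
```

A linear scan over ports is fine for a handful of models; at global scale a spatial index over the port boundaries would likely be needed.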
