Next Article in Journal
First Data on the Age and Growth of Schmidt’s cod Lepidion schmidti (Moridae) from Waters of the Emperor Seamounts (Northwestern Pacific)
Next Article in Special Issue
An Experimental Study on Trajectory Tracking Control of Torpedo-like AUVs Using Coupled Error Dynamics
Previous Article in Journal
Strength Assessment of Cement-Based Materials under Marine Conditions Subjected to Sulfate and Chloride Attack Based on Ion Distributions
Previous Article in Special Issue
Optimized APF-ACO Algorithm for Ship Collision Avoidance and Path Planning
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Long-Term Trajectory Prediction for Oil Tankers via Grid-Based Clustering

1
School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
2
State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China
3
School of Electronic and Information Engineering, Anhui University, Hefei 230039, China
*
Authors to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2023, 11(6), 1211; https://doi.org/10.3390/jmse11061211
Submission received: 19 May 2023 / Revised: 7 June 2023 / Accepted: 8 June 2023 / Published: 11 June 2023
(This article belongs to the Special Issue Motion Control and Path Planning of Marine Vehicles)

Abstract

:
Vessel trajectory prediction is an important step in route planning, which could help improve the efficiency of maritime transportation. In this article, a high-accuracy long-term trajectory prediction algorithm is proposed for oil tankers. The proposed algorithm extracts a set of waymark points that are representative of the key traveling patterns in an area of interest by applying DBSCAN clustering to historical AIS data. A novel path-finding algorithm is then developed to sequentially identify a subset of waymark points, from which the predicted trajectory to a fixed destination is produced. The proposed algorithm is tested using real data offered by the Danish Maritime Authority. Numerical results demonstrate that the proposed algorithm outperforms state-of-the-art vessel trajectory prediction algorithms and is able to make high-accuracy long-term trajectory predictions.

1. Introduction

About 90% of global trade is carried by maritime transportation [1]. With the continuing growth of international trade, modern maritime transportation calls for more intelligent methods for transportation management to achieve larger capacity, faster traveling speed and higher safety levels. To achieve these goals, accurate predictions of vessels’ future movement is important and can be used in many maritime applications, such as port management, anomaly detection and collision avoidance [2].
Despite the importance of vessel trajectory prediction, it remains a challenging task due to the diverse navigation environments and the stochastic nature of vessel movements. While most of the existing works on vessel trajectory prediction have focused on making short-term predictions [3,4,5,6,7], being particularly useful for collision warning and avoidance, this work investigates the problem of long-term vessel trajectory prediction. Long-term trajectory predictions can be used to guide the captains to operate the ship in a more fuel-efficient way and hence reduce the carbon dioxide emissions, with assistance also from additional information, including weather and current forecasts on the route. Long-term vessel trajectory predictions can also be used by the agencies of port management to obtain more accurate estimates of the remaining navigation distance and subsequently obtain more accurate predictions for vessel arrivals.
The prevalence of maritime transportation data makes it possible to take a data-driven approach to develop high-accuracy vessel trajectory algorithms [2,8]. For instance, the Automatic Identification System (AIS), a global autonomous tracking system that has been made compulsory for ships exceeding 300 tons, provides abundant and near real-time information about ships. Apart from static information such as MMSI, ship name and ship type, the AIS messages also contain the location information of the ship (longitude (Lon) and latitude (Lat)) and information about its traveling (e.g., speed over ground (SOG), course over ground (COG) and heading) [9]. This information, when gathered in large scale, can be used to characterize the voyage patterns in a corresponding area and can further be exploited to predict vessel trajectory.
This paper presents an AIS data based long-term vessel trajectory prediction algorithm, aiming to predict the trajectory of oil tankers from any location to a pre-defined destination, e.g., a port. The proposed algorithm takes a data-driven approach. First, the traveling patterns of tankers in an area of interest are extracted from the historical AIS data via key point clustering. The output of clustering is a set of waymark points, with each corresponding to a cluster. Each waymark point is characterized by the average Lon, Lat, COG and the number of key points in the cluster that it represents. In real-time trajectory prediction, the algorithm uses the Lon and Lat to filter the waymark points and uses the information of COG and the number of points to calculate a weighted distance between the filtered waymark points and the current reference point. A segment of the predicted trajectory is generated by connecting the reference point to the waymark point that has the smallest weighted distance. The complete long-term trajectory prediction is made by repeating this process until the reference point reaches the destination. For the proposed algorithm, the key point clustering only needs to be perform once. Once the set of waymark points is obtained, all subsequent oil tankers arriving at the same destination can use the obtained set of waymark points to achieve trajectory prediction. We verify the effectiveness of the proposed algorithm using real AIS data provided by the Danish Maritime Authority (DMA) [10].
The rest of this paper is organized as follows. Section 2 provides a brief review of related work in the area of trajectory prediction. Section 3 describes the proposed trajectory prediction algorithm. In Section 4, the proposed algorithm is applied to real AIS data and is compared to state-of-the-art algorithms. Finally, Section 5 draws conclusions and discusses possible future research directions. For ease of exposition, Table 1 lists the key notation used in this paper.

2. Related Work

With the advances in high-precision positioning technologies such as the Global Positioning Systems and real-time radars, it is possible to acquire accurate information about positions for vehicles, aircraft, ships and pedestrians. The availability of high-accuracy positioning data, together with other information such as speed and acceleration, makes it possible to predict the trajectories of the targets of interest automatically, which could find applications in many areas, including terrestrial navigation [11,12], autonomous driving [13,14], and maritime traffic management [2,8,15,16]. This paper focuses on the trajectory prediction problem for vessels, for which the existing works can be classified into two categories, i.e., short-term trajectory prediction [3,4,5,6,7] and long-term trajectory prediction [17,18].
For short-term vessel trajectory prediction, the time horizon over which the predictions are made is usually from a few seconds to a few tens of minutes. Such vessel prediction algorithms are often developed for collision avoidance to ensure navigation safety. Hence, apart from the prediction accuracy, timeliness is also important [19]. In this category, classical algorithms follow from mathematical models of mobility and statistical techniques for accurate trajectory predictions. Examples of such methods include the Kalman filter-based algorithm [20], the Gaussian process-based algorithm [3], and the Markov process-based algorithm [21]. However, due to the complicated nature of vessel trajectories, these methods may not be able to capture the characteristics of the trajectories to predict and thus fail to provide accurate predictions.
The recent advances in machine learning techniques, along with the assistance of abundant AIS data, have stimulated a burst of research on machine-learning-based vessel trajectory algorithms [5,6,7]. For instance, Zhang et al. [5] used a hybrid method based on LSTM and KNN for short-term trajectory prediction. The algorithm switches between LSTM and KNN based on the trajectory densities of different areas. That is, in dense areas, KNN is used for trajectory prediction, while in sparse areas, LSTM is adopted. Using publicly available AIS data collected near Xiamen Port, Fujian Province from 2018 to 2019 [5], the results show that this method has better prediction accuracy compared with classical prediction algorithms developed based on Kalman filtering. You et al. [6] proposed a sequence-to-sequence model based on GRU to predict vessel trajectory at a time horizon of 5 min. The model works by first encoding the trajectory as a context vector to maintain the temporal relationship of the trajectory position, and then using GRU as a decoder to output the future trajectory. Numerical results based on the data collected near Chongqing and Wuhan of Yangtze River Channel demonstrate good short-term trajectory predictions. Murray et al. [7] proposed a data-driven trajectory prediction method which first applies Principal Component Analysis (PCA) to each trajectory to generate feature vectors and then uses the extracted feature vectors as inputs for the Gaussian Mixer model to predict multiple possible trajectories at the same time. The time horizon of prediction is about 30 min. All of the above work uses deep learning or data-driven methods and achieves better results.
Compared to the problem of short-term trajectory prediction, the progress on the long-term trajectory prediction problem is much less reported, partly due to the more challenging nature of the long-term prediction problem. Existing short-term prediction algorithms may be used to produce long-term predictions through recursions; however, the accuracies are expected to decrease as the number of prediction steps increases [6]. One relevant work that tackles the problem of long-term trajectory can be found in [22], where DBSCAN was used to cluster historical trajectories, and the trajectory predictions are produced by pre-trained deep learning models. This work was tested using the publicly available AIS data from MarineCadastre in Zones 15 and 16. While [22] also adopted DBSCAN, the overall methodology is completely different from our proposed approach. For instance, in [22], DBSCAN was used to cluster trajectories based on trajectory statistics such as the trimmed mean of the longitude (latitude). In contrast, our work adopts DBSCAN to cluster the key points from many historical trajectories, where key points sharing similar characteristics are grouped into the same cluster. Two other relevant works on long-term vessel trajectory predictions can be found in [17,18]. Both these two works attempt to predict the remaining path of ships to a destination in order to better estimate the remaining traveling distance and to achieve a more accurate estimate of the arrival times. Ogura et al. [17] evaluated the difference between the predicted weather and the weather of historical trajectories, and then adopts the historical trajectory with the smallest difference as the prediction. The AIS data of four cargos traveling to and from Japan was used to test the weather-based algorithm [17]. Alessandrini et al. [18] adopted a graph-based approach by calculating the raster of density and directionality of all the historical data and applying a path-finding algorithm to identify a path from the current location to the destination. This method was tested using AIS data recorded in October 2015 near the port of Trieste, an Italian city on the Northeastern Adriatic Sea, with the assistance of historical LRIT data from 2009 to 2014 [18].

3. Methodology

In this section, we describe the proposed long-term vessel trajectory prediction algorithm. Figure 1 presents a flow chart of the proposed algorithm. As shown in the figure, the algorithm has the following two modules: the key point clustering module and the trajectory finding module. In the remaining part of this section, we present the details of the two modules.

3.1. Key Point Clustering

The key point clustering module aims to extract from pre-processed training trajectories a set of waymark points that are representative of the traveling patterns of the oil tankers in an area of interest. Instead of keeping all the raw AIS data points in the training trajectories, each trajectory is pre-processed to remove redundant data and to reduce the number of points retained in the trajectory. (A detail description of data pre-processing will be presented in Section 4.1 along with the experiments.) The points retained in the training trajectories are called the key points. The key point clustering module treats all key points (from all different training trajectories) as the input and produces a set of waymark points. As a result of clustering, each waymark point not only reflects the Lon and Lat of tankers passing through the vicinity of the corresponding cluster, but also contains information such as the COG to better capture the motion of tankers in the corresponding area.
Each predicted trajectory consists of a number of waymark points. Hence, to achieve accurate predictions, there should be a sufficient number of waymark points that can comprehensively capture all possible routes in the area of interest. Additionally, waymark points that are close in space should have distinguishing features related to the motion of the vessels to make clear selections among the waymark points. To fulfill these purposes, we adopt a COG-based clustering process to generate the waymark points [23].
To facilitate the description of the clustering process, we use λ k , ϕ k to represent the Lon and Lat of the key point p ˜ k , respectively, and  θ k the COG of p ˜ k . Hence, each key point is represented by p ˜ k = λ k , ϕ k , θ k . Let D = p ˜ 1 , p ˜ 2 , p ˜ 3 , , p ˜ K be the set of key points in the training trajectories. The core idea of the clustering algorithm is first to discretize the entire area of interest into a set of grids and then run DBSCAN for each grid to cluster the key points falling into the corresponding area. Specifically, DBSCAN is used to cluster the key points within the grid area based only on the COG to extract representative traveling directions. Each cluster identified by DBSCAN forms a waymark point q m , which contains four-dimensional information, including the average Lon λ ¯ m and average Lat ϕ ¯ m of all the key points in the corresponding cluster, the median COG of the key points, i.e.,  θ ¯ m , and the number of key points ρ ¯ m in the cluster, i.e.,  q m = λ ¯ m , ϕ ¯ m , θ ¯ m , ρ ¯ m .
Each of the grid areas can be represented as:
G i , j = { λ , ϕ | λ λ ¯ i δ λ , λ ¯ i + δ λ , ϕ ϕ ¯ j δ ϕ , ϕ ¯ j + δ ϕ } ,
where λ ¯ i and ϕ ¯ j are the Lon and Lat of the center of the grid, and δ λ and δ ϕ are the widths in the longitude and latitude direction. The difference of the center Lon (Lat) between two adjacent grid areas can be made smaller than 2 δ ϕ ( 2 δ λ ), e.g.,  λ ¯ i λ ¯ i + 1 < 2 δ ϕ . In such cases, two adjacent grid areas may partially overlap with each other, and the key points falling into the overlapped areas are used multiple times during clustering. Such a setup makes it easier to obtain a representative set of waymark points from non-uniformly distributed key points by using a relatively large grid width.
It is noted that a customized distance metric is adopted when applying DBSCAN for COG clustering. Specifically, suppose θ i [ 0 ° , 360 ° ) and θ j [ 0 ° , 360 ° ) are two COGs. Then, the corresponding DBSCAN distance metric is calculated as | θ i θ j | if | θ i θ j | < 180 ° and 360 | θ i θ j | otherwise. This customized metric is to avoid the errors caused by COGs close to the north, i.e., COG values close to 0 ° or 360 ° . Additionally, it is noted that the Lon and Lat are not used in the clustering process. This is due to the fact that the COG clustering is performed for each grid area, where the key points have relatively close values of Lon and Lat.
As an illustrative example, Figure 2 presents the key points obtained from the trajectories of the oil tankers passing through the shown area between 2017 to 2019. It can be seen that there are areas where the key points are sparse, while there are also areas where the key points are much denser. It is known that DBSCAN requires one to specify a value on the minimum number of points in a cluster so the algorithm can operate. If there are less key points than this value, then all points in the grid area will be classified as noise, and no cluster can be formed. Therefore, in a sparse area, the width of the grid area has to be set sufficiently large. However, if overlapping is not permitted, then a large width will lead to a reduced number of waymark points, which will make the representation of the traveling patterns less accurate.

3.2. Trajectory Finding Based on Waymark Points

Without loss of generality, suppose a prediction of the future trajectory is made at time instant 0, where the historical trajectory consisted of the set of previous location points { p I , p 2 , p 1 , p 0 } , and p n = ( λ n , ϕ n ) contains the Lat and Lon of the n-th position. Denote Q = { q 1 , q 2 , , q M } as the set of all waymark points obtained from the key point clustering. The task of the trajectory prediction is to select from Q a set of N points to produce the predicted trajectory T = { p 1 , , p n , , p N , p D E S } , where p D E S is the destination of interest. This process is implemented in an iterative way by sequentially identifying the N waymark points. Note that the number of point N is determined by the algorithm automatically and is not a preset value.
In the n-th iteration, the algorithm first identifies a candidate waymark point set Q n Q based on T n = { p I , p 0 , , p n } . The candidate set Q n is then filtered to generate a refined candidate set Q n , from which the predicted point p n + 1 is chosen based on a weighted distance to be explained later in this section. In this process, the Lat, Lon and the information of direction and point density are all utilized.
For ease of exposition, we introduce the following notation:
d i r e ( p m , p n ) : the heading direction from position p m to position p n ;
d i s t ( p m , p n ) : the spherical distance between p m to position p n , calculated using the Python package Geopy;
L D M { a 1 , a 2 , , a m } : the linear directional mean of direction a 1 , a 2 , , a m , calculated as:
L D M { a 1 , a 2 , , a m } = a 1 + a 2 + . . . + a m m ,
where a i , i = 1 , , m is a unit vector from the origin.
Δ a , b : the angular change from direction a to direction b , measured in degrees. Δ a , b is calculated as:
Δ a , b = 360 ° Δ Δ 180 ° Δ Δ 180 °
where Δ = | a b | . Here, angle a , b both take value in [ 0 ° , 360 ° ) , where 0 ° corresponds to the north.
At the n-th iteration, the distance from p n to p D E S is compared against a pre-determined threshold ε D E S . If 
d i s t ( p n , p D E S ) < ε D E S ,
i.e, the distance to the destination is sufficiently small, then the iterative process terminates, and p D E S is appended to T n to form the completed trajectory prediction. Otherwise, the algorithm continues to identify p n + 1 . The threshold ε D E S needs to be chosen with care, as setting it too large can lead to premature terminations of the trajectory finding, while setting it too small may make the prediction susceptible to interference from other routes in the areas of high concentrations of multiple routes towards different directions, e.g., the areas near a port. We recommend to set ε D E S to be the distance from the destination from which various routes begin to diverge.

3.2.1. Candidate Set Q n Identification and Navigation Status Update

At iteration n 1 , the algorithm first identifies the candidate set Q n , which is made up of all the waymark points within a rectangle centered at the latest point in the trajectory, i.e., around point p n = ( λ ¯ n , ϕ ¯ n ) :
Q n = { ( λ ¯ m , ϕ ¯ m , θ ¯ m , ρ ¯ m ) | λ m ( λ ¯ n ε λ , λ ¯ n + ε λ ) , ϕ m ( ϕ ¯ n ε ϕ , ϕ ¯ n + ε ϕ ) }
where ε λ > 0 and ε ϕ > 0 are used to define the size of the rectangular area.
To ensure that the waymark point p n + 1 locates in the direction that is consistent with the recent trajectory trend, we update the current heading trend d c u r n as:
d c u r n = d n n = 1 L D M d c u r n 1 , d n , d Q n 1 n > 1
where d n = d i r e ( p n 1 , p n ) and d Q n 1 is the mean of the COGs of the waymark points in set Q n 1 , and Q n 1 is the filtered candidate set in the ( n 1 ) -th iteration (see Section 3.2.2 for details). From the above equation, it can be seen that when n > 1 , d c u r n is the linear directional mean of the (predicted) traveling direction of iteration n 1 , the heading direction d n from point p n 1 to p n (the current position), and the mean COG of the key points in Q n 1 .

3.2.2. Candidate Set Filtering

In this step, the waymark points in Q n that deviate significantly from the expected traveling direction are filtered out from further consideration in the current prediction iteration. To do so, we first denote d D E S n d i r e ( p n , p D E S ) as the direction from the current position to the destination and then introduce the following angle-dependent parameters:
δ m = Δ d p n , q m , d D E S n
δ m = Δ θ n , θ ¯ m
where d p n , q m = d i r e p n , q m is the heading direction from p n to q m . Parameter δ m measures the degree of difference between d p n , q m and the direction of the current position towards the destination, while δ m measures the degree of difference between the COG of p n and the COG of q m . An illustration of the above notation is given in Figure 3.
Ideally, if a tanker does not make a sharp turn, δ m is expected to take small values. Additionally, as the tanker is traveling towards the destination, it is expected that the heading from p n to q m does not deviate too drastically from d D E S n . Therefore, it is expected that δ m does not take extremely large values. Therefore, we identify the refined candidate set by applying filtering based on δ m and δ m as follows:
Q n = q m | q m Q n , δ m ϵ m , δ m ϵ m ,
where ϵ m and ϵ m are two thresholds.

3.2.3. Waymark Point Selection

Once the filtered candidate set of waymark points, Q n , has been identified, the algorithm can proceed to make a selection for the next waymark point. As each waymark point q m = λ ¯ m , ϕ ¯ m , θ ¯ m , ρ ¯ m represents a cluster of key points, a higher value of ρ ¯ m means that the corresponding grid area has been visited more frequently. Hence, it is reasonable to allocate more weight to such waymark points in the selection. Additionally, the expected trajectory trend, i.e.,  d c u r n , is expected to be close to the heading from p n to q m . Based on these considerations, we propose a weighted distance to account for both the frequency that a waymark point is visited and the angular change from p n to q m . Specifically, the weighted distance between the current point p n and a candidate waymark point q m Q n can be calculated as:
D i s p n , q m = κ ρ m a x / ρ ¯ m + 1 κ Δ d c u r n , d p n , q m
where Δ d c u r n , d p n , q m is the angular change defined in (3), and κ ( 0 , 1 ) is the weight. Usually, Δ d c u r n , d p n , q m is much larger than ρ m a x / ρ ¯ m ; thus, κ can be set close to 1 to better balance the two terms. Parameter ρ m a x = max { ρ ¯ m | q m Q n } is the maximum density of all the waymark points in the refined candidate set Q n . Based on the weighted distance, the next waymark point is then selected as the one with the smallest weighted distance, i.e.,
p n + 1 = arg min q m Q n D i s ( p n , q m ) .
The weighted distance defined in (9) does not consider the geo-distance between two waymark points. This is due to the fact that the refined candidate set Q n Q n contains waymark points that are sufficiently close to the current reference point, as shown in (4). In this case, it is sufficient to focus on the angular and density differences to identify the next waymark point. The proposed trajectory prediction algorithm is finally summarized in Algorithm 1:
Algorithm 1 Trajectory prediction algorithm
Require:  p D E S , T 0 = { , p 1 , p 0 } , Q = { q 1 , q 2 , , q M }
Parameters:  ε D E S , ε λ , ε ϕ , ε δ k , ε δ k , κ , ϵ m , ϵ m
Initialization:  T = { p 0 }
   while  d i s t ( p n , p D E S ) ε D E S  do
      Q n = { λ ¯ m , ϕ ¯ m , θ ¯ m , ρ ¯ m | λ ¯ m λ ¯ n ε λ , λ ¯ n + ε λ ,
        ϕ ¯ m ϕ ¯ n ε ϕ , ϕ ¯ n + ε ϕ }
      Q n = q m | q m Q n , δ m ϵ m , δ m ϵ m
     for all  q m in Q n  do
        D i s p n , q m = κ ρ m a x / ρ ¯ m + ( 1 κ ) Δ d c u r n , d p n , q m
     end for
      p n + 1 = arg min q m Q n D i s ( p n , q m )
     Append   p n + 1  to T
  end while
  return T

4. Experimental Results

In this section, we present the experimental results that evaluate the performance of the proposed trajectory prediction method. We start this section by introducing the dataset and the pre-processing techniques used to process the raw AIS data. The results obtained from key point clustering are then presented to illustrate the operation of the proposed method, followed by the evaluations of trajectory prediction accuracies.

4.1. Dataset and Pre-Processing

In the experiment, we evaluate the proposed trajectory prediction method using the historical AIS data provided by the Danish Maritime Authority (DMA) [10]. The raw data contain AIS messages from various types of vessels traveling around Danish waters. The AIS data have 26 fields, including latitude, longitude, COG, destination, etc. In these experiments, we will mainly use latitude, longitude and COG fields for trajectory prediction. A comprehensive list and description of other fields can be found in Appendix A of [23]. The experiment uses the AIS data from oil tankers traveling to Port Skagen during the period between 2017 and 2019. This dataset contains the AIS data for 1278 vessels and has over 40 million AIS records. In this dataset, the data between 1 January 2017 and 30 June 2019 are used to extract the training trajectories and produce the waymark points. The AIS data from oil tankers traveling to the same port from 1 July 2019 to 31 December 2019 are used to test the proposed trajectory prediction algorithm.
The raw data are pre-processed before being used to produce the waymark points. Key steps of the pre-processing include trajectory extraction and integrity check and downsampling. We refer to Appendix B of [23] for a detailed description of the pre-processing steps. In the experiments, a total of 1046 training trajectories were obtained after the pre-processing, and there are 232 trajectories in the test period. Each trajectory undergoes a route to the destination, i.e., Port Skagen.

4.2. Results for Grid-Based Key Point Clustering

In the training dataset, there are over 550,000 key points. During the key point clustering, the difference in latitude/longitude between the center of the adjacent grid areas is set to 0.05 degrees, while the width of the grid area is 0.05 degrees for both the latitude and the longitude. The parameters for DBSCAN are listed in Table 2.
Figure 4 plots all the key points from the training trajectories, from which the results show that there are some relatively clear routes in the eastern part of the area. In contrast, the routes are much less clear in the western part of the area. After DBSCAN clustering, a total of 6127 waymark points are obtained, which are plotted in Figure 5. Comparing Figure 4 to Figure 5, the results show that although the number of waymark points is only about 1.1% of the key points, they capture all the main route patterns in both the western and the eastern part of the area.
Figure 6 shows the empirical cumulative distribution (CDF) of the COG standard deviation (STD) before and after clustering. Before applying the clustering algorithm, each COG STD is estimated based on the key points within the same grid area. After the clustering, the COG STD is for the key points within the same cluster. Therefore, as expected and shown in Figure 6, the clustering significantly reduces the STD of the COG. For example, before applying the COG clustering, the 95% STD value is about 67 . 8 ° . This means that for about 5% of the clusters (before applying the COG clustering), the COG spread measured by the STD is ≥67.8 ° . This result suggests that although the grid areas are relatively small, the traveling directions of the tankers within the same small area can differ significantly. In contrast, after applying the clustering, the 95% STD of the COG reduces to 2 . 2 ° . In other words, after clustering, the traveling direction of the cluster becomes more consistent, which is more indicative for the prediction of future positions of the vessel of interest.

4.3. Results for Trajectory Prediction

We now evaluate the performance of the proposed trajectory prediction. The parameters for waymark point connection are set as follows. The distance threshold ε D E S = 20 km, the length and the width of the rectangular region used to identify Q n were set to ε λ = 0 . 2 ° , ε φ = 0 . 2 ° , and the weight coefficient of the weighted distance is set to κ = 10 / 11 . In the filtering waypoint step, we have chosen two more lenient values to filter out impossible points, which are ϵ m = 180 ° and ϵ m = 120 ° .
Figure 7 shows the number of test samples versus the prediction horizon measured by the number of hours. It can be seen that the test samples contain a mixed of short-term and long-term trajectory prediction tests. About 53.5% of the tests correspond to trajectory predictions in 10 h or more.
To test the proposed algorithm, we implemented a trajectory prediction algorithm based on graph theory and the Dijkstra algorithm [18] and use it as the baseline. Additionally, we also implemented a straightforward trajectory prediction algorithm that simply selects the training trajectory that most matches the historical segment of the trajectory under prediction. This baseline is termed best-matched. Directed Hausdorff distance is adopted to measure the similarity. Figure 8 presents four examples of the trajectory prediction results of the proposed algorithm (the red curves), the graph-based baseline from [18] (the greed curves), and the baseline of the best matched (the orange curves). The ground truth trajectory is also plotted as a reference in each example (the grey solid curves). In the figures, the grey dotted curves are the historical segment of the trajectory prior to the prediction moment. The results show that while the baseline algorithms can provide trajectory predictions fairly close to the ground truth, the smoothness of the predicted trajectories is not good enough. As a comparison, the proposed algorithm can offer more accurate predictions in the sense that the predicted trajectories are visually closer to the true ones and retain the smoothness of realistic trajectories.
From a quantitative perspective, we further evaluate the similarities between the predicated and the ground truth trajectories for the proposed algorithm and the baseline algorithms. To measure the similarity, we consider the DTW distance [24,25,26,27] and the Hausdorff distance [28], which are commonly used to measure the similarities of two curves/sequences of data. A smaller value of this metric suggests a higher degree of similarity and thus a more accurate trajectory prediction.
As the area of interest contains different routes to Port Skagen from different directions, to better demonstrate the details of the performance evaluation, we classified the trajectories into eight trajectory clusters, as shown in Figure 9, for the tested data. Table 3 shows the number of test instances for each cluster in the experiment and the corresponding average DTW and HD. Each test instance refers to a time instance of a trajectory in the test dataset. Since each track lasts more than 25 h on average, the number of test instances is significantly larger than the number of trajectories.
From Table 3, it can be seen that the proposed algorithm outperforms the baseline algorithms on both the performance metrics and for all the eight trajectory classes. Specifically, there is a 12.08% and 25.18% improvement in the DTW distance compared to the best-history and the graph-based baselines, respectively. For Hausdorff distance, the improvement is 11.05% and 42.04% over the two baselines. It can also be seen that for trajectory clusters 1, 6 and 7, which locate on the eastern side of the sea, the advantages of the proposed algorithm over the graph-based approach are even more significant due to the complexity of the shipping lanes and the extremely narrow passable channels.
To further show the stability of the performance of the algorithms, box plots of the performance metrics of the three algorithms are presented in Figure 10 and Figure 11, where the orange line indicates the median, and the green triangle is the location of the mean. It can be seen that for both metrics, the proposed algorithm has a lower median and 75% quantile of the mean than the baseline algorithm, demonstrating higher robustness.

5. Conclusions and Discussion

In this paper, a long-term trajectory prediction algorithm has been developed for oil tankers traveling to a known destination port. The proposed algorithm utilizes key points from historical training trajectories that are extracted from historical AIS data in an area of interest. The algorithm then applies the DBSCAN clustering algorithm to generate a set of waymark points that are much fewer than the key points but still retain all the traveling patterns of the oil tankers. Based on the waymark points, a novel path-finding algorithm has been developed that sequentially identifies a set of waymark points to form the predicted trajectory. The proposed algorithm was tested on real AIS data for oil tankers in Danish waters with a fixed destination of port Skagen. Experimental results show that compared to existing trajectory prediction algorithms, including the graph-based and the best-matched approach, the proposed method can achieve more accurate trajectory predictions with higher trajectory smoothness. Specifically, measured by DTW distance and Hausdorff distance, the proposed method offers a reduction of 25.18% and 42.04% over the graph-based baseline and 12.08% and 11.05% over the best-matched baseline, respectively.
While the proposed algorithm has been tested using real AIS data, the tested scenario is relatively limited. Further testings with AIS data from different areas, on different types of vessels, and to different destinations are needed to obtain a more comprehensive evaluation of the performance of the algorithm. Additionally, the focus of the current work was only on the prediction of the trajectory path. Further effort is needed to match the positions of the predicted trajectory to timestamps. In this direction, a combination of the current algorithm with machine learning techniques may be needed to better capture the motion dynamics of vessels in different areas.

Author Contributions

X.X.: onceptualization, methodology, formal analysis, software, visualization, writing—original draft. C.L.: conceptualization, methodology, formal analysis, visualization, supervision, writing—original draft, writing—review and editing, funding acquisition. J.L.: conceptualization, methodology, visualization, validation, supervision, writing—review and editing, funding acquisition. Y.M.: conceptualization, methodology, visualization, supervision, writing—review and editing. L.Z.: conceptualization, methodology, visualization, supervision, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was jointly supported by the Zhejiang Provincial Natural Science Foundation of China (Grant No. LZ22F010001) and the Fundamental Research Funds for the Provincial Universities of Zhejiang (Grant No. GK229909299001-013).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw AIS data are available at “https://dma.dk/safety-at-sea/navigational-information/ais-data” (accessed on 6 June 2023). The processed data that support the findings of this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, X.; Fu, X.; Xiao, Z.; Xu, H.; Qin, Z. Vessel Trajectory Prediction in Maritime Transportation: Current Approaches and Beyond. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19980–19998. [Google Scholar] [CrossRef]
  2. Tu, E.; Zhang, G.; Mao, S.; Rachmawati, L.; Huang, G. Modeling Historical AIS Data For Vessel Path Prediction: A Comprehensive Treatment. arXiv 2020, arXiv:2001.01592. [Google Scholar]
  3. Rong, H.; Teixeira, A.P.; Soares, C.G. Ship trajectory uncertainty prediction based on a Gaussian Process model. Ocean Eng. 2019, 182, 499–511. [Google Scholar] [CrossRef]
  4. Altché, F.; de La Fortelle, A. An LSTM Network for Highway Trajectory Prediction. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 353–359. [Google Scholar]
  5. Zhang, L.; Zhu, Y.; Su, J.; Lu, W.; Li, J.; Yao, Y. A Hybrid Prediction Model Based on KNN-LSTM for Vessel Trajectory. Mathematics 2022, 10, 4493. [Google Scholar] [CrossRef]
  6. You, L.; Xiao, S.; Peng, Q.; Claramunt, C.; Zhang, J. ST-Seq2Seq: A Spatio-Temporal Feature-Optimized Seq2Seq Model for Short-Term Vessel Trajectory Prediction. IEEE Access 2020, 8, 218565–218574. [Google Scholar] [CrossRef]
  7. Murray, B.; Perera, L.P. An AIS-based multiple trajectory prediction approach for collision avoidance in future vessels. In Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering, Glasgow, UK, 9–14 June 2019; American Society of Mechanical Engineers: New York, NY, USA, 2019; Volume 58851. [Google Scholar]
  8. Kim, E.; Cho, H.; Kim, N.; Kim, E.; Ryu, J.; Park, H. Sensitive Resource and Traffic Density Risk Analysis of Marine Spill Accidents Using Automated Identification System Big Data. J. Mar. Sci. Appl. 2020, 19, 173–181. [Google Scholar] [CrossRef]
  9. Artana, K.; Pitana, T.; Dinariyana, D.; Ariana, M.; Kristianto, D.; Pratiwi, E. Real-time monitoring of subsea gas pipelines, offshore platforms, and ship inspection scores using an Automatic Identification System. J. Mar. Sci. Appl. 2018, 17, 101–111. [Google Scholar] [CrossRef]
  10. DMA. AIS Data. 2022. Available online: https://dma.dk/safety-at-sea/navigational-information/ais-data (accessed on 6 June 2023).
  11. Hacinecipoglu, A.; Konukseven, E.I.; Koku, A.B. Multiple Human Trajectory Prediction and Cooperative Navigation Modeling in Crowded Scenes. Intell. Serv. Robot. 2020, 13, 479–493. [Google Scholar] [CrossRef]
  12. Shang, J.; Liu, C.; Shi, H.; Cheng, T.; Yue, K. A New Algorithm for Navigation Trajectory Prediction of Land Vehicles Based on a Generalised Extended Extrapolation Model. J. Navig. 2019, 72, 1217–1232. [Google Scholar] [CrossRef]
  13. Ni, Y.; Zhao, X. 3DTRIP: A General Framework for 3D Trajectory Recovery Integrated With Prediction. IEEE Robot. Autom. Lett. 2023, 8, 512–519. [Google Scholar] [CrossRef]
  14. Greer, R.; Deo, N.; Trivedi, M. Trajectory Prediction in Autonomous Driving With a Lane Heading Auxiliary Loss. IEEE Robot. Autom. Lett. 2021, 6, 4907–4914. [Google Scholar] [CrossRef]
  15. Kim, K.I.; Lee, K.M. Deep Learning-Based Caution Area Traffic Prediction with Automatic Identification System Sensor Data. Sensors 2018, 18, 3172. [Google Scholar] [CrossRef] [Green Version]
  16. Xu, X.; Bai, X.; Xiao, Y.; He, J.; Xu, Y.; Ren, H. A Port Ship Flow Prediction Model Based on the Automatic Identification System and Gated Recurrent Units. J. Mar. Sci. Appl. 2021, 20, 572–580. [Google Scholar] [CrossRef]
  17. Ogura, T.; Inoue, T.; Uchihira, N. Prediction of Arrival Time of Vessels Considering Future Weather Conditions. Appl. Sci. 2021, 11, 4410. [Google Scholar] [CrossRef]
  18. Alessandrini, A.; Mazzarella, F.; Vespe, M. Estimated Time of Arrival Using Historical Vessel Tracking Data. IEEE Trans. Intell. Transp. Syst. 2019, 20, 7–15. [Google Scholar] [CrossRef]
  19. Petrou, P.; Tampakis, P.; Georgiou, H.V.; Pelekis, N.; Theodoridis, Y. Online Long-Term Trajectory Prediction Based on Mined Route Patterns. In Proceedings of the Multiple-Aspect Analysis of Semantic Trajectories: First International Workshop, MASTER 2019, Wurzburg, Germany, 16 September 2020; pp. 34–49. [Google Scholar]
  20. Perera, L.P.; Soares, C.G. Ocean vessel trajectory estimation and prediction based on extended Kalman filter. In Proceedings of the Second International Conference on Adaptive and Self-Adaptive Systems and Applications, Lisbon, Portugal, 21–26 November 2010; Citeseer: Princeton, NJ, USA, 2010; pp. 14–20. [Google Scholar]
  21. Vasquez, D.; Fraichard, T.; Laugier, C. Growing Hidden Markov Models: An Incremental Tool for Learning and Predicting Human and Vehicle Motion. Int. J. Robot. Res. 2009, 28, 1486–1506. [Google Scholar] [CrossRef] [Green Version]
  22. Bautista-Sánchez, R.; Barbosa-Santillan, L.I.; Sánchez-Escobar, J.J. Method for select best AIS data in prediction vessel movements and route estimation. Appl. Sci. 2021, 11, 2429. [Google Scholar] [CrossRef]
  23. Xu, X.; Liu, C.; Li, J.; Miao, Y. Trajectory clustering for SVR-based Time of Arrival estimation. Ocean Eng. 2022, 259, 111930. [Google Scholar] [CrossRef]
  24. Li, H.; Liu, J.; Liu, R.W.; Xiong, N.; Wu, K.; Kim, T.H. A dimensionality reduction-based multi-step clustering method for robust vessel trajectory analysis. Sensors 2017, 17, 1792. [Google Scholar] [CrossRef] [Green Version]
  25. Yuan, G.; Sun, P.; Zhao, J.; Li, D.; Wang, C. A review of moving object trajectory clustering algorithms. Artif. Intell. Rev. 2017, 47, 123–144. [Google Scholar] [CrossRef]
  26. Li, H.; Liu, J.; Yang, Z.; Liu, R.W.; Wu, K.; Wan, Y. Adaptively constrained dynamic time warping for time series classification and clustering. Inf. Sci. 2020, 534, 97–116. [Google Scholar] [CrossRef]
  27. Zhao, L.; Shi, G. A trajectory clustering method based on Douglas-Peucker compression and density for marine traffic pattern recognition. Ocean Eng. 2019, 172, 456–467. [Google Scholar] [CrossRef]
  28. Wang, L.; Chen, P.; Chen, L.; Mou, J. Ship AIS Trajectory Clustering: An HDBSCAN-Based Approach. J. Mar. Sci. Eng. 2021, 9, 566. [Google Scholar] [CrossRef]
Figure 1. Flow chart of the trajectory prediction algorithm; (1) pre-processed original trajectory is used in key point clustering, (2) filter the candidate set and calculate weighted distance, (3) select the point with the smallest weighted distance as the next location point, (4) calculate the distance from the current location to the destination. If it is less than the threshold value, continue to execute the key point connection step; otherwise, output the predicted trajectory to end the prediction process.
Figure 1. Flow chart of the trajectory prediction algorithm; (1) pre-processed original trajectory is used in key point clustering, (2) filter the candidate set and calculate weighted distance, (3) select the point with the smallest weighted distance as the next location point, (4) calculate the distance from the current location to the destination. If it is less than the threshold value, continue to execute the key point connection step; otherwise, output the predicted trajectory to end the prediction process.
Jmse 11 01211 g001
Figure 2. Key points obtained from oil tankers’ trajectories between 2017 to 2019.
Figure 2. Key points obtained from oil tankers’ trajectories between 2017 to 2019.
Jmse 11 01211 g002
Figure 3. An illustration of the mathematical notation for candidate set filtering.
Figure 3. An illustration of the mathematical notation for candidate set filtering.
Jmse 11 01211 g003
Figure 4. All key points in the training trajectories.
Figure 4. All key points in the training trajectories.
Jmse 11 01211 g004
Figure 5. All waymark points obtained after clustering of the key points.
Figure 5. All waymark points obtained after clustering of the key points.
Jmse 11 01211 g005
Figure 6. The empirical distribution of the COG variance before and after grid clustering, showing that the variance of the COG is greatly reduced after clustering.
Figure 6. The empirical distribution of the COG variance before and after grid clustering, showing that the variance of the COG is greatly reduced after clustering.
Jmse 11 01211 g006
Figure 7. Number of test samples vs. time horizon of trajectory prediction.
Figure 7. Number of test samples vs. time horizon of trajectory prediction.
Jmse 11 01211 g007
Figure 8. Examples of trajectory prediction results.
Figure 8. Examples of trajectory prediction results.
Jmse 11 01211 g008
Figure 9. An illustration of the trajectories of the 8 clusters in the test dataset.
Figure 9. An illustration of the trajectories of the 8 clusters in the test dataset.
Jmse 11 01211 g009
Figure 10. Boxplot of the DTW distance obtained for the proposed algorithm and the two baselines.
Figure 10. Boxplot of the DTW distance obtained for the proposed algorithm and the two baselines.
Jmse 11 01211 g010
Figure 11. Boxplot of the Hausdorff distance obtained for the proposed algorithm and the two baselines.
Figure 11. Boxplot of the Hausdorff distance obtained for the proposed algorithm and the two baselines.
Jmse 11 01211 g011
Table 1. List of key notations.
Table 1. List of key notations.
Key Notation
λ k Longitude
ϕ k Latitude
θ i COG [ 0 ° , 360 ° )
p ˜ k = λ k , ϕ k , θ k The k-th key point
p k = λ k , ϕ k The location of the k-th key point
q m = λ ¯ m , ϕ ¯ m , θ ¯ m , ρ ¯ m The m-th waymark point, λ ¯ m , ϕ ¯ m , θ ¯ m , ρ ¯ m denotes the Lon, Lat, COG and the point density
p D E S Destination location
d i r e p m , p n The heading direction from position p m to position p n
d i s t p m , p n The spherical distance between position p m and p n
a Direction a , a unit vector of the origin of a two-dimensional coordinate system
L D M a 1 , a 2 , . . . , a m The linear directional mean of direction a 1 , a 2 , . . . , a m , defined in Equation (2)
Δ a , b The angular change from direction a to direction b , defined in Equation (3)
Q n Candidate set identified in the n-th iteration, defined in Equation (4)
Q n Refined candidate set at the n-th iteration, defined in Equation (8), Q n Q n
d c u r n The heading trend at the beginning of the n-th iteration, defined in Equation (5)
d Q n The mean COGs of the waymark points in Q n
d n The heading direction from position p n 1 to position p n
d D E S n The direction from position p n to the destination
δ m = Δ d p n , q m , d D E S n Directional difference between d p n , q m and d D E S n , defined in Equation (6)
δ m = Δ θ n , θ ¯ m Directional difference between the COG of p n and the COG of q m , defined in Equation (7)
Table 2. Key point clustering: DBSCAN parameter settings.
Table 2. Key point clustering: DBSCAN parameter settings.
Minimum Number of Points ϵ
COG clusteringkey points n 100 n 401
10 n < 100 5
n < 10 n / 3
Table 3. Performance comparison between different trajectory prediction algorithms based on DTW and Hausdorff distance.
Table 3. Performance comparison between different trajectory prediction algorithms based on DTW and Hausdorff distance.
ClusterSample SizeDTW Distance (Unit: km)Hausdorff Distance (Unit: km)
ProposedBest-MatchedGraph-BasedProposedBest-MatchedGraph-Based
0137555.68676.44772.768.939.9611.68
11734759.10908.501181.1511.6816.0419.06
29031872.652086.402221.0313.4615.8015.16
313201.43218.75421.265.108.189.86
41051475.341552.611668.0111.8010.4212.79
51621075.401050.571204.2713.6114.7015.07
6115522.76548.78943.527.838.8514.81
78167.82408.43359.493.288.0114.45
Average31771094.311244.721462.5911.9815.1417.02
Percentage−12.08%−25.18% −11.05%−42.04%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xu, X.; Liu, C.; Li, J.; Miao, Y.; Zhao, L. Long-Term Trajectory Prediction for Oil Tankers via Grid-Based Clustering. J. Mar. Sci. Eng. 2023, 11, 1211. https://doi.org/10.3390/jmse11061211

AMA Style

Xu X, Liu C, Li J, Miao Y, Zhao L. Long-Term Trajectory Prediction for Oil Tankers via Grid-Based Clustering. Journal of Marine Science and Engineering. 2023; 11(6):1211. https://doi.org/10.3390/jmse11061211

Chicago/Turabian Style

Xu, Xuhang, Chunshan Liu, Jianghui Li, Yongchun Miao, and Lou Zhao. 2023. "Long-Term Trajectory Prediction for Oil Tankers via Grid-Based Clustering" Journal of Marine Science and Engineering 11, no. 6: 1211. https://doi.org/10.3390/jmse11061211

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop