Dynamic Graph Convolutional Network-Based Prediction of the Urban Grid-Level Taxi Demand–Supply Imbalance Using GPS Trajectories

Yang, Haiqiang; Li, Zihan

doi:10.3390/ijgi13020034


by Haiqiang Yang ¹,²,* and Zihan Li ³

¹ Institute for Future, School of Automation, Qingdao University, Qingdao 266071, China

² Shandong Key Laboratory of Industrial Control Technology, Qingdao 266071, China

³ Institute for Future, College of Physics, Qingdao University, Qingdao 266071, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2024, 13(2), 34; https://doi.org/10.3390/ijgi13020034

Submission received: 17 November 2023 / Revised: 17 January 2024 / Accepted: 20 January 2024 / Published: 24 January 2024

(This article belongs to the Special Issue Unlocking the Power of Geospatial Data: Semantic Information Extraction, Ontology Engineering, and Deep Learning for Knowledge Discovery)


Abstract:

The objective imbalance between the taxi supply and demand exists in various areas of the city. Accurately predicting this imbalance helps taxi companies with dispatching, thereby increasing their profits and meeting the travel needs of residents. The application of Graph Convolutional Networks (GCNs) in traffic forecasting has inspired the development of a spatial–temporal model for grid-level prediction of the taxi demand–supply imbalance. However, spatial–temporal GCN prediction models conventionally capture only static inter-grid correlation features. This research aims to address the dynamic influences caused by taxi mobility and the variations of other transportation modes on the demand–supply dynamics between grids. To achieve this, we employ taxi trajectory data and develop a model that incorporates dynamic GCN and Gated Recurrent Units (GRUs) to predict grid-level imbalances. This model captures the dynamic inter-grid influences between neighboring grids in the spatial dimension. It also identifies trends and periodic changes in the temporal dimension. The validation of this model, using taxi trajectory data from Shenzhen city, indicates superior performance compared to classical time-series models and spatial–temporal GCN models. An ablation study is conducted to analyze the impact of various factors on the predictive accuracy. This study demonstrates the precision and applicability of the proposed model.

Keywords:

taxi demand–supply imbalance; taxi trajectory data; graph convolutional network; gated recurrent unit; dynamic graph convolution

1. Introduction

The optimal state for urban taxi services is to attain a balance between supply and demand [1]. However, due to the mobility of taxis and the randomness of travel demands, many areas within a city are in an imbalanced state most of the time [2]. For instance, some areas have an excessive taxi demand, while in other areas, the taxi supply far exceeds the demand, causing inefficiency. In order to rectify this situation, taxi companies are increasingly using intelligent algorithms for taxi dispatching [3,4]. These algorithms require precise forecasting of the demand–supply imbalance in specific areas. So, grid-level taxi demand–supply prediction [5,6,7], especially the prediction of said imbalance [8,9], has become a focal point of research. The accurate prediction of the demand–supply imbalance in specific areas is a critical consideration when it comes to making taxi dispatching decisions. It also facilitates research in various areas, such as the formulation and forecasting of taxi-based mobility [6], cruising route recommendation for vacant taxis [10], and ride-hailing services [11], among others.

Estimating the demand and supply of transportation for urban residents, including taxi services, has traditionally relied on surveys or numerical simulations [12,13]. Nonetheless, these methods present issues with reliability and timeliness. With the widespread use of GPS devices, a significant number of taxis outfitted with GPS systems traverse the city’s road networks, gathering trajectory data that depict aspects of urban traffic operations, such as the traffic flow, speed, and density [14]. The GPS systems of most city taxis can trace information such as the vehicle ID, location coordinates, and time, as well as whether the vehicle is occupied [15,16,17]. This presents opportunities for analyzing demand and supply in terms of taxi travel, offering potential solutions to the accuracy and timeliness issues previously faced.

Grid-level taxi demand and supply demonstrate unique characteristics in the temporal and spatial dimensions, similar to traffic flow parameters like the volume and speed [18]. On the one hand, temporal trend features are present in traffic flow parameters, making them predictable via time-series models, and the same holds true for grid-level taxi demand and supply. Various modeling approaches exist for traffic forecasting, such as early methods like AR (Autoregressive Models) [19], MA (Moving Average Models) [20], and ARIMA (Auto-Regressive Integrated Moving Average Models) [21]. These models convert time-series data into stationary sequences and then establish a mathematical model to predict future values. However, these models perform poorly when modeling data that exhibit pronounced periodicity, and the intricate periodicity of traffic flows limits the applicability of these models in the domain of traffic prediction. In recent years, numerous machine-learning models have emerged that have shown improved performance in predicting time-series data. RNN (Recurrent Neural Network) [22], LSTM (Long Short-Term Memory) [23], GRU (Gated Recurrent Unit) [24], and Transformer [25] are some such models. These models effectively capture the trend and periodicity characteristics of the traffic flow in the time dimension, resulting in excellent predictive performance.

On the other hand, traffic flow displays detectable spatial correlations in the spatial dimension due to geographical and road connectivity features. This implies that time-series models alone cannot predict the traffic flow with precision. A significant number of deep-learning prediction models have been developed based on the GCN (Graph Convolutional Network) model [26], capable of capturing the spatiotemporal characteristics of traffic flow. For instance, several models have arisen within this context, including T-GCN [27], GAMCN [28], DDP-GCN [29], KST-GCN [30], and TPP-GCN [31]. These models generally take into account only the fixed correlations between vertices while executing spatial convolution operations, such as adjacency relationships [27], distance [30], and attribute correlations between vertices [31]. However, even in adjacent grids, the connections between them can be intricate due to variances in taxi trip origins and destinations. As an illustration, in Figure 1, Grid A and Grid B both belong to residential areas. During subway operating hours, the taxi demand in Grid B surpasses that in Grid A. Nevertheless, when the subway service concludes at night, the taxi demand in both areas tends to equalize. Moreover, Grid C and Grid D exhibit comparable taxi demand on weekdays owing to the presence of schools and hospitals. However, on weekends, when the schools are closed, the demand for taxis in Grid C notably decreases in contrast to Grid D.

To tackle these issues, a model for forecasting the taxi demand–supply imbalances at the level of the grid is proposed. A dynamic graph convolutional network is designed to capture the dynamic spatial characteristics between grids. The correlations between the grids can be dynamically computed to form a graph by sliding a window of $T$ time periods along the temporal dimension, as illustrated in Figure 2. The edges of graph $G^{t}$ during time period $t$ are determined by the attribute correlations between the grids from time period $t - T$ to $t$, and so on. Subsequently, we employ a GRU to uncover the temporal characteristics of the taxi supply and demand. We then relate the taxi supply and demand to the mismatch within the grids in order to predict the current mismatch. We validate our model utilizing taxi trajectory data from Shenzhen city and examine the effect of involved hyperparameters. The key contributions of this paper are summarized below:

A spatial–temporal graph convolution model that integrates GCN and GRU is proposed, which can accurately forecast the taxi demand–supply imbalance at the grid level. It not only reflects the temporal trends and periodicity of the taxi demand–supply in the time dimension but also captures the interplay of the taxi demand–supply imbalance among different grids.
A dynamic graph convolutional network is developed to capture the dynamic spatial features between grids. The correlations between grids are evident and vary substantially in different time periods, such as daytime and nighttime, weekdays and weekends, and so on. By capturing these variable influences, the taxi demand–supply can be predicted more accurately.
Experiments are carried out utilizing real taxi trajectory data from Shenzhen city to validate the feasibility and accuracy of the proposed model. The experiments compare the proposed model with other existing spatial–temporal graph convolutional models. In addition, the impact of the hyperparameters involved in the model on real-world scenarios is also discussed, providing a reference for the practical application of the model.

The structure of this paper is as follows. Section 2 provides a comprehensive review of the research related to this paper. Section 3 introduces the relevant definitions and explanations, while Section 4 presents the forecasting model of the grid-level taxi demand-supply imbalance, including a breakdown of each component and its overall structure. Section 5 presents the experimental data required for validating the model alongside a comparison of the obtained results with those from other models. Following this, a discussion is provided. Lastly, a comprehensive summary is presented.

2. Related Studies

In this section, we first review related work concerning urban taxi services, and then review related work from two perspectives: deep-learning neural network-based methods and complex deformable graph convolutional network-based methods.

2.1. Related Works Concerning Urban Taxi Services

The increasing use of GPS devices in modern cities has led to the emergence of intelligent applications aimed at improving or transforming taxi services [32]. Examples include modeling the location choice of taxi drivers for passenger pickup using GPS data [33], examining how informal transport can be better understood using GPS tracking data [34], and modeling the spatial–temporal patterns of yellow taxi demand in New York City using generalized STAR models [35]. These studies highlight the potential of data-driven methodologies in the field of taxi services. For instance, residents’ taxi travel demand can be forecast using traditional time-series models such as ARMA and linear regression [36], and taxi idle times can be minimized using deep-learning methods such as reinforcement learning [37]. The use of these technologies is leading research in urban taxi services toward data-driven approaches and the ongoing development and application of deep learning.

2.2. Deep-Learning Neural Network-Based Methods

Taxi demand–supply data are a specific type of traffic flow data. Therefore, most methods suitable for traffic flow prediction can also be used for predicting taxi demand–supply data. As traffic flow data are a type of time-series data, researchers have long been using traditional time-series models for prediction, such as HA (Historical Average Model) [38], Kalman Filters [39], ARIMA [15] and its variants [40]. These models convert time-series data into stationary sequences and then build mathematical models to predict future values, which have proven to perform well.

The early time-series models mentioned above have limited suitability for data with strong periodic and trending patterns. Traffic flow data exhibit strong trending and periodic patterns, making these models increasingly less appropriate. With the application of deep learning, models such as RNN [16], LSTM [17], GRU [18] and their variants [41,42] have proven to better capture the changing characteristics of traffic flow data in the time dimension, leading to improved prediction results.

In recent years, an increasing number of studies have focused not only on capturing the changing characteristics of traffic flow data in the time dimension but also on considering the mutual influences between adjacent roads and neighboring areas in the spatial dimension. Some of the earliest attempts have used CNN (Convolutional Neural Network) models [43,44,45] to capture the spatial characteristics of traffic flow. However, urban road networks and regions within the city have unique connectivity and uneven distribution, so the CNN model, which excels at exploring spatial features based on Euclidean distances [46,47], faces limitations in the field of grid-level traffic flow prediction. The GCN model, on the other hand, is better at extracting spatial features from non-Euclidean distance data [27,48]. This has led to a considerable amount of research focusing on combining GCN models with temporal models for traffic prediction. For example, there are models such as the ride-hailing demand forecasting model [49], which combines GCN and LSTM, and the T-GCN model [27], which integrates GCN and GRU. These models have been widely researched and applied due to their outstanding predictive performance.

2.3. Complex Deformable Graph Convolutional Network-Based Methods

However, traditional graph convolutional neural networks, which focus primarily on considering the influence between adjacent nodes (such as road segments, networks, or regions), quickly become inadequate for predicting some complex traffic flows, such as predicting traffic speed in non-grid road networks [29], predicting traffic propagation flow [31], predicting road network traffic density [50], and so on. Consequently, many methods based on complex deformable graph convolutional networks have emerged [43]. These methods can be broadly categorized into static graph-based approaches and dynamic graph-based approaches.

A static graph means that its structure does not change over time. The early graph convolutional neural network-based traffic prediction models considered only the influence between adjacent vertices. Later, various spatial–temporal prediction models with more complex graph structures were developed. These graphs include those that consider the upstream–downstream relationships of road segments [31], angles between adjacent roads [29], attribute correlations between adjacent regions [48], and geographic features within regions [30]. These complex deformable graphs are calculated based on the historical attribute or geographic information associated with the vertices and do not change over time. Furthermore, these graphs are used in conjunction with the original adjacency graph for either sequential or parallel convolutions that enrich the spatial feature extraction. The results have shown that these prediction models outperform models that only perform convolution operations on the adjacency graph.

A dynamic graph means that its structure changes over time. As outlined in the introduction, adjacent grids display dynamic correlations in their taxi demand. It is probable that the correlation is strong during particular time periods on the same day, but weak during others. Additionally, the correlation can fluctuate greatly from one day to the next. Therefore, a number of scholars have initiated investigations into traffic flow prediction models using dynamic graph convolutional networks. Such investigations have resulted in various models, including a reinforced dynamic graph convolutional network model for network-wide traffic flow prediction [51], an attention-based spatial–temporal graph neural network for long-term traffic prediction [52], a new dynamic correlation graph construction method for specific location traffic flow prediction [53], etc. However, these studies primarily use dynamic graph convolution to capture dynamic spatial features at a single-point level or link level [51,52,53,54], with limited focus on grid-level research. Moreover, these studies are geared toward predicting general traffic flow, while the grid-level taxi demand exhibits higher randomness and variability, making it challenging to apply these models.

In summary, there is presently no predictive model that can effectively capture the spatial and dynamic characteristics of the taxi demand–supply imbalances at a grid level, which are caused by the unpredictability and variability of taxis. This study proposes a forecasting model for grid-level taxi demand–supply imbalances that employs dynamic graph convolutional networks to address these challenges.

3. Preliminaries

Table 1 lists all the symbols used in this paper.

3.1. Taxi Demand–Supply Information within a Grid

Taxi trajectories are typically categorized into four types, as shown in Figure 3: (a) type a trajectory $T^{a}$, where the vehicle consistently remains in a “vacant” state; (b) type b trajectory $T^{b}$, where the vehicle consistently remains in an “occupied” state; (c) type c trajectory $T^{c}$, where the vehicle transitions from a “vacant” state to an “occupied” state, and the initial appearance of an “occupied point” signifies passengers boarding the taxi; and (d) type d trajectory $T^{d}$, where the vehicle transitions from an “occupied” state to a “vacant” state, and the first appearance of a “vacant point” represents passengers disembarking.

The trajectories of these four types of taxis crossing the grid can reflect the traffic demand–supply information within the grid. As shown in Figure 3e, a type a trajectory $T^{a}$ crossing the grid can be considered a traffic supply within the grid. A type c trajectory $T^{c}$ crossing the grid represents not only a taxi demand but also a taxi supply in the grid, as in Figure 3g. A type d trajectory $T^{d}$ crossing the grid can be considered an added traffic supply in the grid, as in Figure 3h.

Therefore, the taxi demand within the grid can be defined as the quantity of type c trajectories crossing the grid. Meanwhile, the taxi supply within the grid can be defined as the sum of the type a trajectories, type c trajectories, and type d trajectories crossing the grid. The grid-level taxi demand–supply imbalance $I_{t}^{g}$ within grid $g$ in time period $t$ can be defined as

$$I_{t}^{g} = \frac{{\tilde{T^{c}}}_{t}^{g}}{{\tilde{T^{a}}}_{t}^{g} + {\tilde{T^{c}}}_{t}^{g} + {\tilde{T^{d}}}_{t}^{g}}$$ (1)

in which ${\tilde{T^{a}}}_{t}^{g}$, ${\tilde{T^{c}}}_{t}^{g}$, and ${\tilde{T^{d}}}_{t}^{g}$ are the numbers of type a trajectories, type c trajectories, and type d trajectories crossing the grid $g$ in time period $t$, respectively, and $I_{t}^{g}$ ranges from 0 to 1. When the taxi demand–supply imbalance is equal to 0, it implies that the number of type c trajectories is 0, while there are a significant number of type a trajectories and type d trajectories. This means that during this time period, there is minimal taxi demand within the grid, and taxis should be dispatched to other areas.

When the taxi demand–supply imbalance approaches 1, it means that the proportion of type c trajectories is increasing, while the proportion of type a and type d trajectories is decreasing. This indicates that during this time period, the taxi demand within the grid is becoming increasingly challenging to meet, and taxis need to be dispatched to this area.
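As a minimal sketch of Equation (1), the ratio can be computed directly from the three trajectory counts of a grid in one time period. The function name, the argument names, and the handling of a grid with no trajectories at all (returning 0.0) are assumptions made for illustration; the text does not specify that case.

```python
def imbalance(n_vacant: int, n_pickup: int, n_dropoff: int) -> float:
    """Equation (1): type c count over the sum of type a, c, and d counts."""
    supply = n_vacant + n_pickup + n_dropoff
    if supply == 0:
        # Assumption: the text does not cover empty grids; return 0.0 here.
        return 0.0
    return n_pickup / supply


# Hypothetical counts for one grid in one time period.
quiet_grid = imbalance(n_vacant=6, n_pickup=0, n_dropoff=4)
busy_grid = imbalance(n_vacant=1, n_pickup=8, n_dropoff=1)
print(quiet_grid, busy_grid)
```

With these made-up counts, the first grid has no pickups and an imbalance of 0, while in the second grid pickups make up eight of the ten trajectories, so its imbalance is close to 1.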

3.2. Graph of the Taxi Demand–Supply Based on Divided Grids

$G \langle V, E \rangle$ can be defined as a general graph, where $V$ represents the nodes in the graph, and $E$ represents the relationships between the nodes. In this paper, we define the taxi demand–supply graph as $G \langle G, R \rangle$, where $G$ represents the set of the grids $\{g_{1}, g_{2}, \dots, g_{n}\}$, and $R$ is the set of the relationships $\{r_{1}, r_{2}, \dots, r_{m}\}$ between grids in the graph.

For each grid $g$, we can obtain its feature vector $X_{t \sim t + T}^{g} = [X_{t}^{g}, X_{t + 1}^{g}, \dots, X_{t + T}^{g}]$, where the feature $X^{g}$ can be the traffic demand $D^{g}$ or the traffic supply $S^{g}$ within the grid. The feature matrix $X_{t \sim t + T}^{G} = \{X^{g_{1}}, X^{g_{2}}, \dots, X^{g_{n}}\}$ can then be obtained for all the grids.

In addition, for each pair of grids $g_{i}$ and $g_{j}$ ($i, j \leq n$), the relationship between them falls into two categories in most studies [30,31]. The first category involves considering the spatial topology between grids, typically measured using distance. The second category measures the similarity between features within two grids. As shown in Figure 4a, for a 3 × 3 grid region, it is possible to construct graphs based on the effective grids A, B, E, F, and G.

The first kind of graph formed is illustrated in Figure 4b, where the distances between A and B, B and E, and E and F are all equal to the length of the grid, while the remaining distances are greater than this value. For grid pairs with smaller distances, wider edges are assigned to represent a higher likelihood of mutual influence between them. The second kind of graph formed is illustrated in Figure 4c, where A, E, and G exhibit high feature correlations, and B and F also display high feature correlations. Grid pairs with greater feature correlations are assigned wider edges to represent their high likelihood of changing simultaneously. We will discuss in detail how to quantify these two types of relationships in Section 4.2 to better predict the grid-level taxi demand–supply imbalance.

3.3. Prediction of the Grid-Level Taxi Demand–Supply Imbalance

In this article, we use predictions of the future traffic demand and supply to predict the taxi demand–supply imbalance within the grid. This mainly consists of three steps:

Step 1: In the spatial dimension, spatial features within the grid $\bar{X}_{t - T'}, \bar{X}_{t - T' + 1}, \dots, \bar{X}_{t}$ are computed using graph convolution, as shown in the following equation.

$$\bar{X}_{t - T'}, \bar{X}_{t - T' + 1}, \dots, \bar{X}_{t} = \mathrm{gc}(X_{t - T'}, X_{t - T' + 1}, \dots, X_{t})$$ (2)

where $\mathrm{gc}(\cdot)$ represents the graph convolution. The features within the grid discussed in this article mainly refer to the traffic demand and supply, which are represented by the following two equations.

$$\bar{D}_{t - T'}, \bar{D}_{t - T' + 1}, \dots, \bar{D}_{t} = \mathrm{gc}(D_{t - T'}, D_{t - T' + 1}, \dots, D_{t})$$ (3)

$$\bar{S}_{t - T'}, \bar{S}_{t - T' + 1}, \dots, \bar{S}_{t} = \mathrm{gc}(S_{t - T'}, S_{t - T' + 1}, \dots, S_{t})$$ (4)

Step 2: In the temporal dimension, we use the GRU model to predict the future traffic demand $\hat{D}_{t + 1}, \hat{D}_{t + 2}, \dots, \hat{D}_{t + T}$ and supply $\hat{S}_{t + 1}, \hat{S}_{t + 2}, \dots, \hat{S}_{t + T}$ from time period $t + 1$ to $t + T$ based on the spatial features, as represented by the following two equations.

$$\hat{D}_{t + 1}, \hat{D}_{t + 2}, \dots, \hat{D}_{t + T} = \mathrm{gru}(\bar{D}_{t - T'}, \bar{D}_{t - T' + 1}, \dots, \bar{D}_{t})$$ (5)

$$\hat{S}_{t + 1}, \hat{S}_{t + 2}, \dots, \hat{S}_{t + T} = \mathrm{gru}(\bar{S}_{t - T'}, \bar{S}_{t - T' + 1}, \dots, \bar{S}_{t})$$ (6)

Step 3: Using the predicted demand and supply, along with the historical demand and supply, the predicted taxi demand–supply imbalance within the grid $\hat{I}_{t + 1}^{g}$ can be obtained, as indicated by the following equation.

$$\hat{I}_{t + 1} = f(\hat{D}_{t + 1}, \hat{S}_{t + 1}, (D_{t - T'}, D_{t - T' + 1}, \dots, D_{t}), (S_{t - T'}, S_{t - T' + 1}, \dots, S_{t}))$$ (7)

The taxi demand–supply imbalance of all the grids in all the time periods can then be obtained as follows:

$$I_{t + 1 \sim t + T}^{G} = \{\{\hat{I}_{t + 1}^{g_{1}}, \dots, \hat{I}_{t + T}^{g_{1}}\}, \{\hat{I}_{t + 1}^{g_{2}}, \dots, \hat{I}_{t + T}^{g_{2}}\}, \dots, \{\hat{I}_{t + 1}^{g_{n}}, \dots, \hat{I}_{t + T}^{g_{n}}\}\}$$ (8)

4. Methodology

4.1. Overall Structure

Figure 5 depicts the spatial–temporal prediction network introduced in this paper. The left side of the diagram shows the prediction of the grid-level features using graph convolution and time-series forecasting. This process involves two rounds of graph convolution and one round of GRU prediction. The initial graph convolution utilizes the static grid relationships, which are the distances between grids. The second graph convolution exploits the dynamic grid relationships and considers the correlations of grid features during previous time periods. A GRU model predicts the future attribute values within the grid for the following T time periods. On the right side of Figure 5, a forecast of the grid-level taxi demand–supply imbalance for time period $t + 1$ is presented. This part first predicts the taxi demand $\hat{D}_{t + 1}$ and supply $\hat{S}_{t + 1}$, and then combines them with historical data $D_{t - 1}, D_{t}$ and $S_{t - 1}, S_{t}$ to forecast the demand–supply imbalance $I_{t + 1}$ for time period $t + 1$.

4.2. Graph Convolutional Network Part

4.2.1. Geographic Graph

According to Tobler’s First Law of Geography [55], grids that are closer in geographical location exhibit stronger spatial correlations. Additionally, traffic flow naturally exhibits a propensity to propagate more readily between adjacent grids [31]. Therefore, the primary consideration when mining spatial correlations between grids is the distance between them. Hence, we define the geographical adjacency matrix as $A^{Geo}$, and each element $a_{ij}^{Geo}$ can be calculated as follows:

$$a_{ij}^{Geo} = \begin{cases} 1, & d_{eu}(g_{i}, g_{j}) \leq \mu \\ 0, & \text{otherwise} \end{cases}$$ (9)

where $d_{eu}(g_{i}, g_{j})$ is the Euclidean distance between grid $g_{i}$ and grid $g_{j}$, and $\mu$ is the distance threshold used to pick out the closer grids. In this paper, we only consider the influence between adjacent grids, and thus the value of $\mu$ is set to $\sqrt{2} \cdot l$, where $l$ represents the side length of a grid. As shown in Figure 4a, to calculate the spatial features of grid E, only the influences of the eight adjacent grids, A, B, C, D, F, G, H, and I, are considered, while other grids that are farther away are not within the scope of consideration.
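The thresholding rule of Equation (9) can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: grids are represented by hypothetical center coordinates, self-loops are left out because the identity matrix is added later in the convolution layer, and a small tolerance guards the floating-point comparison for diagonal neighbors.

```python
import math


def geographic_adjacency(centers, side_length):
    """Equation (9): a_ij = 1 if the center distance is at most sqrt(2) * l."""
    mu = math.sqrt(2) * side_length
    eps = 1e-9 * side_length  # tolerance for diagonal neighbours (assumption)
    n = len(centers)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Assumption: skip i == j; self-connections are added separately.
            if i != j and math.dist(centers[i], centers[j]) <= mu + eps:
                adj[i][j] = 1
    return adj


# Hypothetical 3 x 3 region, row-major: index 0 is grid A, index 4 is grid E.
centers = [(col, row) for row in range(3) for col in range(3)]
adj = geographic_adjacency(centers, side_length=1.0)
print(sum(adj[4]), sum(adj[0]))
```

In this layout the center grid E is connected to all eight surrounding grids, whereas the corner grid A is connected only to its three neighbors.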

4.2.2. Dynamic Feature Correlation Graph

Considering the correlation of vertex features in graph construction has become one of the hot topics in graph convolution research [43]. In this paper, we investigate the travel features within a grid, which exhibit certain distribution patterns over time. For example, residential areas have a high taxi demand during the morning rush hour but a lower demand during the evening rush hour, whereas commercial areas show the opposite pattern. Setting larger connection weights for grids with similar traffic travel patterns enables them to mutually complement information during the convolution process.

Furthermore, traditional research has only considered the static correlations of features within the grids, which is not accurate enough [56,57]. This is because the geographical features within the grids are complex, and travel patterns exhibit strong randomness, such that two grids with high daytime correlation may have different patterns at night. Thus, it is necessary to consider the dynamic correlations and establish a dynamic feature correlation graph. Based on this, the present paper defines the dynamic feature correlation matrices as $A_{ij}^{\mathrm{Dyn}.t}, A_{ij}^{\mathrm{Dyn}.t + 1}, \dots, A_{ij}^{\mathrm{Dyn}.t + T}$. Taking the matrix $A_{ij}^{\mathrm{Dyn}.t}$ for time period $t$ as an example, each element is defined as follows:

$$a_{ij}^{t} = \begin{cases} \dfrac{\rho(X_{g_{i}}^{t - T \sim t}, X_{g_{j}}^{t - T \sim t}) + 1}{2}, & a_{ij}^{Geo} \neq 0 \\ 0, & \text{otherwise} \end{cases}$$ (10)

where $\rho(X_{g_{i}}^{t - T \sim t}, X_{g_{j}}^{t - T \sim t})$ represents the Pearson correlation coefficient between the feature vector $[X_{g_{i}}^{t - T}, X_{g_{i}}^{t - T + 1}, \dots, X_{g_{i}}^{t}]$ of grid $g_{i}$ and the feature vector $[X_{g_{j}}^{t - T}, X_{g_{j}}^{t - T + 1}, \dots, X_{g_{j}}^{t}]$ of grid $g_{j}$, which is calculated as follows:

$$\rho_{ij}^{t - T \sim t} = \frac{\sum_{t' = t - T}^{t} (X_{i}^{t'} - \bar{X}_{i}^{t - T \sim t}) (X_{j}^{t'} - \bar{X}_{j}^{t - T \sim t})}{\sqrt{\sum_{t' = t - T}^{t} (X_{i}^{t'} - \bar{X}_{i}^{t - T \sim t})^{2} \times \sum_{t' = t - T}^{t} (X_{j}^{t'} - \bar{X}_{j}^{t - T \sim t})^{2}}}$$ (11)

with the window means

$$\bar{X}_{i}^{t - T \sim t} = \frac{X_{i}^{t - T} + X_{i}^{t - T + 1} + \dots + X_{i}^{t}}{T + 1}$$ (12)

$$\bar{X}_{j}^{t - T \sim t} = \frac{X_{j}^{t - T} + X_{j}^{t - T + 1} + \dots + X_{j}^{t}}{T + 1}$$ (13)

It is important to note that the value of $a_{ij}^{t}$ falls within the range [0, 1], and the higher its value, the more it signifies that feature predictions between two grids can be mutually influential, leading to a greater impact during the graph convolution process. Additionally, $a_{ij}^{t}$ is a dynamic variable, signifying that the feature correlation graph is subject to dynamic changes.
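To make the sliding-window idea concrete, the following sketch computes the rescaled Pearson weight of Equation (10) for two different windows. It is a plain-Python illustration under stated assumptions: the function names and the toy demand series are hypothetical, the window covers the periods from `t - window` to `t` inclusive, and a constant series (zero variance) is treated as uncorrelated, a case the text does not address.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (Equation (11))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant series is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def dynamic_adjacency(features, geo_adj, t, window):
    """Equation (10): weights in [0, 1] for geographically adjacent grids,
    computed from the periods t - window .. t (inclusive)."""
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if geo_adj[i][j] != 0:
                xi = features[i][t - window:t + 1]
                xj = features[j][t - window:t + 1]
                adj[i][j] = (pearson(xi, xj) + 1) / 2
    return adj


# Hypothetical demand series for two adjacent grids over six periods.
features = [[1, 2, 3, 3, 2, 1],
            [2, 4, 6, 1, 2, 3]]
geo_adj = [[0, 1], [1, 0]]
early = dynamic_adjacency(features, geo_adj, t=2, window=2)
late = dynamic_adjacency(features, geo_adj, t=5, window=2)
print(early[0][1], late[0][1])
```

In this toy series the two grids move together during the first three periods and in opposite directions during the last three, so the edge weight between them drops from 1.0 to 0.0 as the window slides.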

4.2.3. Graph Convolution Layer

For the travel features within the grids, two graph convolution layers are designed to perform spatial feature extraction operations twice. The first convolution is based on the geographic graph and filters the influence between adjacent grids. The output of this convolution layer $H^{(Geo)}$ is defined as follows:

$$H^{(Geo)} = \tau\left(\left(\tilde{D}^{Geo}\right)^{-1/2} \hat{A}^{Geo} \left(\tilde{D}^{Geo}\right)^{-1/2} X \theta^{(Geo)}\right)$$ (14)

where $\tau(\cdot)$ is the activation function, $\tilde{D}^{Geo}$ is the degree matrix of the geographic graph, and $\hat{A}^{Geo}$ is the sum of the geographical adjacency matrix $A^{Geo}$ (defined in Equation (9)) and the identity matrix. $X$ is the input travel feature matrix, which can be either the traffic demand $D$ or traffic supply $S$. $\theta^{(Geo)}$ is the set of all the parameters of this convolution layer.

The second convolution layer is based on the dynamic feature correlation graph, with its input being the output from the first convolutional layer $H^{(Geo)}$. Its output $H^{(t)}$ is defined as follows:

$$H^{(t)} = \tau\left(\left(\tilde{D}^{t}\right)^{-1/2} \hat{A}^{t} \left(\tilde{D}^{t}\right)^{-1/2} H^{(Geo)} \theta^{(t)}\right)$$ (15)

where $H^{(t)}$ represents the output of the second convolution layer during time period $t$, $\tilde{D}^{t}$ is the degree matrix of the feature correlation graph during $t$, $\hat{A}^{t}$ is the sum of the feature correlation matrix $A^{t}$ during $t$ (defined in Equation (10)) and the identity matrix, and $\theta^{(t)}$ represents all the parameters of the second convolution layer.
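The two stacked layers of Equations (14) and (15) can be sketched without any array library. The code below is only an illustration: it assumes that the degree matrix is computed from the adjacency matrix after the identity has been added (the common GCN convention), that the activation is ReLU, and that the weights are fixed toy values rather than learned parameters. All function and variable names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, x, theta):
    """One graph convolution with a ReLU activation (assumed choice of tau)."""
    out = matmul(matmul(normalized_adjacency(adj), x), theta)
    return [[max(0.0, v) for v in row] for row in out]


# Two grids, one feature each (e.g. demand), toy weights.
x = [[1.0], [3.0]]
a_geo = [[0, 1], [1, 0]]            # static geographic graph
a_dyn = [[0.0, 0.5], [0.5, 0.0]]    # dynamic correlation graph at period t
h_geo = gcn_layer(a_geo, x, theta=[[1.0]])      # Equation (14)
h_t = gcn_layer(a_dyn, h_geo, theta=[[1.0]])    # Equation (15)
print(h_geo, h_t)
```

The first layer averages the two grids' features over the geographic graph; the second layer then reweights that result using the time-dependent correlation graph, so only the second adjacency matrix needs to be recomputed for each period.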

4.3. Gated Recurrent Unit Part

After extracting the spatial features of the taxi demand and supply within the grids, it is also necessary to capture the features in the temporal dimension. The Gated Recurrent Unit model is applied to accomplish this task. The GRU model is widely used and has been proven to have significant advantages in terms of predictive accuracy [42], mitigating gradient disappearance and explosion problems [58].

Given the feature sequence

H^{(t - T ’)}, \dots, H^{(t - 1)}, H^{(t)}

after 2 rounds of graph convolution operations, it is possible to recursively predict future the taxi demand and supply using Equations (12)–(15).

r_{t} = σ (W_{r} [H^{(t)}, h_{(t - 1)}] + b_{r})

(16)

z_{t} = σ (W_{u} [H^{(t)}, h_{(t - 1)}] + b_{u})

(17)

{\tilde{h}}_{t} = \tanh (W_{\tilde{h}} [H^{(t)}, (r_{t} ⊙ h_{(t - 1)})] + b_{\tilde{h}})

(18)

h_{t} = z_{t} ⊙ h_{(t - 1)} + (1 - z_{t}) ⊙ {\tilde{h}}_{t}

(19)

where $σ (\cdot)$ is the sigmoid activation function, $r_{t}$ and $z_{t}$ are the reset and update gates, $h_{(t - 1)}$ is the hidden state during $t - 1$, ${\tilde{h}}_{t}$ is the candidate hidden state during $t$, $W_{r}$, $W_{u}$ and $W_{\tilde{h}}$ are the weight sets, $b_{r}$, $b_{u}$ and $b_{\tilde{h}}$ are the biases, and $⊙$ represents the element-wise (Hadamard) product between two tensors. All the weights and biases are learnable.
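A single GRU update following Equations (16)–(19) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code: it treats $H^{(t)}$ as one feature vector (for example, the row belonging to one grid), the parameter dictionary keys and the `zero_params` helper are hypothetical names, and the all-zero parameters are placeholders for values that would be learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, vec, bias):
    """Compute W @ vec + b for a list-of-lists W and list vectors."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def gru_step(x_t, h_prev, params):
    """One GRU update; x_t plays the role of H^(t), h_prev that of h_(t-1)."""
    concat = x_t + h_prev                                             # [H^(t), h_(t-1)]
    r = [sigmoid(v) for v in affine(params["W_r"], concat, params["b_r"])]   # Eq. (16)
    z = [sigmoid(v) for v in affine(params["W_u"], concat, params["b_u"])]   # Eq. (17)
    gated = x_t + [ri * hi for ri, hi in zip(r, h_prev)]              # [H^(t), r ⊙ h]
    h_cand = [math.tanh(v)
              for v in affine(params["W_h"], gated, params["b_h"])]  # Eq. (18)
    # Eq. (19): z keeps the old state, (1 - z) admits the candidate state.
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def run_gru(sequence, h0, params):
    """Fold gru_step over a feature sequence and return the final hidden state."""
    h = h0
    for x_t in sequence:
        h = gru_step(x_t, h, params)
    return h


def zero_params(input_size, hidden_size):
    """Hypothetical helper: all-zero parameters, handy for checking the algebra."""
    width = input_size + hidden_size
    zeros = lambda: [[0.0] * width for _ in range(hidden_size)]
    return {"W_r": zeros(), "W_u": zeros(), "W_h": zeros(),
            "b_r": [0.0] * hidden_size, "b_u": [0.0] * hidden_size,
            "b_h": [0.0] * hidden_size}


params = zero_params(input_size=2, hidden_size=2)
h_next = gru_step([1.0, -1.0], [0.4, -0.2], params)
```

With all parameters at zero, both gates equal 0.5 and the candidate state is zero, so each update simply halves the previous hidden state; this gives a quick sanity check on the sign convention of Equation (19).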

4.4. Fully Connected Layer

After two rounds of graph convolution operations and one round of prediction using the GRU model, we obtain the predicted taxi demand ${\hat{D}}_{t + 1}$ and supply ${\hat{S}}_{t + 1}$ for time period $t + 1$. This allows us to calculate the taxi demand–supply imbalance for this time period, i.e., ${\hat{D}}_{t + 1} / {\hat{S}}_{t + 1}$, as described in Section 3.1. However, to mitigate the short-term random fluctuations in urban traffic, we smooth this value using the values from the past two time periods.

In the final part of the model, as shown in Figure 5, we establish a fully connected layer that maps the taxi demand vector $({\hat{D}}_{t + 1}, D_{t}, D_{t - 1})$ and the taxi supply vector $({\hat{S}}_{t + 1}, S_{t}, S_{t - 1})$ to the taxi demand–supply imbalance ${\hat{I}}_{t + 1}$, i.e., ${\hat{I}}_{t + 1} \sim fc (({\hat{D}}_{t + 1}, D_{t}, D_{t - 1}), ({\hat{S}}_{t + 1}, S_{t}, S_{t - 1}))$, to facilitate training and forecasting.
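As a rough illustration of this final mapping, the sketch below applies one affine (fully connected) unit to the concatenated demand and supply vectors of a single grid. The text does not specify the layer's depth, activation, or weights, so the single linear unit, the function name `fc_imbalance`, and all numeric values here are assumptions; in the model the weights would be learned during training.

```python
def fc_imbalance(demand_vec, supply_vec, weights, bias):
    """Affine map of (D_hat_{t+1}, D_t, D_{t-1}, S_hat_{t+1}, S_t, S_{t-1}) to one value."""
    inputs = list(demand_vec) + list(supply_vec)
    if len(inputs) != len(weights):
        raise ValueError("expected one weight per input value")
    return sum(w * v for w, v in zip(weights, inputs)) + bias


# Hypothetical values for one grid: predicted value first, then the two past periods.
demand_vec = (12.0, 10.0, 11.0)
supply_vec = (30.0, 28.0, 29.0)

# Unsmoothed ratio from the predictions alone, for comparison.
raw_ratio = demand_vec[0] / supply_vec[0]

# Placeholder weights; a trained layer would supply its own.
weights = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
example_output = fc_imbalance((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), weights, bias=0.5)
```

Feeding the two past periods alongside the prediction is what lets the layer damp short-term fluctuations, instead of relying on the single ratio `raw_ratio`.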

5. Experiment

5.1. Data Description

The dataset used in this article consists of real taxi GPS data from Shenzhen, China, including the date, time, taxi ID, longitude, latitude, and passenger status (vacant or occupied). The data were collected in January 2019 and processed to obtain nearly 88.62 million taxi trajectories. We selected Futian District in Shenzhen city as the experimental area. Futian District lies in the heart of Shenzhen, stretching 10 km in a north–south direction and 12 km in an east–west direction. It spans nearly 80 square kilometers, as depicted in Figure 6.

We perform the following data preprocessing to meet the experimental requirements. (1) Based on the longitude, latitude, and passenger occupancy information contained in the taxi GPS data, we compute four types of trajectories, as illustrated in Figure 3, namely, type a trajectories $T^{a}$, type b trajectories $T^{b}$, type c trajectories $T^{c}$, and type d trajectories $T^{d}$. (2) We partition the entire experimental area into 1 km × 1 km grids and select 10 min as the minimum statistical time period, which yields a total of 102 grids and 4464 periods. (3) We then count the type a trajectories $T^{a}$, type c trajectories $T^{c}$, and type d trajectories $T^{d}$ in each grid in each period. Further statistical analysis reveals that the average number of valid taxi trajectories in each grid within a single period is 215, of which 118, 50, and 47 are type a, type c, and type d trajectories, respectively. (4) According to Equation (1), we compute the imbalance index of the taxi demand–supply for all the grids and all the time periods. (5) Finally, the data from the first 21 days of January 2019 are used as the training set, while the remaining 10 days’ data serve as the testing set.
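Steps (1)–(3) can be sketched as follows, assuming each record already carries a grid identifier, a timestamp, and the time-ordered occupancy flags of one trajectory segment inside that grid. The record layout, the function names, and the decision to discard segments with more than one state change are assumptions made for illustration; the trajectory types follow the definitions in Table 1.

```python
from collections import Counter
from datetime import datetime

PERIOD_MINUTES = 10
MONTH_START = datetime(2019, 1, 1)  # start of the observation month


def period_index(timestamp):
    """Zero-based index of the 10 min period that contains `timestamp`."""
    elapsed = (timestamp - MONTH_START).total_seconds()
    return int(elapsed // (PERIOD_MINUTES * 60))


def trajectory_type(occupied):
    """Classify a segment from its occupancy flags (True = occupied), per Table 1."""
    if not occupied:
        return None
    changes = sum(1 for prev, cur in zip(occupied, occupied[1:]) if prev != cur)
    if changes == 0:
        return "b" if occupied[0] else "a"   # always occupied / always vacant
    if changes == 1:
        return "d" if occupied[0] else "c"   # occupied->vacant / vacant->occupied
    return None  # assumption: segments with several state changes are discarded


def count_trajectories(records):
    """Count trajectories per (grid id, period index, trajectory type)."""
    counts = Counter()
    for grid_id, timestamp, occupied in records:
        kind = trajectory_type(occupied)
        if kind is not None:
            counts[(grid_id, period_index(timestamp), kind)] += 1
    return counts


# Hypothetical records: (grid id, timestamp, occupancy flags within the grid).
records = [
    (17, datetime(2019, 1, 1, 8, 3), [False, False, False]),
    (17, datetime(2019, 1, 1, 8, 5), [False, True]),
    (17, datetime(2019, 1, 1, 8, 7), [True, False]),
    (18, datetime(2019, 1, 1, 8, 12), [True, True]),
]
counts = count_trajectories(records)

# 31 days of 10 min periods reproduce the 4464 periods reported in step (2).
periods_in_january = 31 * 24 * 60 // PERIOD_MINUTES
```

Mapping longitude and latitude to a 1 km × 1 km grid identifier requires a projected coordinate system and is left out of this sketch.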

5.2. Baseline Methods

We select several classic prediction models to compare with our model, including the Historical Average (HA), Auto-Regressive Integrated Moving Average (ARIMA), Graph Convolutional Network (GCN), Gated Recurrent Unit model (GRU), Temporal Graph Convolutional Network (T-GCN), Temporal Multi-Spatial Dependence Graph Convolutional Network (TmS-GCN) [48], and Multi-Attribute Graph Convolutional Network (MAGCN) [59].

5.3. Results

To evaluate the predictive performance of our model and the baseline models, we use the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE):

\mathrm{MAE} (I, \hat{I}) = \frac{1}{N} \sum_{i = 1}^{N} |I_{i} - {\hat{I}}_{i}|

(20)

\mathrm{MAPE} (I, \hat{I}) = \frac{1}{N} \sum_{i = 1}^{N} \frac{|I_{i} - {\hat{I}}_{i}|}{I_{i}}

(21)

\mathrm{RMSE} (I, \hat{I}) = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(I_{i} - {\hat{I}}_{i})}^{2}}

(22)

where $I$ and $\hat{I}$ are the actual and predicted values of the taxi demand–supply imbalance, respectively, and $N$ is the number of predictions.
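The three metrics in Equations (20)–(22) translate directly into code. The sketch below is a plain-Python illustration with made-up imbalance values; as in Equation (21), the MAPE assumes every actual value is non-zero, and how zero-valued imbalances are treated in the experiments is not stated in the text.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error, Equation (20)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, Equation (21); actual values must be non-zero."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, Equation (22)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical imbalance values for three grid-period pairs.
actual = [0.2, 0.4, 0.5]
predicted = [0.1, 0.5, 0.5]

scores = {
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "RMSE": rmse(actual, predicted),
}
```

Because the MAPE divides by the actual value, the same absolute error weighs more heavily in grids with a low imbalance index than in grids with a high one.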

Table 2 presents the results of evaluating our model and the baseline models on the dataset described in Section 5.1. We can draw the following conclusions:

The overall predictive performance of deep-learning models is much better than that of traditional time-series models, which aligns with our expectations. This indicates that the grid-level taxi demand–supply exhibits nonlinear periodicity over time, a feature that traditional time-series models are unable to capture.
Predictive models that comprehensively consider the spatial correlation of the taxi demand–supply and temporal variation features, including T-GCN and our model, outperform time-series models such as HA, ARIMA, and GRU. They also outperform models that only consider spatial correlation features, such as GCN. This indicates that the grid-level taxi demand–supply imbalance is also typical spatially correlated data, with neighboring grids influencing each other.
Our model, which considers the dynamic correlation features between grids, outperforms T-GCN, TmS-GCN and MAGCN, which only consider static spatial features. This confirms our hypothesis stated in the introduction that the influences on the taxi demand and supply vary over time, even among adjacent grids. The impact is sometimes significant and at other times minimal.

We compare the actual and predicted taxi demand–supply imbalance during specific time periods on January 23rd, including the morning peak hours (7:30–9:30 am), daytime (9:30 am–4:30 pm), and evening peak hours (4:30–7:30 pm). The results are shown in Figure 7. The figure shows that the majority of the grid cells have an imbalance value of 0.6 or less, which means that the demand for taxi rides is at most 60% of the available supply of taxis. This is due to the use of only taxi GPS data to estimate both the demand and supply. If separate data sources were available to track taxi demand, such as order data from DiDi or Uber, this imbalance information could be more accurately determined. However, this does not affect our evaluation of the model. From this, we can draw the following conclusions:

We divide the imbalance index into five categories, namely [0, 0.2], [0.2, 0.4], [0.4, 0.6], [0.6, 0.8], and [0.8, 1.0], where the closer the value is to 1, the more difficult the demand is to meet. Based on this, the number of grid cells with prediction errors during the morning peak, daytime, and evening peak is 7, 7, and 10, respectively, accounting for only 6.86%, 6.86%, and 9.80% of the 102 grid cells.
From the figure, it is noticeable that within certain areas, the imbalance between the taxi demand and supply changes from morning to evening. For example, within the area highlighted by the solid red ellipse in the figure, there is a pronounced imbalance during the morning peak hours, while it tends to stabilize during the midday and evening peak hours.
We further analyze the grid cells with prediction errors, particularly the one within the green dashed box during the morning peak hours. In this particular grid, there is a significant discrepancy between the predicted results and the ground truth. The grid corresponds to the area of Meilin Mountain Park, which is characterized by few roads and a low traffic volume, so the number of taxi trajectories within the grid is below 10 during most periods. Such a low number of trajectories is bound to reduce the predictive accuracy. We discuss this issue in Section 6.2.
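The category-based comparison in the first conclusion above can be sketched as follows. Two points are assumptions rather than statements from the text: a grid cell is counted as a prediction error when its predicted category differs from its actual category, and a value lying exactly on a shared boundary (for example, 0.4) is assigned to the lower category. The sample values are made up.

```python
from bisect import bisect_left

# Upper edges of the first four categories; the fifth category ends at 1.0.
CATEGORY_EDGES = (0.2, 0.4, 0.6, 0.8)


def imbalance_category(value):
    """Category index 0..4 of an imbalance value in [0, 1]."""
    return bisect_left(CATEGORY_EDGES, value)


def mismatch_count(actual, predicted):
    """Number of grid cells whose predicted category differs from the actual one."""
    return sum(1 for a, p in zip(actual, predicted)
               if imbalance_category(a) != imbalance_category(p))


def error_share(mismatches, n_grids=102):
    """Share of mismatching grid cells, as a percentage rounded to two decimals."""
    return round(100.0 * mismatches / n_grids, 2)


# Hypothetical imbalance values for three grid cells.
actual = [0.10, 0.50, 0.70]
predicted = [0.15, 0.65, 0.75]
mismatches = mismatch_count(actual, predicted)

# Reported counts of 7, 7, and 10 mismatching cells out of 102 grids.
shares = [error_share(m) for m in (7, 7, 10)]
```

With 102 grid cells, 7 and 10 mismatches correspond to the reported shares of 6.86% and 9.80%.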

Taxi companies or other city traffic management authorities can use the model’s predicted results to proactively conduct taxi dispatching, which includes short-term scheduling and long-term allocation strategies:

For short-term scheduling, when taxi companies or other city traffic management authorities become aware of an expected demand–supply imbalance in a specific grid, they can proactively send dispatching information to available taxis in the surrounding grids, urging them to serve that particular grid. This approach aims to address the upcoming imbalance within the grid. Because our model predicts the specific travel demand and supply within each grid in addition to the imbalance index, this information can help taxi companies and city traffic management authorities determine the number of taxis that need to be redirected, preventing an excess of supply over demand within the grid and avoiding a waste of resources.
For long-term allocation, when a particular area is expected to have an imbalance between the taxi demand and supply in the next half-hour, or even within an hour, it means that residents attempting to hail a taxi within this area during this time may face difficulties and delays. Upon receiving this predictive information, taxi companies or other city traffic management authorities can use platforms such as Uber or DiDi to disseminate this information to potential users. This can encourage users to avoid hailing a taxi within this area during the specified time frame. Furthermore, for specific events such as concerts, school dismissal times, or other occasions causing severe imbalances, taxi companies or other city traffic management authorities may designate particular taxis to wait in advance to meet the anticipated high demand for travel.

6. Discussion

This section begins by discussing whether the assumptions presented in the introduction have been validated. Then, we focus on discussing several factors that influence the prediction of taxi demand and supply imbalances, including the number of trajectories within a grid, the size of the grid, and the number of GRU units.

6.1. Verification of the Hypothesis

In the introduction of this paper, we postulated that the dynamic variations in the taxi demand and supply within a grid are crucial factors influencing the prediction of the grid-level imbalance indices, and we proposed our model on this premise. From the experimental results shown in Table 2, we can discuss the following aspects:

Overall, models that explore only the static feature correlations between grids, such as the grid adjacency (T-GCN) and static intra-grid feature correlations (TmS-GCN, MAGCN), achieve lower predictive accuracy than our model, which considers the dynamic intra-grid feature correlations. According to Table 2, particularly at the optimal prediction scale of 10 min, our model shows an improvement of approximately 3.5%, 4.2%, and 1.9% over T-GCN, TmS-GCN, and MAGCN, respectively.
As the prediction scale increases, the performance of the model considering the dynamic intra-grid feature correlations (our model) deteriorates. This pattern is shared by the other models, indicating that long-term predictions are more challenging than short-term predictions.

However, the improvement of our model over the T-GCN model diminishes as the prediction scale increases. Specifically, at the 10 min scale, the improvement is 3.5%, reducing to 3.2% at the 20 min scale and further down to 0.7% at the 30 min scale. This is because, in our model, predicting the demand and supply separately from the rolling forecast introduces an additional unknown compared to T-GCN, which predicts traffic flow directly from the rolling forecast. The accumulation of errors in predicting more values leads to a decrease in the overall prediction quality. Therefore, the model proposed in this paper exhibits a limitation in its predictive performance on long-term scales.

6.2. Effect of the Number of Trajectories within a Grid

The effect of the counts of the three trajectory types on the prediction performance is analyzed, as shown in Figure 8. It is clear that as the number of taxi trajectories within a grid increases, the overall prediction performance improves. When the number of trajectories is less than 10, the predictive performance is the worst, which explains the poor prediction results for the grid mentioned in Section 5.3. In addition, compared to type c and type d trajectories, increasing the number of type a trajectories significantly improves the predictive performance. In other words, a higher number of vacant taxis within a grid leads to better predictive performance in terms of the imbalance index. This indirectly highlights the feasibility of using taxi GPS data for prediction, as information about vacant or occupied states is a fundamental aspect of most urban taxi GPS datasets.

Based on the analysis presented above, it is possible to achieve relatively high predictive accuracy in practical applications of this model, as long as the number of taxi trajectories within a grid exceeds 10. To ensure that the prediction results are applicable across most areas of the city, the prediction time scale can be extended or the grid’s range can be expanded. Alternatively, a specific analysis for each scenario can be conducted in practical applications. Prediction results from grids with a higher number of trajectories can be used directly. However, for remote areas with fewer internal taxi trajectories, the prediction results may need to be selectively adopted.

6.3. Effect of the Grid Size

We can expect that as the grid size increases, the number of taxi trajectories within the grid will also increase, resulting in an improvement in predictive accuracy. However, grid sizes that are too large become meaningless for taxi dispatching by taxi companies, because drivers would not be assigned a specific destination. Therefore, finding an appropriate grid size is crucial to ensuring both effective prediction models and clear destinations for taxi drivers. We conducted an analysis on this issue, and the results are shown in Figure 9. It is clear that as the grid size increases, metrics such as the MAE, MAPE, and RMSE gradually decrease, indicating better predictive performance. However, once the grid size reaches about 1 km, the prediction performance is already comparatively good, and further improvement becomes less significant. Therefore, we recommend using a grid size of 1 km × 1 km for regional partitioning in model applications.

The grid size is a crucial factor in the practical application of this model. If the grid size is too small, the prediction results are spatially finer but less reliable because there are fewer trajectories within each grid. On the other hand, if the grid size is too large, the predictions may be accurate but overly coarse, making it challenging for taxi companies or other city traffic management authorities to devise precise taxi dispatching strategies.

Therefore, we suggest that for most cities, considering cost-effectiveness, the adoption of a 1 km × 1 km grid division method could be suitable. However, if possible, it may be beneficial to conduct a customized analysis of the density across different regions of the city. In densely trafficked areas, smaller grid sizes, such as 250 m × 250 m or 500 m × 500 m, may be more suitable, while larger grid sizes, such as 1 km × 1 km or 1.5 km × 1.5 km, may be more appropriate for sparsely trafficked areas.

6.4. Effect of the Number of Units of GRU

Based on previous research, we know that the number of hidden units in the GRU can affect the effectiveness of traffic prediction. Therefore, we conducted a detailed analysis of this factor, and the results are shown in Figure 10. We observed that as the number of hidden units increases, the predictive performance gradually improves. However, once the number of hidden units reaches 120, the improvement in predictive performance becomes less significant. Therefore, we recommend using 120 hidden units in the model.

Increasing the number of hidden units further is likely to improve the predictive accuracy, although it may reduce the computational efficiency and speed. Based on the analysis above, we recommend increasing the number of hidden units in the GRU to 160 or more if sufficient budget and computational power are available. For scenarios with a limited budget, we recommend setting the number of hidden units in the GRU to at least 120, as using fewer than 120 units could significantly reduce the predictive performance.

7. Conclusions

The purpose of this paper is to predict imbalances between the taxi supply and demand at a grid level. To achieve this, a prediction model is proposed that utilizes dynamic Graph Convolutional Networks (GCNs) and Gated Recurrent Units (GRUs). The model is designed to capture the dynamic influence of the taxi demand–supply between adjacent grids, which is affected by the mobility of taxis and changes in other transportation modes. The GRU model is used to capture the temporal trends and cyclical characteristics of the supply and demand for taxis. Experiments were conducted using taxi trajectory data collected in January 2019 from Futian District in Shenzhen city to compare the model against several baseline models, including HA, ARIMA, GCN, GRU, and T-GCN. The results indicate that our model performs better than the baseline models at different prediction scales (10, 20, and 30 min) based on metrics such as the MAE, MAPE, and RMSE. To improve the practical application of our model, we conducted an ablation study to examine the impact of various factors on the predictive performance. The study findings indicate that better performance in real-world applications is associated with an increased volume of taxi trajectories, a grid size of about 1 km × 1 km, and 120 or more hidden units in the GRU.

The method proposed in this paper has certain limitations that need to be addressed in future research. First, due to the difficulty of data acquisition, we only utilized taxi GPS data from Shenzhen city to validate our method. It would be beneficial to include data from other cities, such as those in North America or Europe, if the opportunity arises, to demonstrate the generality of the proposed method. Second, the discussion shows that as the prediction scale increases in our model, the involvement of more unknowns in the intermediate calculations leads to an increase in errors. This limits the application of our model to long-term prediction. In the future, we intend to address this issue and make improvements accordingly.

Author Contributions

Haiqiang Yang: Conceptualization, methodology, software, writing—original draft, writing—review and editing. Zihan Li: Software, validation, visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data, models, and code that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Yang, H.; Wong, S.C.; Wong, K.I. Demand–Supply Equilibrium of Taxi Services in a Network under Competition and Regulation. Transp. Res. Part B Methodol. 2002, 36, 799–819.
Huang, K.; An, K.; Correia, G.H.D.A.; Rich, J.; Ma, W. An Innovative Approach to Solve the Carsharing Demand-Supply Imbalance Problem under Demand Uncertainty. Transp. Res. Part C Emerg. Technol. 2021, 132, 103369.
Agrawal, S. A Machine-Learning Framework for a Novel 3-Step Approach for Real-Time Taxi Dispatching. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020; pp. 2705–2712.
Alisoltani, N.; Zargayouna, M.; Leclercq, L. A Sequential Clustering Method for the Taxi-Dispatching Problem Considering Traffic Dynamics. IEEE Intell. Transp. Syst. Mag. 2020, 12, 169–181.
Park, J.; Lee, J.; Kim, J.; Chung, J.-H. An Optimization Model of On-Demand Mobility Services with Spatial Heterogeneity in Travel Demand. Transp. Res. Part C Emerg. Technol. 2023, 153, 104229.
Yu, H.; Chen, X.; Li, Z.; Zhang, G.; Liu, P.; Yang, J.; Yang, Y. Taxi-Based Mobility Demand Formulation and Prediction Using Conditional Generative Adversarial Network-Driven Learning Approaches. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3888–3899.
Rodrigues, P.; Martins, A.; Kalakou, S.; Moura, F. Spatiotemporal Variation of Taxi Demand. Transp. Res. Procedia 2020, 47, 664–671.
Tang, J.; Zhu, Y.; Huang, Y.; Peng, Z.-R.; Wang, Z. Identification and Interpretation of Spatial–Temporal Mismatch between Taxi Demand and Supply Using Global Positioning System Data. J. Intell. Transp. Syst. 2019, 23, 403–415.
Liu, M.; Han, J.; Mei, Y.; Li, Y. Dynamic Balance between Demand-and-Supply of Urban Taxis over Trajectories. Math. Biosci. Eng. 2022, 19, 1041–1057.
Hsieh, H.-P.; Lin, F. Recommending Taxi Routes with an Advance Reservation–a Multi-Criteria Route Planner. Int. J. Urban Sci. 2022, 26, 162–183.
Beojone, C.V.; Geroliminis, N. On the Inefficiency of Ride-Sourcing Services towards Urban Congestion. Transp. Res. Part C Emerg. Technol. 2021, 124, 102890.
Xu, Z.; Lv, Z.; Li, J.; Sun, H.; Sheng, Z. A Novel Perspective on Travel Demand Prediction Considering Natural Environmental and Socioeconomic Factors. IEEE Intell. Transp. Syst. Mag. 2023, 15, 136–159.
Button, K. Travel Behavior and Travel Demand. In Handbook of Regional Science; Fischer, M.M., Nijkamp, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2021; pp. 167–185. ISBN 978-3-662-60723-7.
Naseer, S.; Liu, W.; Sarkar, N.I.; Shafiq, M.; Choi, J.-G. Smart City Taxi Trajectory Coverage and Capacity Evaluation Model for Vehicular Sensor Networks. Sustainability 2021, 13, 10907.
Nian, G.; Huang, J.; Sun, D. Exploring Built Environment Influence on Taxi Vacant Time in Megacities: A Case Study of Chongqing, China. J. Adv. Transp. 2022, 2022, e3096901.
Liu, W.; Zhang, C.; Zhang, J.; Sharma, P.K.; Alfarraj, O.; Tolba, A.; Wang, Q.; Tang, Y. Rational Layout of Taxi Stop Based on the Analysis of Spatial Trajectory Data. Sustainability 2023, 15, 3227.
Yang, Y.; Yuan, Z.; Fu, X.; Wang, Y.; Sun, D. Optimization Model of Taxi Fleet Size Based on GPS Tracking Data. Sustainability 2019, 11, 731.
Zhang, R.; Ghanem, R. Demand, Supply, and Performance of Street-Hail Taxi. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4123–4132.
Chandra, S.R.; Al-Deek, H. Predictions of Freeway Traffic Speeds and Volumes Using Vector Autoregressive Models. J. Intell. Transp. Syst. 2009, 13, 53–72.
Lee, S.; Fambro, D.B. Application of Subset Autoregressive Integrated Moving Average Model for Short-Term Freeway Traffic Volume Forecasting. Transp. Res. Rec. 1999, 1678, 179–188.
Kumar, S.V.; Vanajakshi, L. Short-Term Traffic Flow Prediction Using Seasonal ARIMA Model with Limited Input Data. Eur. Transp. Res. Rev. 2015, 7, 1–9.
Osipov, V.; Nikiforov, V.; Zhukova, N.; Miloserdov, D. Urban Traffic Flows Forecasting by Recurrent Neural Networks with Spiral Structures of Layers. Neural Comput. Appl. 2020, 32, 14885–14897.
Bogaerts, T.; Masegosa, A.D.; Angarita-Zapata, J.S.; Onieva, E.; Hellinckx, P. A Graph CNN-LSTM Neural Network for Short and Long-Term Traffic Forecasting Based on Trajectory Data. Transp. Res. Part C Emerg. Technol. 2020, 112, 62–77.
Vázquez, J.J.; Arjona, J.; Linares, M.; Casanovas-Garcia, J. A Comparison of Deep Learning Methods for Urban Traffic Forecasting Using Floating Car Data. Transp. Res. Procedia 2020, 47, 195–202.
Tedjopurnomo, D.A.; Choudhury, F.M.; Qin, A.K. TrafFormer: A Transformer Model for Predicting Long-Term Traffic. Available online: https://arxiv.org/abs/2302.12388v3 (accessed on 7 January 2024).
Yao, L.; Mao, C.; Luo, Y. Graph Convolutional Networks for Text Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 7370–7377.
Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Trans. Intell. Transport. Syst. 2020, 21, 3848–3858.
Qi, J.; Zhao, Z.; Tanin, E.; Cui, T.; Nassir, N.; Sarvi, M. A Graph and Attentive Multi-Path Convolutional Network for Traffic Prediction. IEEE Trans. Knowl. Data Eng. 2023, 35, 6548–6560.
Lee, K.; Rhee, W. DDP-GCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting. Transp. Res. Part C Emerg. Technol. 2022, 134, 103466.
Zhu, J.; Han, X.; Deng, H.; Tao, C.; Zhao, L.; Wang, P.; Lin, T.; Li, H. KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting. IEEE Trans. Intell. Transp. Syst. 2022, 23, 15055–15065.
Yang, H.; Li, Z.; Qi, Y. Predicting Traffic Propagation Flow in Urban Road Network with Multi-Graph Convolutional Network. Complex Intell. Syst. 2023, 1–13.
Molloy, J.; Castro, A.; Götschi, T.; Schoeman, B.; Tchervenkov, C.; Tomic, U.; Hintermann, B.; Axhausen, K.W. The MOBIS Dataset: A Large GPS Dataset of Mobility Behaviour in Switzerland. Transportation 2023, 50, 1983–2007.
Demissie, M.G.; Kattan, L.; Phithakkitnukoon, S.; Homem de Almeida Correia, G.; Veloso, M.; Bento, C. Modeling Location Choice of Taxi Drivers for Passenger Pickup Using GPS Data. IEEE Intell. Transp. Syst. Mag. 2021, 13, 70–90.
Goletz, M.; Ehebrecht, D. How Can GPS/GNSS Tracking Data Be Used to Improve Our Understanding of Informal Transport? A Discussion Based on a Feasibility Study from Dar Es Salaam. J. Transp. Geogr. 2020, 88, 102305.
Safikhani, A.; Kamga, C.; Mudigonda, S.; Faghih, S.S.; Moghimi, B. Spatio-Temporal Modeling of Yellow Taxi Demands in New York City Using Generalized STAR Models. Int. J. Forecast. 2020, 36, 1138–1148.
Faghih, S.; Shah, A.; Wang, Z.; Safikhani, A.; Kamga, C. Taxi and Mobility: Modeling Taxi Demand Using ARMA and Linear Regression. Procedia Comput. Sci. 2020, 177, 186–195.
O’Keeffe, K.; Anklesaria, S.; Santi, P.; Ratti, C. Using Reinforcement Learning to Minimize Taxi Idle Times. J. Intell. Transp. Syst. 2022, 26, 498–509.
Hobeika, A.G.; Kim, C.K. Traffic-Flow-Prediction Systems Based on Upstream Traffic. In Proceedings of the VNIS’94-1994 Vehicle Navigation and Information Systems Conference, Yokohama, Japan, 31 August–2 September 1994; pp. 345–350.
Okutani, I.; Stephanedes, Y.J. Dynamic Prediction of Traffic Volume through Kalman Filtering Theory. Transp. Res. Part B Methodol. 1984, 18, 1–11.
Zhao, L.; Wen, X.; Wang, Y.; Shao, Y. A Novel Hybrid Model of ARIMA-MCC and CKDE-GARCH for Urban Short-Term Traffic Flow Prediction. IET Intell. Transp. Syst. 2022, 16, 206–217.
Kavehmadavani, F.; Nguyen, V.-D.; Vu, T.X.; Chatzinotas, S. Intelligent Traffic Steering in Beyond 5G Open RAN Based on LSTM Traffic Prediction. IEEE Trans. Wirel. Commun. 2023, 22, 7727–7742.
Qi, Q.; Cheng, R.; Ge, H. Short-Term Travel Demand Prediction of Online Ride-Hailing Based on Multi-Factor GRU Model. Sustainability 2022, 14, 4083.
Medina-Salgado, B.; Sanchez-DelaCruz, E.; Pozos-Parra, P.; Sierra, J.E. Urban Traffic Flow Prediction Techniques: A Review. Sustain. Comput. Inform. Syst. 2022, 35, 100739.
Alahmari, F.; Naim, A.; Alqahtani, H. E-Learning Modeling Technique and Convolution Neural Networks in Online Education. In IoT-Enabled Convolutional Neural Networks: Techniques and Applications; River Publishers: New York, NY, USA, 2023; ISBN 978-1-00-339303-0.
Krichen, M. Convolutional Neural Networks: A Survey. Computers 2023, 12, 151.
Xu, Y.; Su, H.; Ma, G.; Liu, X. A Novel Dual-Modal Emotion Recognition Algorithm with Fusing Hybrid Features of Audio Signal and Speech Context. Complex Intell. Syst. 2023, 9, 951–963.
Kattenborn, T.; Leitloff, J.; Schiefer, F.; Hinz, S. Review on Convolutional Neural Networks (CNN) in Vegetation Remote Sensing. ISPRS J. Photogramm. Remote Sens. 2021, 173, 24–49.
Yang, H.; Zhang, X.; Li, Z.; Cui, J. Region-Level Traffic Prediction Based on Temporal Multi-Spatial Dependence Graph Convolutional Network from GPS Data. Remote Sens. 2022, 14, 303.
Marquet, O. Spatial Distribution of Ride-Hailing Trip Demand and Its Association with Walkability and Neighborhood Characteristics. Cities 2020, 106, 102926.
Florin, R.; Olariu, S. Real-Time Traffic Density Estimation: Putting on-Coming Traffic to Work. IEEE Trans. Intell. Transp. Syst. 2023, 24, 1374–1383.
Chen, Y.; Chen, X. A Novel Reinforced Dynamic Graph Convolutional Network Model with Data Imputation for Network-Wide Traffic Flow Prediction. Transp. Res. Part C Emerg. Technol. 2022, 143, 103820.
Zhang, W.; Zhu, K.; Zhang, S.; Chen, Q.; Xu, J. Dynamic Graph Convolutional Networks Based on Spatiotemporal Data Embedding for Traffic Flow Forecasting. Knowl.-Based Syst. 2022, 250, 109028.
Li, H.; Yang, S.; Song, Y.; Luo, Y.; Li, J.; Zhou, T. Spatial Dynamic Graph Convolutional Network for Traffic Flow Forecasting. Appl. Intell. 2023, 53, 14986–14998.
Guo, S.; Lin, Y.; Wan, H.; Li, X.; Cong, G. Learning Dynamics and Heterogeneity of Spatial-Temporal Graph Data for Traffic Forecasting. IEEE Trans. Knowl. Data Eng. 2022, 34, 5415–5428.
Tobler, W.R. A Computer Movie Simulating Urban Growth in the Detroit Region. Econ. Geogr. 1970, 46, 234–240.
Kalander, M.; Zhou, M.; Zhang, C.; Yi, H.; Pan, L. Spatio-Temporal Hybrid Graph Convolutional Network for Traffic Forecasting in Telecommunication Networks. arXiv 2020, arXiv:2009.09849.
Vinchoff, C.; Chung, N.; Gordon, T.; Lyford, L.; Aibin, M. Traffic Prediction in Optical Networks Using Graph Convolutional Generative Adversarial Networks. In Proceedings of the 2020 22nd International Conference on Transparent Optical Networks (ICTON), Bari, Italy, 19–23 July 2020; pp. 1–4.
Wang, J.; Zhao, L.; Du, J.; Jieensi, A. Online Ride-Hailing Demand Prediction Model Based on GRU & LSTM. J. Phys. Conf. Ser. 2023, 2589, 012019.
Wang, Y.; Zhao, A.; Li, J.; Lv, Z.; Dong, C.; Li, H. Multi-Attribute Graph Convolution Network for Regional Traffic Flow Prediction. Neural Process. Lett. 2023, 55, 4183–4209.

Figure 1. Possible reasons for changes in the taxi demand between adjacent grids: (a) a residential area with a metro station; (b) a normal residential area; (c) a residential area with a school; and (d) a residential area with a hospital.

Figure 2. Dynamic graphs in the temporal dimension.

Figure 3. Four types of taxi trajectories crossing a grid: (a) type a taxi trajectory; (b) type b taxi trajectories; (c) type c taxi trajectories; (d) type d taxi trajectories; (e) type a taxi trajectory crossing a grid; (f) type b taxi trajectory crossing a grid; (g) type c taxi trajectory crossing a grid; and (h) type d taxi trajectory crossing a grid.

Figure 4. Two categories of graphs for a 3 × 3 grid region: (a) a 3 × 3 grid relation with 2 different kinds of traffic features; (b) the graph constructed by adjusting the relationships; and (c) the graph constructed using the traffic features within the grids.

Figure 5. The structure of the spatial–temporal dynamic graph convolutional network for the grid-level taxi demand–supply imbalance prediction model.

Figure 6. Experimental area: (a) the Futian District area in Shenzhen; and (b) the Futian District area divided into square grids of 1 km × 1 km.

Figure 7. Ground truths and predictions of the taxi demand–supply imbalance during the morning peak hours (7:30~9:30), daytime (9:30~16:30) and evening peak hours (16:30~19:30): (a) ground truths during 7:30~9:30; (b) predictions during 7:30~9:30; (c) ground truths during 9:30~16:30; (d) predictions during 9:30~16:30; (e) ground truths during 16:30~19:30; and (f) predictions during 16:30~19:30.

Figure 8. The impact of the number of trajectories within a grid.

Figure 9. The impact of the grid size.

Figure 10. The impact of the number of units in the GRU.

Table 1. Notations.

Notation	Description
$T^{a}$	A type a trajectory where the vehicle consistently remains in a “vacant” state.
$T^{b}$	A type b trajectory where the vehicle consistently remains in an “occupied” state.
$T^{c}$	A type c trajectory where the vehicle transitions from a “vacant” state to an “occupied” state.
$T^{d}$	A type d trajectory where the vehicle transitions from an “occupied” state to a “vacant” state.
$I_{t}^{g}$	The grid-level taxi demand–supply imbalance within grid $g$ in time period $t$.
${\tilde{T^{a}}}_{t}^{g}$	The sum of the type a trajectories crossing the grid $g$ in time period $t$.
${\tilde{T^{c}}}_{t}^{g}$	The sum of the type c trajectories crossing the grid $g$ in time period $t$.
${\tilde{T^{d}}}_{t}^{g}$	The sum of the type d trajectories crossing the grid $g$ in time period $t$.
$G\langle V, E\rangle$	A general graph, where $V$ represents the nodes in the graph, and $E$ represents the relationships between the nodes.
$G\langle G, R\rangle$	The taxi demand–supply graph.
$G$	The set of the grids $\{g_{1}, g_{2}, \dots, g_{n}\}$.
$R$	The set of the relationships $\{r_{1}, r_{2}, \dots, r_{m}\}$ between grids in the graph.
$X_{t \sim t+T}^{g}$	The feature vector of grid $g$.
$X^{g}$	The traffic demand $D^{g}$ or the traffic supply $S^{g}$ of grid $g$.
${\bar{X}}_{t}$	The spatial features within the grid $g$ in time period $t$.
${\bar{D}}_{t}$	The spatial features of the travel demand within the grid $g$ in time period $t$.
${\bar{S}}_{t}$	The spatial features of the travel supply within the grid $g$ in time period $t$.
$gc(\cdot)$	The graph convolution.
$gru(\cdot)$	The processing using the GRU model.
${\hat{D}}_{t}$	The temporal features of the travel demand within the grid $g$ in time period $t$.
${\hat{S}}_{t}$	The temporal features of the travel supply within the grid $g$ in time period $t$.
${\hat{I}}_{t}^{G}$	The predicted taxi demand–supply imbalance within the grid $g$ in time period $t$.
$I_{t+1 \sim t+T}^{G}$	The taxi demand–supply imbalance of all the grids in all the time periods.
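
The four trajectory types in Table 1 can be illustrated with a minimal Python sketch. It assumes that each GPS record carries a binary status flag (0 for “vacant”, 1 for “occupied”) and that a trajectory crossing a grid is given as the ordered list of those flags. The encoding, the function name `classify_trajectory` and the handling of sequences with more than one transition are hypothetical choices made for illustration and are not taken from the article.

```python
from typing import Optional, Sequence

# Assumed encoding of the taxi status flag (hypothetical, for illustration only).
VACANT, OCCUPIED = 0, 1


def classify_trajectory(states: Sequence[int]) -> Optional[str]:
    """Return 'a', 'b', 'c' or 'd' for a status sequence, or None if no type fits."""
    if not states:
        return None
    # Collapse consecutive duplicates so that only the state changes remain.
    changes = [states[0]]
    for s in states[1:]:
        if s != changes[-1]:
            changes.append(s)
    if changes == [VACANT]:
        return "a"  # vacant throughout
    if changes == [OCCUPIED]:
        return "b"  # occupied throughout
    if changes == [VACANT, OCCUPIED]:
        return "c"  # vacant -> occupied (a pick-up inside the grid)
    if changes == [OCCUPIED, VACANT]:
        return "d"  # occupied -> vacant (a drop-off inside the grid)
    return None  # more than one transition: outside the four types of Table 1


print(classify_trajectory([0, 0, 1, 1]))  # c
```

Counting the returned labels per grid and time period would give the per-type totals that Table 1 denotes with a tilde.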

Table 2. Comparison of the performance of the different models across temporal scales.

Model	10 min MAE	10 min MAPE	10 min RMSE	20 min MAE	20 min MAPE	20 min RMSE	30 min MAE	30 min MAPE	30 min RMSE
HA	0.0476	1.5375	0.0769	0.0476	1.5375	0.0769	0.0476	1.5375	0.0769
ARIMA	0.0902	0.5552	0.1108	0.0928	0.7627	0.1066	0.0980	1.3971	0.1225
GCN	0.0462	0.3286	0.0728	0.0473	0.3373	0.0736	0.0483	0.3463	0.0745
GRU	0.0439	0.2902	0.0678	0.0443	0.3072	0.0743	0.0446	0.3159	0.0853
T-GCN	0.0428	0.2826	0.0669	0.0439	0.2792	0.0684	0.0441	0.2867	0.0684
TmS-GCN	0.0431	0.2845	0.0680	0.0433	0.2880	0.0695	0.0441	0.2865	0.0692
MAGCN	0.0421	0.2812	0.0653	0.0453	0.2881	0.0702	0.0462	0.2907	0.0713
our model	0.0413	0.2716	0.0643	0.0425	0.2855	0.0659	0.0438	0.2932	0.0672
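
The three metrics reported in Table 2 can be sketched with their standard definitions. The following Python functions are a minimal illustration rather than the evaluation code used in the article. Returning MAPE as a fraction and skipping zero-valued ground truths are assumptions, since this section does not state how either point was handled.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction.

    Assumption: entries whose ground truth is zero are skipped,
    because the relative error is undefined there.
    """
    terms = [abs((t - p) / t) for t, p in zip(y_true, y_pred) if t != 0]
    if not terms:
        raise ValueError("MAPE is undefined when every ground-truth value is zero")
    return sum(terms) / len(terms)


# Toy values for illustration only; they are not data from the article.
truth = [1.0, 2.0, 4.0]
pred = [1.5, 2.0, 3.0]
print(mae(truth, pred), mape(truth, pred), rmse(truth, pred))
```

Lower values are better for all three metrics, which is how the rows of Table 2 are compared.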

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yang, H.; Li, Z. Dynamic Graph Convolutional Network-Based Prediction of the Urban Grid-Level Taxi Demand–Supply Imbalance Using GPS Trajectories. ISPRS Int. J. Geo-Inf. 2024, 13, 34. https://doi.org/10.3390/ijgi13020034
