A GIS-Based Bivariate Logistic Regression Model for the Site-Suitability Analysis of Parcel-Pickup Lockers: A Case Study of Guangzhou, China

Zheng, Zilai; Morimoto, Takehiro; Murayama, Yuji

doi:10.3390/ijgi10100648


by Zilai Zheng *, Takehiro Morimoto and Yuji Murayama

Faculty of Life and Environmental Sciences, Graduate School of Life and Environmental Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-8572, Ibaraki, Japan

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2021, 10(10), 648; https://doi.org/10.3390/ijgi10100648

Submission received: 30 June 2021 / Revised: 12 September 2021 / Accepted: 21 September 2021 / Published: 26 September 2021

(This article belongs to the Special Issue Geo-Information Technology and Its Applications)


Abstract:

The site-suitability analysis (SSA) of parcel-pickup lockers (PPLs) is becoming a critical problem in last-mile logistics. Most studies have focused on the site-selection problem to identify the best site from given potential sites in specific areas, while few have solved the site-search problem to determine the boundary of the suitable area. A GIS-based bivariate logistic regression (LR) model using the supervised machine-learning (ML) algorithm was developed for suitability classification in this study. Eight crucial factors were selected from 27 candidate variables using stepwise methods with a training dataset in the best LR model. The variable of the proximity to residential buildings was more important than that to various commercial buildings, transport services, and roads. Among the four types of residential buildings, the most crucial factor was the proximity to residential quarters. A test dataset was employed for the validation process, showing that the best LR model had excellent performance. The results identified the suitable areas for PPLs, accounting for 8% of the total area of Guangzhou (GZ). A decision-maker can focus on these suitable areas as the site-selection ranges for PPLs, which significantly reduces the difficulty of analysis and time costs. This method can quickly decompose a large-scale area into several small-scale suitable areas, with relevance to the problem of selecting sites from various candidate sites.

Keywords:

parcel-pickup lockers; site-suitability analysis; GIS-based; bivariate logistic regression model; suitability classification

1. Introduction

The rapid development of e-commerce has severely impacted parcel distribution, and the last-mile delivery problem restricts logistics development. Many e-commerce companies, logistics service providers, and other stakeholders considered effective systems for last-mile delivery to be essential competitive advantages and attempted to tackle the bottleneck by innovative methods, such as parcel-pickup points (PPPs, also called collection and delivery points), drone delivery, and autonomous ground vehicle delivery [1,2,3,4]. PPP is the most widely used novel solution that helps firms reduce costs through consolidated shipments and provide customers with a flexible, convenient, and comfortable means of receiving parcels.

PPPs have garnered significant interest in logistics research. Studies address the advantages of PPPs such as economic efficiency, environmental friendliness, and high service quality [5,6,7,8]. There are two types of PPPs: parcel-pickup shops (PPSs) and parcel-pickup lockers (PPLs). PPLs rely on intelligent technology without human interaction, whereas PPSs cooperate with commercial facilities. PPLs offer the advantages of long opening hours, flexible collection times, and anonymity. Consumers can collect their parcels without being bound to shop opening hours, and parcels can be retrieved anonymously because no human interaction is required [9,10]. Given that PPSs cooperate with existing facilities whereas PPLs are built in uncertain locations, the location planning problem for PPLs is more complex and challenging to solve. In the 13th Five-Year Plan for the logistics development of China, the installation of PPLs was accelerated, and the percentage of PPL delivery was projected to reach 10% by the end of 2020 [11]. However, the government has issued no clear guidelines on location planning for PPLs. Furthermore, the coronavirus (COVID-19) pandemic has changed people’s lifestyles, and social distancing limits face-to-face contact, resulting in more online shopping and larger parcel volumes. PPLs play a specific role in the prevention and control of the COVID-19 pandemic. Therefore, the need for PPLs is urgent.

Site-suitability analysis (SSA) is conducted to identify the most appropriate spatial locations or patterns for planning according to specific requirements, preferences, or predictors of a certain activity [12,13,14]. There are two types of SSA: site-selection analysis and site-search analysis. Site-selection analysis determines the best site from a given set of potential sites, while site-search analysis identifies the area or location of the best site [15]. SSA is becoming a particularly critical topic for PPL planning. Most studies of PPL have focused on the site-selection problem in SSA to identify the best sites by ranking or rating candidates based on different indicators [16,17]. However, they present the limitation of requiring specific areas with sets of predetermined candidates. For the planning of large-scale areas, decision-makers rarely have specific lists of predetermined candidates. First, they need to search for suitable areas and then further identify specific candidates within these suitable areas to select the most appropriate points. Determining how to search for suitable areas quickly is critical. It can help decision-makers to be significantly more efficient at the beginning of planning. It can also serve to quickly break down a large-scale area of study into several small-scale areas of study, with relevance to the site-selection topics of most current studies on PPL planning.

Thus, the main aim of this study was to develop a GIS-based bivariate logistic regression (LR) model with supervised classification algorithms to search for areas suitable for PPLs in a large-scale area. The selection criteria were chosen from many potential variables, and their weights were determined using a data-driven machine-learning (ML) algorithm. A decision-maker can focus on these suitable areas as site-selection ranges for PPLs, which significantly reduces the difficulty of analysis and time costs.

2. Literature Review

The three core issues related to the location analysis for PPPs in previous studies are (1) influencing factors, (2) spatial distribution patterns, and (3) site selection. For the influencing factors, some studies state that the distribution of PPPs is strongly related to the population density, land-use types, urban development, and spatial accessibility according to their agglomeration pattern [18,19,20,21]. Some studies have found that residents’ behavior also has a relationship with PPP layout, and thus developed methods for measuring customers’ spatial access to PPPs, considering differentiated supply and demand [22]. For spatial distribution patterns, the patterns of PPSs in several cities of China (Changsha, Wuhan, and Xi’an) were investigated using point of interest (POI) data [21,23,24]. The results showed that there are more PPSs in the central regions and fewer in the periphery regions, and there are multi-core agglomerations in general. For the site selection, research determined the best sites by ranking or rating candidates based on different indicators [16,17]. In general, previous studies related to location analysis for PPPs only analyzed the location characteristics and impact factors. Few studies addressed the site-search problem in SSA to identify the boundaries of the suitable sites in a large-scale area, such as a metropolis.

GIS-based SSA techniques are widely applied in urban, regional, and environmental planning activities, such as labeling potential hazards, ecological resources, habitats, and geological favorability, or locating advantageous sites for facilities, agricultural activities, and urban development [15,25,26,27,28,29]. The challenging aspect of GIS-based SSA is determining the important factors and their weights. Three major groups of approaches to GIS-based SSA are computer-assisted overlay mapping, multi-criteria evaluation (MCE), and ML algorithms [15]. However, a criticism of the computer-assisted overlay mapping approach is that it is often used without verifying the independence assumptions regarding the suitability criteria, and the criteria are often not standardized using appropriate methods [30]. In the MCE approach, the weights of the suitability criteria are determined subjectively, which is imprecise and ambiguous. Different multi-criteria evaluation rules generate remarkably different suitability patterns [15,31]. As a new data-driven technique, ML could overcome the limitations of the aforementioned approaches and better address problems involving enormous datasets. There are two types of ML models: white-box models are explainable models that allow an interpretation of the model parameters; black-box models, such as support vector machines or artificial neural networks, do not allow such an interpretation and can only be verified externally [32]. The LR model is the most common and useful white-box model for supervised classification algorithms due to its easy and efficient operation. The data types of the variables can be continuous or categorical. The result of the LR model is measured as a probability from 0 to 1, which can be considered as the suitability index.
Thus, the large-scale area in this study was subdivided into a micro-scale raster to form the basic units of observation, and the classification of each raster was conducted according to its suitability index.

3. Materials and Methods

3.1. Study Area and Data

China has the largest e-commerce market globally, with over 40% of global e-commerce transactions originating from the country as of 2017. Guangzhou (GZ) is one of the four most developed metropolises in China, where PPLs entered the market at an early stage. Furthermore, GZ has been ranked first for parcel receipts in China for seven consecutive years, from 2014 to 2020 [33]. As shown in Figure 1, GZ (112°57′ E–114°30′ E; 22°26′ N–23°56′ N), located in the center of Guangdong Province in south China, had a population of 15.3 million in 2019 and covered an area of 7434 km². GZ is the third-largest metropolis in China, containing 11 administrative districts.

In this study, the suitability modeling for PPLs was conducted using five types of data: POI data, road-network data, population data with a resolution of 100 m, land-price data, and a digital elevation model (DEM) with a resolution of 30 m, as shown in Table 1. Given the large quantity and wide distribution of PPLs and the related facilities, manual data acquisition was time-consuming and inaccurate, hindering the progress of PPL research. POI data—a novel form of data incorporating information such as latitudinal and longitudinal coordinates, specific locations, place names, and other attribute information—played an essential role in the analysis of macro-scale spatial distribution characteristics. POI data had the advantages of comprehensive coverage, high recognition accuracy, and high accessibility. Thus, POI big data improved the quality of micro-scale studies on PPL locations. In this study, POI data were obtained from Gaode Map, which was an everyday navigation application popular in China. It used three-level classification codes to classify objects of POI data. From the open application programming interface (API) of Gaode Map, developers could extract data for a specific area, a specific category, or a keyword for the name. According to the literature, PPL distributions were strongly related to traffic convenience and residential and commercial areas. The influential factors from the POI data were chosen from two major categories with several subcategories: transportation service and commercial/house, as shown in Table 2. The locations of PPLs were searched for using the keywords ‘parcel locker’ or ‘self-pickup locker’. A total of 679 PPLs were extracted from Gaode Map in 2019. The road network data were collected from OpenStreetMap (OSM).

3.2. Methodology

Figure 2 shows the methodology used in this research. It mainly consisted of six parts: (1) the conversion of multi-source data to the same scale, (2) the preparation of the observation data, (3) the diagnosis of the assumptions of the LR model, (4) the determination of the best combination of explanatory variables, (5) the evaluation of the model’s performance, and (6) the generation of the suitability map using the best model.

Variables X1 to X27 are explained in Table 3, and their distribution maps are shown in Figure 3.

3.2.1. Conversion of the Multi-Source Data to the Same Scale

The challenges associated with multi-source data were attributable to the different types and scales of the data. The multi-source data should be unified to the same type and unit in the preprocessing stage. This study used four different data types—vector-line, vector-point, vector-polygon, and raster data—with different resolutions. As this study aimed to identify suitable areas at the pixel level, all the data needed to be converted to the same data type (raster) with the same resolution. The vector-line and point data were converted using the Euclidean distance and kernel density method. The vector-polygon data were directly converted to raster data. Higher-resolution raster data were converted to a lower resolution using the resampling tool of the ArcGIS 10.6 software. A total of 27 conversion results with a resolution of 100 m were candidate variables in the modeling, as shown in Table 3 and Figure 3.
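The point-to-raster conversion described above can be illustrated with a minimal sketch. It is not the ArcGIS workflow used in the study: it is a brute-force, pure-Python stand-in for a Euclidean distance tool, and the function name `distance_raster`, the grid origin, and the sample bus-stop coordinate are assumptions made only for illustration.

```python
import math

def distance_raster(points, n_rows, n_cols, cell_size=100.0):
    """Return a grid of straight-line distances (in metres) from each cell
    centre to the nearest point. Points are (x, y) pairs in metres, with the
    origin at the corner of cell [0][0] (an assumption of this sketch)."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = (c + 0.5) * cell_size  # x of the cell centre
            cy = (r + 0.5) * cell_size  # y of the cell centre
            row.append(min(math.hypot(cx - px, cy - py) for px, py in points))
        grid.append(row)
    return grid

# Hypothetical example: one bus stop at the centre of the first 100 m cell.
bus_stops = [(50.0, 50.0)]
dist = distance_raster(bus_stops, n_rows=3, n_cols=3)
```

Each cell of `dist` then plays the role of one distance-type candidate variable at a 100 m resolution. A production workflow would use a GIS distance tool rather than this quadratic loop.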

3.2.2. Preparation of the Observation Data

An observation database was prepared for the LR model to learn the data features, including suitable and unsuitable location points, with the values of their explanatory variables. The location points of PPLs were collected from the POI data from Gaode Map. This study assumed that ranges of 500 m around the existing locations of PPLs were suitable (approximate walking distance of 5 min) [17]. After erasing the water and assumed suitable areas, the non-PPL points were randomly sampled in the remaining area. Classification by the LR model using ML algorithms should avoid the class-imbalance problem [34]. To make the sample sizes of the positive and negative datasets similar, 690 non-PPL points were randomly selected. Figure 4 shows the locations of all the observation points.
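A rough sketch of the negative-sample step follows, assuming planar coordinates in metres. The function name `sample_non_ppl_points`, the rectangular extent, and the seed are hypothetical, and the water mask used in the study is omitted here for brevity.

```python
import math
import random

def sample_non_ppl_points(ppl_points, n_samples, extent, buffer_m=500.0, seed=0):
    """Draw random points inside `extent` = (xmin, ymin, xmax, ymax) that lie
    farther than `buffer_m` from every existing PPL (rejection sampling).
    Note: this loops forever if the buffers cover the whole extent."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = extent
    samples = []
    while len(samples) < n_samples:
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Keep the point only if it is outside all 500 m buffers.
        if all(math.hypot(x - px, y - py) > buffer_m for px, py in ppl_points):
            samples.append((x, y))
    return samples

# Hypothetical example: one PPL in a 5 km x 5 km window.
ppl = [(1000.0, 1000.0)]
negatives = sample_non_ppl_points(ppl, n_samples=20, extent=(0.0, 0.0, 5000.0, 5000.0))
```

Choosing `n_samples` close to the number of PPL points keeps the two classes roughly balanced, which is the purpose stated in the text.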

Next, the values of all the observation points were extracted from the raster layers of the 27 candidate variables to create the reference database. Several points returned null values from the raster layers. These abnormal points were excluded to reduce model bias. Empirical studies showed that the best results were obtained by training and testing data with a ratio of 70:30 or 80:20 [35]. In order to employ more data to test the performance of the model, this study chose the ratio of 70:30. The data were randomly split into a training dataset and a test dataset.
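The null-value filter and the 70:30 random split can be sketched as below. The record layout, the field names, and the seed are assumptions for illustration and do not come from the study.

```python
import random

def drop_null_records(records):
    """Discard observation points with a missing value in any field."""
    return [r for r in records if all(v is not None for v in r.values())]

def split_train_test(records, train_fraction=0.7, seed=42):
    """Shuffle the records and split them into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical observations: ten complete records and one with a null value.
observations = [{"id": i, "label": i % 2, "dist_bus_stop": float(i)} for i in range(10)]
observations.append({"id": 10, "label": 1, "dist_bus_stop": None})

clean = drop_null_records(observations)
train, test = split_train_test(clean)
```

Fixing the seed makes the split reproducible. A stratified split that preserves the class ratio in both subsets is a common refinement, though the source does not say whether it was used.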

3.2.3. Diagnosis of the Assumptions of LR Model

Before applying the LR model, it was necessary to examine the assumptions shown in Table 4. The data for modeling satisfied the requirements for the first four assumptions during the dataset design, but the last three had to be examined using other methods. Here, the diagnosis was conducted using Version 25 of the IBM SPSS statistics software.

Diagnosis of the linearity of independent variables and log-odds

The Box–Tidwell method was employed here. It incorporated the interaction term between each continuous independent variable and its natural logarithm into the regression equation [36]. First, the natural logarithms of all the continuous independent variables were calculated using the compute variable function in SPSS. Then, the interaction terms between the continuous independent variables and their logs were included in the binary LR analysis using SPSS. A statistically significant interaction term (p-value < 0.05) suggested a non-linear logit, i.e., no linear relationship between the corresponding continuous independent variable and the logit of the dependent variable. It was recommended that the significance level be corrected using the Bonferroni method for all the items in the analysis (including the intercept term) when testing the linearity hypothesis multiple times [37]. In this study, 55 items were included in the model analysis: 27 continuous independent variables, 27 interaction terms between the independent variables and their natural logs, and the intercept term (constant). A p-value less than the corrected value (i.e., 0.05 ÷ 55 ≈ 0.00091) was taken to indicate nonlinearity. There was no observed p-value less than the corrected value. Hence, linear relationships existed between all the continuous independent variables and the logit of the dependent variable.
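As a small illustration of the two quantities involved in this diagnosis, the sketch below builds a Box–Tidwell interaction term and computes the Bonferroni-corrected significance level for 55 terms. The function names are hypothetical, and fitting the regression itself (done in SPSS in the study) is not reproduced.

```python
import math

def box_tidwell_term(x):
    """Interaction term x * ln(x) for a strictly positive continuous predictor."""
    if x <= 0:
        raise ValueError("x must be positive; shift the variable before taking the log")
    return x * math.log(x)

def bonferroni_threshold(alpha, n_terms):
    """Per-term significance level after Bonferroni correction."""
    return alpha / n_terms

# 27 predictors + 27 interaction terms + 1 intercept, as described in the text.
n_terms = 27 + 27 + 1
threshold = bonferroni_threshold(0.05, n_terms)  # 0.05 / 55, about 0.00091
```

A variable would be flagged as violating the linearity assumption only if the p-value of its interaction term fell below `threshold`.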

Diagnosis of multicollinearity

A good LR model exhibits low noise and is statistically robust. This means that the explanatory variables are highly correlated with the dependent variable but minimally correlated with each other [38]. Multicollinearity occurred when explanatory variables exhibited strong correlations or associations with each other. When the degree of correlation was extremely high, the standard errors of the coefficients increased, which caused some variables to appear statistically insignificant in the results, even though they were significant. Multicollinearity made the coefficients unstable [39] and reduced the precision or interfered with the result when fitting the model [40]. It was mainly detected with the help of the tolerance (Tol) and its reciprocal, the variance inflation factor (VIF) [41]. The formulae are defined as follows:

Tol = 1 - R^{2}

(1)

VIF = \frac{1}{Tol}

(2)

where R² is the coefficient of determination for the regression of the explanatory variable on all the remaining independent variables.

VIF > 10 and Tol < 0.1 were common thresholds for assessing multicollinearity between explanatory variables [38,42]. There were several ways to address the multicollinearity problem. First, multiple variables with collinearity could be combined into a single variable. Second, the sample size could be increased to decrease standard errors. Third, some variables causing multicollinearity could be omitted from the model. Omitting some variables was the most direct, simple, and effective way. In order to retain as many variables as possible, the most correlated variable was neglected each time until the collinearity problem was not severe. Table 5 shows the VIF values of all the variables after omitting the variable with multicollinearity in the model.
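Equations (1) and (2) and the threshold rule can be written as a short sketch. The R² value is assumed to come from regressing one explanatory variable on all the others, which is not shown here; the function names are hypothetical.

```python
def tolerance(r_squared):
    """Tol = 1 - R^2 (Equation (1))."""
    return 1.0 - r_squared

def vif(r_squared):
    """VIF = 1 / Tol (Equation (2)); infinite when R^2 reaches 1."""
    tol = tolerance(r_squared)
    if tol <= 0:
        return float("inf")
    return 1.0 / tol

def has_multicollinearity(r_squared, vif_limit=10.0):
    """Apply the common VIF > 10 rule, which is equivalent to Tol < 0.1."""
    return vif(r_squared) > vif_limit

# Illustrative values only: R^2 = 0.95 gives VIF = 20, R^2 = 0.80 gives VIF = 5.
flag_high = has_multicollinearity(0.95)
flag_low = has_multicollinearity(0.80)
```

In an iterative removal scheme like the one described, the variable with the largest VIF would be dropped and the VIFs recomputed until no value exceeded the limit.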

Diagnosis of obvious outliers

An outlier is an exceptional value that is very different from the others in a dataset. The LR model is sensitive to outliers. The usual approach to detecting outliers is based on the values of the standardized residuals: an observation whose standardized residual has an absolute value larger than three is usually considered an outlier [36]. After deleting the outliers, model fitting was conducted on the training dataset of 961 samples.
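The outlier rule can be sketched as follows. This sketch assumes that the standardized residual is the Pearson form, (y − p) / sqrt(p(1 − p)), where y is the observed class and p the fitted probability; the source does not spell out the formula, so treat this as an assumption.

```python
import math

def standardized_residual(y, p):
    """Pearson-type residual for a binary outcome y and fitted probability p
    (assumed form; requires 0 < p < 1)."""
    return (y - p) / math.sqrt(p * (1.0 - p))

def flag_outliers(observations, limit=3.0):
    """Return True for each (y, p) pair whose |residual| exceeds the limit."""
    return [abs(standardized_residual(y, p)) > limit for y, p in observations]

# Hypothetical fitted values: the last two points are badly mispredicted.
obs = [(1, 0.9), (0, 0.2), (1, 0.05), (0, 0.95)]
flags = flag_outliers(obs)
```

Points flagged `True` would be removed before refitting the model on the remaining training samples.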

3.2.4. Determination of the Best Model Using the Stepwise Methods

There were many candidate variables in the model. It was important to detect the best variable combination for model fitting. A good model should adequately fit the data, and the predictor variables should not be too complicated. It was challenging to select the smallest number of candidate variables that could predict the dependent variable sufficiently while considering sample size constraints [36]. The forward and backward stepwise methods were frequently applied in previous studies of the LR model [43].

The forward stepwise selection method (FSSM) selected several significant predictors for the final model. Model optimization was performed using the least-squares criteria. The process started with an empty model containing no explanatory variables and compared the incremental explanatory power of larger models [44]. The variable that best improved the model fit was added first; subsequently, a second variable that could best improve the fit was sought, and the process continued, adding one variable at a time based on statistical significance, until a stopping rule was satisfied. In FSSM, variables added early in the process could be removed at a later stage if they became unimportant once other variables were in the model. Using the FSSM technique, the variables could be ranked by importance according to the order in which they were added.

Unlike FSSM, the backward stepwise elimination method (BSEM) started with all the predictors of the least-squares model and then eliminated the least effective predictors one at a time. This method was continued until a stopping rule was satisfied. In the literature, the recommended stopping rule was a p-value of ~0.15 [45,46]. In the SPSS software, the default values for FSSM and BSEM were 0.05 and 0.1, respectively.
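The greedy logic of forward selection can be sketched generically. This is not the SPSS procedure: the significance-based entry test is replaced by a caller-supplied `score` function (higher is better) and a minimum-gain stopping rule, the removal step is omitted, and the variable names and gains in the toy example are invented for illustration.

```python
def forward_select(candidates, score, min_gain=1e-3):
    """Greedy forward selection: repeatedly add the variable that most
    improves score(selected); stop when the best gain is below min_gain."""
    selected = []
    best_score = score(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current model.
        trial_scores = {v: score(selected + [v]) for v in remaining}
        best_var = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_var] - best_score < min_gain:
            break  # stopping rule: no candidate improves the fit enough
        selected.append(best_var)
        remaining.remove(best_var)
        best_score = trial_scores[best_var]
    return selected

# Toy score: each hypothetical variable contributes a fixed, additive gain.
toy_gain = {"Dist_A": 0.30, "Dist_B": 0.10, "Dens_C": 0.0}

def toy_score(variables):
    return sum(toy_gain[v] for v in variables)

order = forward_select(toy_gain, toy_score)
```

The order of `order` mirrors the importance ranking mentioned in the text: variables that enter earlier contribute more. Backward elimination follows the same pattern in reverse, starting from the full set and dropping the least useful variable at each step.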

3.2.5. Evaluation of the Model’s Performance

The performance of the LR models was evaluated based on their discrimination and calibration. Discrimination referred to the ability of the model to correctly distinguish between the two suitability classes based on prediction values. It was often measured using a confusion matrix and by calculating indices of classification performance [47]. The LR model used the logistic function to map the predictions to probabilities between 0 and 1. The default threshold of 0.5 was commonly used: a PPL was assumed to be present if the probability was above 0.5 and absent otherwise. The classification accuracy was determined by comparing the predictions with the real values. The classification table was divided into four types. True positives (TPs) and true negatives (TNs) indicated the numbers of correctly predicted PPLs and non-PPLs; false positives (FPs) and false negatives (FNs) denoted the numbers of incorrect predictions. Several further indicators were used to measure the performance of a model or predictors. The accuracy was the total number of correct predictions divided by the total number of predictions made for a dataset. However, even unskillful models could show high accuracy scores when the class imbalance was severe. An alternative to the classification accuracy was to use precision and recall. Unfortunately, precision and recall may sometimes pull in opposite directions. The F-Measure (also known as the F-Score) was the most common method for balancing both indicators in a single score. These indices are defined in Equations (3)–(6). Here, the classification accuracy and F-Measure served as the indices of discrimination.

Precision = \frac{TP}{(TP + FP)}

(3)

Recall = \frac{TP}{(TP + FN)}

(4)

Accuracy = \frac{(TP + TN)}{(TP + FP + FN + TN)}

(5)

F - Measure = \frac{2 \times Precision \times Recall}{(Precision + Recall)}

(6)
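Equations (3)–(6) translate directly into code. The confusion-matrix counts in the example are hypothetical and are not the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F-measure from confusion-matrix counts.
    Assumes at least one predicted positive and one actual positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Hypothetical counts for 200 test points.
m = classification_metrics(tp=90, fp=10, fn=20, tn=80)
```

With these counts the precision is 0.90 and the recall is about 0.82, so the F-measure (about 0.86) sits between the two, which is the balancing behaviour the text describes.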

Discrimination only compared the predicted probability with the fixed threshold of 0.5 and ignored how far the predicted value was from the true value. Calibration addressed this shortcoming by describing how close the predicted value was to the actual value. The Brier score, defined in Equation (7), was an important calibration index that measured the accuracy of probabilistic predictions. It was applicable to tasks in which predictions assigned probabilities to a set of mutually exclusive discrete outcomes. The set of possible outcomes could be either binary or categorical in nature, and the probabilities assigned to this set of outcomes had to sum to 1, with each individual probability ranging from 0 to 1 [48]. The lower the Brier score for a set of predictions, the better the predictions were calibrated. In addition, the reduction ratio for the variables involved in modeling (the model optimization rate) was used in this study to evaluate the model’s performance.

B = \frac{\sum_{i = 1}^{n} (x_{i} - q_{i})^{2}}{n}

(7)

where x_i is the real value of the dependent variable for observation i, q_i is the corresponding predicted probability, and n is the number of observations.
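A minimal sketch of Equation (7) for a binary outcome follows; the outcomes and probabilities are made-up values for illustration.

```python
def brier_score(outcomes, probabilities):
    """Mean squared difference between observed outcomes (0 or 1) and
    predicted probabilities (Equation (7))."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have the same length")
    n = len(outcomes)
    return sum((x - q) ** 2 for x, q in zip(outcomes, probabilities)) / n

# Hypothetical predictions for four points.
b = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.6, 0.4])  # (0.01 + 0.01 + 0.16 + 0.16) / 4
```

All four points are classified correctly at the 0.5 threshold, yet the two less confident predictions (0.6 and 0.4) dominate the score. This is the information that discrimination indices alone do not capture.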

The receiver operating characteristic (ROC) curve was also a popular method for testing a model’s accuracy and describing the quality of a probabilistic prediction system [49]. The area under the ROC curve (AUC) was a common metric for the level of discriminative ability; the larger the area, the better the performance of the model. The following classification using the AUC was considered for accuracy: 0.90–1 (excellent), 0.80–0.90 (good), 0.70–0.80 (fair), 0.60–0.70 (poor), and 0.50–0.60 (fail) [50,51].
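The AUC can be computed without drawing the curve, as the fraction of (positive, negative) pairs in which the positive point receives the higher score, with ties counted as one half. The sketch below uses that pairwise definition and maps the value to the verbal scale quoted above. Treating each lower bound as inclusive and grouping values below 0.5 with "fail" are assumptions of this sketch, and the example data are hypothetical.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

def auc_grade(value):
    """Map an AUC value to the verbal scale (lower bounds assumed inclusive)."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "fair"
    if value >= 0.6:
        return "poor"
    return "fail"  # also used for values below 0.5 (assumption)

# Hypothetical scores: three of the four positive/negative pairs are ordered correctly.
area = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
grade = auc_grade(area)
```

The double loop is quadratic in the sample size, which is acceptable for a few hundred test points; rank-based formulas give the same value more efficiently.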

3.2.6. Generation of the Suitability Map

The coefficients of the selected optimum variables and the constant of the best LR model were substituted into Equation (8). The suitability index of Equation (9) was computed for each raster of the whole study area for prediction. According to the classification threshold of the LR model, the suitability map for PPLs consisted of two categories. The raster with a predicted value between 0.5 and 1 was reclassified as a suitable area, and the raster with a value between 0 and 0.5 was reclassified as an unsuitable area.

z = \sum_{i = 1}^{n} w_{i} x_{i} + Constant

(8)

y = \frac{1}{1 + e^{-z}}

(9)
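The two equations above can be applied cell by cell as in the following sketch. The weights, constant, and cell values are invented for illustration and are not the fitted coefficients of the study. A cell is labeled suitable when its index is strictly above the threshold, following the "above 0.5" rule stated earlier.

```python
import math

def suitability_index(values, weights, constant):
    """Linear predictor (Equation (8)) passed through the logistic
    function (Equation (9))."""
    z = sum(w * x for w, x in zip(weights, values)) + constant
    return 1.0 / (1.0 + math.exp(-z))

def classify_cell(values, weights, constant, threshold=0.5):
    """Label one raster cell from its explanatory-variable values."""
    if suitability_index(values, weights, constant) > threshold:
        return "suitable"
    return "unsuitable"

# Hypothetical model: a distance variable (m) with a negative weight and a
# density variable with a positive weight.
weights = [-0.002, 0.5]
constant = 0.2
near_cell = classify_cell([100.0, 1.0], weights, constant)  # z = 0.5
far_cell = classify_cell([1000.0, 0.0], weights, constant)  # z = -1.8
```

Running `classify_cell` over every cell of the stacked variable rasters yields the two-class suitability map.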

4. Results

4.1. The Optimum Variable Combination for the Best Model

Table 6 shows the model’s performance with the combinations of variables selected by the FSSM and BSEM. The discrimination and calibration of the BSEM are slightly better than those of the FSSM. However, the optimization rate for the FSSM is 20% higher. This indicates that the two methods for selecting the optimal variable combination show a similar model accuracy and bias, whereas in terms of the index of model optimization, the FSSM performed better than the BSEM. Table 7 shows the coefficients of the best explanatory variable combination as determined by the BSEM. The Wald value indicates the significance of the variables. Eight significant variables were selected from the 25 variables without multicollinearity. Among these eight variables, five were selected from the accessibility factors, and one each was selected from the social factors, topographic factors, and urban development factors. Among the five selected accessibility factors, three were from the variables of proximity to various types of buildings. According to the Wald value, the most crucial factor was Dist_Res_Qua, with a value of 45.5, followed by SLPrice (29), Dist_BusStop (28.4), and Dens_ComB (20.7). According to the signs of the coefficients, the variables of Dist_Res_Qua, Dist_BusStop, Dist_Com_OffB, Dist_Road_Sec, Dist_Res_Vil, and SLPrice were negatively correlated with the suitability for PPLs in the raster unit. The DEM and Dens_ComB were positively correlated. Thus, a suitable PPL site tends to be close to residential quarters, commercial offices, and residential villas, near bus stops or secondary roads, in areas with relatively low land prices and a high density of commercial buildings.

4.2. Evaluation of the Classification Performance

The test dataset was used to conduct an unbiased evaluation of the final model fitted on the training dataset. The final LR model with the best variable combination and coefficients was applied to the test dataset. The F-measure, Brier score, and AUC were the indicators used to evaluate the model’s classification performance, as shown in Table 8. The larger the F-measure, the higher the discrimination accuracy of the model’s classification. The F-measure values for both the training and test data were greater than 89%. The lower the Brier score, the smaller the prediction deviation and the better the calibration of the model. The Brier scores were less than 0.09. The AUC values for both datasets were between 0.9 and 1, indicating excellent accuracy.

Overall, the predictive performance of the final LR model was good. Additionally, the performance with the test dataset was better than that with the training dataset.

4.3. The Boundaries of the Suitable Areas

Figure 5 shows the suitability for PPLs simulated using the best LR model. The suitability for PPLs is divided into two classes: the suitable area in orange and the unsuitable area in blue. Most of the suitable areas are concentrated in the central districts and dispersed in small areas in the outer districts. Figure 6 summarizes the sizes and percentages of the suitable area by district. Panyu district has the greatest suitable area, while Liwan district has the smallest. Yuexiu district has the greatest proportion of suitable area, more than 80%, while Conghua has the smallest, only 1%. Overall, the suitable area is approximately 614 km², accounting for 8% of the total area of GZ. The site-selection range for PPLs can focus on these suitable areas, which significantly reduces the difficulty of analysis and time costs.
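As a quick arithmetic check of the reported share, using the total area of 7434 km² given for GZ in Section 3.1 and the 100 m raster resolution (so that one cell covers 0.01 km²), the figures work out as follows. The cell count is derived here and is not stated in the source.

```latex
\frac{614\ \text{km}^2}{7434\ \text{km}^2} \approx 0.083 \approx 8\%,
\qquad
\frac{614\ \text{km}^2}{0.01\ \text{km}^2\ \text{per cell}} = 61{,}400\ \text{cells}
```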

5. Discussion

Big data make location analysis at the macro scale possible. POI data, an innovative data source with a low cost, can identify the existing locations of PPLs and other related facilities. Some studies used POI data to analyze the PPL distribution patterns in specific cities of China and found them to be strongly consistent with economic development levels, population density, and traffic convenience [21,23,24]. This study further developed a GIS-based LR classification model using an ML algorithm to identify suitable areas in a bottom-up manner with massive, detailed data, which differed from previous studies conducted using the MCE approach. The optimum explanatory variables from the 27 candidates and their coefficients for LR models were determined using a training dataset with stepwise methods. The FSSM performed better than the BSEM in the optimization of variables. The most crucial variable was Dist_Res_Qua. It was much more important than the variables of the distance to various transport services/roads and the density of related points. This result was consistent with the preferences of customers for PPLs being located near their home addresses [52]. Furthermore, this study subdivided residential buildings into four types as candidate variables to analyze the relationships with PPLs. The results showed that residential quarters were the most crucial type, whereas dormitories and community centers (CCs) were not determining variables for the locations of PPLs. A CC is a place providing recreational, cultural, and social activities for surrounding groups of residential neighborhoods. Although a CC is usually near the residential buildings of the community, it is difficult to combine the behavior of picking up parcels with entertainment or social activities. The residential buildings of dormitories are mainly located in colleges, factories, or institutions with closed management. The dormitory areas are usually far from the entrance.
It takes a long time to distribute parcels to PPLs near a dormitory building, and the delivery vehicles have limited accessibility. Due to the safety of internal personnel and the long delivery times, parcels for dormitories are generally signed for and stored by guards or shops which offer parcel-pickup services. The population in the dormitory area is dense, and the capacity of PPLs is limited. Due to the high machine cost, it is not possible to set up several facilities of PPLs to meet the great demand there. Moreover, the nature of PPLs is more inclined toward that of a public service facility, and their economic benefit is limited. The dormitory management prefers to lease the land to commercial shops rather than PPLs, to obtain more rent.

Another interesting finding was that population density was not selected as a critical factor for determining the locations of PPLs. This differed somewhat from previous studies, which proposed that the density of PPPs had a strong positive correlation with population density [20,21]. One possible reason is the difference in research scale. The previous studies used administrative units: based on the existing locations of PPPs, they investigated the relationship between the density of PPPs and the density of various factors in each administrative unit of the study area using correlation analysis [21,23,24]. That analysis focused mainly on quantitative relationships and ignored the distances to the various factors. For distance analysis, statistical methods were widely used to determine the distance range between most PPPs and the surrounding features. Unlike previous studies, this work modeled locations using raster units. The locations of existing PPLs and random non-PPL points were used as the training and test datasets. The ML algorithm extracted the features of existing PPL locations to generate the best model, which then classified the remaining raster units as suitable or unsuitable sites. This method considered both quantity-related and distance-related factors, and the model could identify locations suitable for PPLs rather than only analyzing the characteristics of existing points. Using Wald values (importance), the model could distinguish the variables that yielded suitable PPL locations from a large number of candidate factors.

Another reason was that the raster population data were not highly accurate and only considered the nighttime population. The population density data came from the population prediction data in the WorldPop dataset developed by the WorldPop Project; up-to-date, high-resolution raster data for population density were hard to obtain. The predicted population in the WorldPop dataset was simulated from official census population data and nighttime satellite images [53]. In the best model of this study, the most critical variables that yielded suitable PPL locations included Dist_Res_Qua, Dist_BusStop, and Dist_Com_OffB. These factors also had a strong relationship with the population.
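
The Wald values used here as an importance measure can be illustrated with a minimal sketch. The code below is not the authors' implementation: it fits a binary LR by Newton's method on synthetic data and computes the Wald statistic, (coefficient / standard error)^2, for each term. The function name `fit_logistic_wald`, the two synthetic predictors, and the sample size are assumptions made only for illustration.

```python
import numpy as np

def fit_logistic_wald(X, y, n_iter=25):
    """Fit a binary logistic regression by Newton's method.

    Returns (coefficients, Wald statistics); index 0 is the intercept.
    """
    Xd = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))     # predicted probabilities
        w = p * (1.0 - p)                        # per-observation weights
        hessian = Xd.T @ (Xd * w[:, None])       # information matrix
        gradient = Xd.T @ (y - p)                # score vector
        beta = beta + np.linalg.solve(hessian, gradient)
    # Covariance of the estimates is the inverse information matrix.
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    cov = np.linalg.inv(Xd.T @ (Xd * (p * (1.0 - p))[:, None]))
    wald = beta ** 2 / np.diag(cov)              # (coef / std. error)^2
    return beta, wald

# Synthetic data (hypothetical): one informative predictor and one noise predictor.
rng = np.random.default_rng(0)
n = 2000
x_informative = rng.normal(size=n)
x_noise = rng.normal(size=n)
true_logit = -1.5 * x_informative                # larger value -> lower probability
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta, wald = fit_logistic_wald(np.column_stack([x_informative, x_noise]), y)
```

In this toy setting the informative predictor receives a far larger Wald value than the noise predictor, which is the sense in which a Wald value can be read as a rough importance ranking.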

There are several assumptions and limitations in this study due to insufficient data. First, the existing PPL locations are treated as locations suitable for PPLs. These points serve as the sample from which the ML algorithm learns its features. However, they may not reflect actual suitability. Only current POI data are available, not historical POI data, so it is impossible to analyze the relationship between historical PPL locations and the surrounding environment. In addition, because PPL usage data are not available, it is not possible to determine whether the existing PPL locations are actually appropriate. Second, competition among PPLs was not considered. In reality, PPLs are operated by different companies that may compete with each other. Third, the 27 candidate factors in the model are social and location-related factors; market and user-behavior preference factors are not included.

Moreover, a metropolis is a large city consisting of a densely populated urban core and less-populated surrounding territories under the same administrative jurisdiction [54]. The PPL density is also unbalanced in different areas of a metropolis. Future research could divide metropolitan areas into multiple zones according to population density for modeling and further analyze the differences in the variables chosen by the model in the various zones.

6. Conclusions

Previous studies of SSA for PPLs commonly addressed the site-selection problem with given sites in a specific area [16,17]. Few studies have focused on the site-search problem with quantitative models. GIS-based SSA techniques have been widely applied to urban planning activities involving multiple factors. The ML method was superior to the other two GIS-based SSA approaches and worked best for problems involving enormous datasets. The LR model was the most common and explainable of the data-driven ML algorithms. This paper proposed a GIS-based bivariate LR model with supervised classification algorithms for the SSA of PPLs and explicitly identified the boundaries of suitable areas. The micro-scale raster provided the basic unit of observation, and the suitability classification was conducted for each raster unit. The crucial factors and their weights were determined using the training data, and 30% of the data was used to test the model's accuracy and evaluate the performance of the best model. Two stepwise methods (FSSM and BSEM) were employed to determine the optimum combination of variables from a total of 27 candidate variables. The performance of the LR models was evaluated based on their discrimination, calibration, and optimization rates. The results indicated that the FSSM, with fewer variables, had a clear advantage in model optimization. Although the BSEM selected more variables than the FSSM, it yielded only a slight improvement in the other indicators.

From the 25 potential variables without multicollinearity, eight crucial variables were chosen by the final LR model. Three variables were the distances to various types of buildings. The proximity to residential buildings was more important than that to commercial buildings. The most crucial factor was the proximity to residential quarters, whose Wald value (45.5) was about 1.6 times that of land price (29) and proximity to a bus stop (28.4) (Table 7). The result was consistent with customers' preference for PPLs located near their home addresses [52]. This study further supported the idea that the residential quarter was the most important of the four types of residential buildings, while the dormitory and CC types were relatively unimportant. The final model identified the boundaries of areas suitable for PPLs, accounting for 8% of the total area of GZ. Site selection for PPLs could be focused on these areas, which significantly reduces the difficulty of analysis and time costs. This study had several limitations due to insufficient data. Future research should divide metropolitan areas into multiple zones for modeling and analyze the differences in the variables chosen by the model in the various zones.

Author Contributions

Conceptualization, Zilai Zheng, Takehiro Morimoto and Yuji Murayama; methodology, Zilai Zheng, Takehiro Morimoto and Yuji Murayama; software, Zilai Zheng; validation, Zilai Zheng; formal analysis, Zilai Zheng; investigation, Zilai Zheng; resources, Zilai Zheng; data curation, Zilai Zheng; writing—original draft preparation, Zilai Zheng; writing—review and editing, Takehiro Morimoto and Yuji Murayama; visualization, Zilai Zheng; supervision, Takehiro Morimoto and Yuji Murayama. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partly supported by the JSPS grants of 21K01027 and 18H00763.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the author upon reasonable request.

Acknowledgments

The comments and suggestions of the anonymous reviewers are gratefully acknowledged.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Gevaers, R.; van de Voorde, E.; Vanelslander, T. Characteristics and typology of last-mile logistics from an innovation perspective in an urban context. In City Distribution and Urban Freight Transport: Multiple Perspectives; Macharis, C., Melo, S., Eds.; Edward Elgar Publishing: Cheltenham, UK, 2011; pp. 56–71.
2. Punakivi, M.; Yrjölä, H.; Holmström, J. Solving the last mile issue: Reception box or delivery box. Int. J. Phys. Distrib. Logist. Manag. 2001, 31, 427–439.
3. Xiao, Z.; Wang, J.J.; Lenzer, J.; Sun, Y. Understanding the diversity of final delivery solutions for online retailing: A case of Shenzhen, China. Transp. Res. Procedia 2017, 25, 985–998.
4. Slabinac, M. Innovative solutions for a “Last-Mile” delivery—A European experience. In Proceedings of the 15th International Scientific Conference Business Logistics in Modern Management, Osijek, Croatia, 15 October 2015; pp. 111–130.
5. Edwards, J.; McKinnon, A.; Cherrett, T.; McLeod, F.; Song, L. Carbon dioxide benefits of using collection–delivery points for failed home deliveries in the United Kingdom. Transp. Res. Rec. 2010, 2191, 136–143.
6. Gevaers, R.; van de Voorde, E.; Vanelslander, T. Cost modelling and simulation of last-mile characteristics in an innovative B2C supply chain environment with implications on metropolitan areas and cities. Procedia Soc. Behav. Sci. 2014, 125, 398–411.
7. Kedia, A.; Kusumastuti, D.; Nicholson, A. Acceptability of collection and delivery points from consumers’ perspective: A qualitative case study of Christchurch city. Case Stud. Transp. Policy 2017, 5, 587–595.
8. Rautela, H.; Janjevic, M.; Winkenbach, M. Investigating the financial impact of collection-and-delivery points in last-mile E-commerce distribution. Res. Transp. Bus. Manag. 2021, 100681, in press.
9. Van Duin, J.H.R.; Wiegmans, B.W.; van Arem, B.; van Amstel, Y. From home delivery to parcel lockers: A case study in Amsterdam. Transp. Res. Procedia 2020, 46, 37–44.
10. Weltevreden, J.W. B2c e-commerce logistics: The rise of collection-and-delivery points in The Netherlands. Int. J. Ret. Distrib. Manag. 2008, 36, 638–660.
11. State Post Bureau of The People’s Republic of China. 2017. Available online: http://www.spb.gov.cn/zc/ghjbz_1/201702/t20170213_991162.html (accessed on 11 September 2021).
12. Collins, M.G.; Steiner, F.R.; Rushman, M.J. Land-use suitability analysis in the United States: Historical development and promising technological achievements. Environ. Manag. 2001, 28, 611–621.
13. Cova, T.J.; Church, R.L. Exploratory spatial optimization in site search: A neighborhood operator approach. Comput. Environ. Urban Syst. 2000, 24, 401–419.
14. Hopkins, L. Methods for generating land suitability maps: A comparative evaluation. J. Am. Inst. Plann. 1997, 34, 19–29.
15. Malczewski, J. GIS-based land-use suitability analysis: A critical overview. Prog. Plann. 2004, 62, 3–65.
16. Yang, G.; Huang, Y.; Fu, Y.; Huang, B.; Sheng, S.; Mao, L.; Huang, S.; Xu, Y.; Le, J.; Ouyang, Y.; et al. Parcel locker location based on a Bilevel programming model. Math. Probl. Eng. 2020, 2020.
17. Zheng, Z.; Morimoto, T.; Murayama, Y. Optimal location analysis of delivery parcel-pickup points using AHP and Network Huff Model: A case study of Shiweitang Sub-District in Guangzhou City, China. ISPRS Int. J. Geoinf. 2020, 9, 193.
18. Lachapelle, U.; Burke, M.; Brotherton, A.; Leung, A. Parcel locker systems in a car dominant city: Location, characterisation and potential impacts on city planning and consumer travel access. J. Transp. Geogr. 2018, 71, 1–14.
19. Liu, S.; Liu, Y.; Zhang, R.; Cao, Y.; Li, M.; Zikirya, B.; Zhou, C. Heterogeneity of Spatial Distribution and Factors Influencing Unattended Locker Points in Guangzhou, China: The Case of Hive Box. ISPRS Int. J. Geoinf. 2021, 10, 409.
20. Morganti, E.; Dablanc, L.; Fortin, F. Final deliveries for online shopping: The deployment of pickup point networks in metropolitan and suburban areas. Res. Transp. Bus. Manag. 2014, 11, 23–31.
21. Xue, S.; Li, G.; Yang, L.; Liu, L.; Nie, Q.; Mehmood, M.S. Spatial Pattern and Influencing Factor Analysis of Attended Collection and Delivery Points in Changsha City, China. Chin. Geogr. Sci. 2019, 29, 1078–1094.
22. Lin, L.; Han, H.; Yan, W.; Nakayama, S.; Shu, X. Measuring Spatial Accessibility to Pick-Up Service Considering Differentiated Supply and Demand: A Case in Hangzhou, China. Sustainability 2019, 11, 3448.
23. Li, G.; Chen, W.; Yang, L. Spatial pattern and agglomeration mode of parcel collection and delivery points in Wuhan City. Prog. Geogr. 2019, 38, 407–416. (In Chinese)
24. Li, G.; Yang, L.; He, J. The spatial pattern and organization relation of the pickup points based on POI data in Xi’an: Focus on Cainiao stations. Sci. Geogr. Sin. 2018, 38, 2024–2030. (In Chinese)
25. Derdouri, A.; Murayama, Y. Onshore Wind Farm Suitability Analysis Using GIS-based Analytic Hierarchy Process: A Case Study of Fukushima Prefecture, Japan. Geoinf. Geostat. Overv. 2018.
26. Estoque, R.C.; Murayama, Y. Suitability analysis for beekeeping sites in La Union, Philippines, using GIS and multi-criteria evaluation techniques. Res. J. Appl. Sci. 2010, 5, 242–253.
27. Kumar, M.; Shaikh, V.R. Site suitability analysis for urban development using GIS based multicriteria evaluation technique. J. Indian Soc. Remote Sens. 2013, 41, 417–424.
28. Saha, S.; Sarkar, D.; Mondal, P.; Goswami, S. GIS and multi-criteria decision-making assessment of sites suitability for agriculture in an anabranching site of sooin river, India. J. Adv. Model. Earth Syst. 2021, 7, 571–588.
29. Store, R.; Kangas, J. Integrating spatial multi-criteria evaluation and expert knowledge for GIS-based habitat suitability modelling. Landsc. Urban Plann. 2001, 55, 79–93.
30. Pereira, J.M.; Duckstein, L. A multiple criteria decision-making approach to GIS-based land suitability evaluation. Int. J. Geogr. Inf. Sci. 1993, 7, 407–424.
31. Lodwick, W.A.; Monson, W.; Svoboda, L. Attribute error and sensitivity analysis of map operations in geographical information systems: Suitability analysis. J. Geogr. Inf. Sci. 1990, 4, 413–428.
32. Dreiseitl, S.; Ohno-Machado, L. Logistic regression and artificial neural network classification models: A methodology review. J. Biomed. Inf. 2002, 35, 352–359.
33. State Post Bureau of The People’s Republic of China. Statistical Communique on the Development of Postal Industry in 2014–2020. Available online: http://www.spb.gov.cn/sj/tjxx_1/ (accessed on 11 September 2021).
34. Oommen, T.; Baise, L.G.; Vogel, R.M. Sampling bias and class imbalance in maximum-likelihood logistic regression. Math. Geosci. 2011, 43, 99–120.
35. Gholamy, A.; Kreinovich, V.; Kosheleva, O. Why 70/30 or 80/20 relation between training and testing sets: A pedagogical explanation. Int. J. Intell. Syst. 2018, 11, 105–111.
36. Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression, 2nd ed.; John Wiley and Sons: New York, NY, USA, 2000.
37. Bland, J.M.; Altman, D.G. Multiple significance tests: The Bonferroni method. Br. Med. J. 1995, 310, 170.
38. Midi, H.; Sarkar, S.K.; Rana, S. Collinearity diagnostics of binary logistic regression model. J. Interdiscip. Math. 2010, 13, 253–267.
39. Belsley, D.; Kuh, E.; Welsch, R. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 2nd ed.; John Wiley and Sons: New York, NY, USA, 2013.
40. Schroeder, M.A.; Lander, J.; Levine-Silverman, S. Diagnosing and dealing with multicollinearity. West. J. Nurs. Res. 1990, 12, 175–187.
41. Mansfield, E.R.; Helms, B.P. Detecting multicollinearity. Am. Stat. 1982, 36, 158–160.
42. Kroll, C.N.; Song, P. Impact of multicollinearity on small sample hydrologic regression models. Water Resour. Res. 2013, 49, 3756–3769.
43. Zellner, D.; Keller, F.; Zellner, G.E. Variable selection in logistic regression models. Commun. Stat. Simul. Comput. 2004, 33, 787–805.
44. Soroush, A.; Bahreininejad, A.; van den Berg, J. A hybrid customer prediction system based on multiple forward stepwise logistic regression model. Intell. Data Anal. 2012, 16, 265–278.
45. Flack, V.F.; Chang, P.C. Frequency of selecting noise variables in subset regression analysis: A simulation study. Am. Stat. 1987, 41, 84–86.
46. Lee, K.I.; Koval, J.J. Determination of the best significance level in forward stepwise logistic regression. Commun. Stat. Simul. Comput. 1997, 26, 559–575.
47. Pearce, J.; Ferrier, S. Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Modell. 2000, 133, 225–245.
48. Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3.
49. Swets, J.; Pickett, R.; Whitehead, S.; Getty, D.; Schnur, J.; Swets, J.; Freeman, B. Assessment of diagnostic technologies. Science 1979, 205, 753–759.
50. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
51. Maxim, L.D.; Niebo, R.; Utell, M.J. Screening tests: A review with examples. Inhal. Toxicol. 2014, 26, 811–828.
52. Iwan, S.; Kijewska, K.; Lemke, J. Analysis of parcel lockers’ efficiency as the last mile delivery solution—The results of the research in Poland. Transp. Res. Procedia 2016, 12, 644–655.
53. Gaughan, A.E.; Stevens, F.R.; Huang, Z.; Nieves, J.J.; Sorichetta, A.; Lai, S.; Ye, X.; Linard, C.; Hornby, G.M.; Hay, S.I.; et al. Spatiotemporal patterns of population in mainland China, 1990 to 2010. Sci. Data 2016, 3, 1–11.
54. Squires, G.D. Urban sprawl and the uneven development of metropolitan America. In Urban Sprawl: Causes, Consequences, and Policy Responses; Urban Institute Press: Washington, DC, USA, 2002; pp. 1–22.

Figure 1. The study area.

Figure 2. Methodological framework. Note: ① conversion of multi-source data to the same scale; ② preparation of the observation data; ③ diagnosis of the assumptions of LR model; ④ determination of the best model; ⑤ evaluation of the model performance; and ⑥ generation of the suitability map using the best model.

Figure 3. The distribution maps of the 27 candidate variables used in the model.

Figure 4. The locations of PPL and non-PPL points.

Figure 5. Suitability map for PPLs using the best LR model.

Figure 6. Summary of the suitable areas for PPLs.

Table 1. List of data used.

Layer	Description	Source	Data Type
Road Network		OSM (2019) https://www.openstreetmap.org/ (accessed on 20 December 2019)	Vector (line)
POI		Gaode Maps (2019) https://ditu.amap.com/ (accessed on 20 December 2019)	Vector (point)
DEM	DEM-GDEMV2 30 m	ASTER GDEM Project (2019) https://www.gscloud.cn/ (accessed on 20 December 2019)	Raster
Population	Resolution of 100 m	WorldPop Project (2019) https://www.worldpop.org/geodata/summary?id=6275 (accessed on 11 September 2021)	Raster
Standard Land Price (Housing)	12 Levels of Price	Guangzhou Municipal Planning and Natural Resources Bureau 2019	Vector (polygon)

Table 2. List of POI data used.

Big Category	Mid Category	Subcategory	Number
Commercial House	Building	Business Office Building	5658
	Building	Commercial-residential Building	825
	Residential Area	Villa	280
		Residential Quarter	7619
		Dormitory	2031
		Community Center	353
Transportation Service	Subway Station	Exit	808
	Bus Station	Bus Station Related (The bus stops for the airport bus or stopping operation were not considered.)	6778
	Parking Lot	Parking Lot Related	9882

Table 3. The abbreviations of the 27 candidate variables.

No.	Potential Explanatory Variable	Variable Code	Type
X1	DEM	DEM	Topographic factors
X2	Slope	Slope	Topographic factors
X3	Population density	POP	Social factors
X4	Standard land price	SLPrice	Social factors
X5	Euclidean distance to the nearest residential quarter	Dist_Res_Qua	Accessibility factors: Proximity to various types of building
X6	Euclidean distance to the nearest residential community center	Dist_Res_CC
X7	Euclidean distance to the nearest residential villa	Dist_Res_Vil
X8	Euclidean distance to the nearest residential dormitory	Dist_Res_Dor
X9	Euclidean distance to the nearest commercial and residential building	Dist_Com_ResB
X10	Euclidean distance to the nearest commercial office building	Dist_Com_OffB
X11	Euclidean distance to the nearest primary road	Dist_Road_Pri	Accessibility factors: Proximity to various types of road
X12	Euclidean distance to the nearest secondary road	Dist_Road_Sec
X13	Euclidean distance to the nearest tertiary road	Dist_Road_Ter
X14	Euclidean distance to the nearest unclassified road	Dist_Road_Unc
X15	Euclidean distance to the nearest residential road	Dist_Road_Res
X16	Euclidean distance to the nearest special type of road	Dist_Road_Spe
X17	Euclidean distance to the nearest path road	Dist_Road_Path
X18	Euclidean distance to the nearest metro exit	Dist_MetroExit	Accessibility factors: Proximity to various types of transport
X19	Euclidean distance to the nearest bus stop	Dist_BusStop
X20	Euclidean distance to the nearest parking lot	Dist_ParkingLot
X21	Euclidean distance to the nearest water area	Dist_WaterArea
X22	Kernel density of parking lot	Dens_ParkingLot	Urban development factors: Density of various types of POI
X23	Kernel density of metro exit	Dens_MetroExit
X24	Kernel density of bus stop	Dens_BusStop
X25	Kernel density of commercial building	Dens_ComB
X26	Kernel density of residential building	Dens_ResB
X27	Kernel density of road	Dens_Road

Table 4. Assumptions of the LR model.

No	Assumptions	Explanation	Examination
1	Dependent variable is required to be a binary variable.	1: PPL presence; 0: PPL absence.	Y
2	Observations are required to be independent of each other.	The observations come from different measurements or matched data.	Y
3	There is at least one independent variable. The independent variable can be a continuous variable or a categorical variable.	There is one dependent variable and 27 independent variables.	Y
4	A large size of the sample is required. In general, the minimum sample quantity should be more than ten times the number of the independent variables.	There are 1205 points of PPL data. The sample quantity is more than 270.	Y
5	The linearity of independent variables and log odds is assumed.	Box–Tidwell method	?
6	There is little or no multicollinearity among the independent variables.	Multicollinearity diagnosis	?
7	There are no obvious outliers.		?

Note: In the examination column, ‘Y’ indicates that the assumption met the requirement and ‘?’ indicates that the assumption needed to be verified.

Table 5. VIF values of all variables after omitting the variable with the multicollinearity problem.

No.	Variable	Step 0 Tol	Step 0 VIF	Step 1 Tol	Step 1 VIF	Step 2 Tol	Step 2 VIF
X1	DEM	0.316	3.164	0.316	3.163	0.316	3.161
X2	Slope	0.594	1.685	0.594	1.683	0.595	1.681
X3	POP	0.682	1.466	0.692	1.445	0.692	1.444
X4	SLPrice	0.165	6.044	0.166	6.022	0.203	4.918
X5	Dist_Res_Qua	0.142	7.021	0.143	6.984	0.143	6.969
X6	Dist_Res_CC	0.292	3.419	0.293	3.419	0.296	3.376
X7	Dist_Res_Vil	0.533	1.878	0.533	1.877	0.557	1.797
X8	Dist_Res_Dor	0.183	5.475	0.183	5.458	0.183	5.454
X9	Dist_Com_ResB	0.151	6.637	0.151	6.626	0.154	6.49
X10	Dist_Com_OffB	0.14	7.142	0.14	7.13	0.14	7.128
X11	Dist_Road_Pri	0.452	2.212	0.458	2.182	0.458	2.181
X12	Dist_Road_Sec	0.305	3.282	0.307	3.256	0.309	3.241
X13	Dist_Road_Ter	0.375	2.664	0.377	2.65	0.378	2.643
X14	Dist_Road_Unc	0.559	1.788	0.56	1.786	0.56	1.785
X15	Dist_Road_Res	0.424	2.359	0.425	2.356	0.425	2.353
X16	Dist_Road_Spe	0.326	3.066	0.327	3.062	0.328	3.052
X17	Dist_Road_Path	0.32	3.123	0.324	3.091	0.325	3.079
X18	Dist_MetroExit	0.155	6.452	0.155	6.452	0.159	6.271
X19	Dist_BusStop	0.38	2.63	0.381	2.623	0.382	2.618
X20	Dist_ParkingLot	0.11	9.098	0.11	9.096	0.11	9.095
X21	Dist_WaterArea	0.693	1.444	0.695	1.438	0.698	1.433
X22	Dens_ParkingLot	0.139	7.196	0.173	5.769	0.175	5.704
X23	Dens_MetroExit	0.097	10.286	0.097	10.283	Omitted
X24	Dens_BusStop	0.098	10.235	0.108	9.241	0.114	8.756
X25	Dens_ComB	0.16	6.251	0.178	5.632	0.211	4.741
X26	Dens_ResB	0.086	11.685	Omitted
X27	Dens_Road	0.106	9.401	0.11	9.087	0.119	8.395
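
Tolerance (Tol) and VIF in Table 5 are reciprocals of each other: VIF = 1/Tol = 1/(1 − R²), where R² comes from regressing one predictor on all the others. The sketch below (not the authors' code) shows one common way to run this kind of stepwise screening: compute the VIF of every variable, omit the variable with the largest VIF if it exceeds a threshold, and repeat. The threshold of 10 is an assumption inferred from the pattern of omissions in Table 5; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)            # VIF = 1 / tolerance
    return out

def drop_collinear(X, names, threshold=10.0):
    """Repeatedly omit the variable with the largest VIF above the threshold."""
    names = list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

# Synthetic example (hypothetical): "c" is almost a linear combination of "a" and "b".
rng = np.random.default_rng(1)
n = 500
a, b, d = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
c = a + b + 0.05 * rng.normal(size=n)
X = np.column_stack([a, b, c, d])
kept_X, kept_names = drop_collinear(X, ["a", "b", "c", "d"])
```

Recomputing the VIFs after each omission matters because removing one variable changes the VIF of every remaining variable, as the Step 0 to Step 2 columns of Table 5 show.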

Table 6. Performance of FSSM and BSEM with the training dataset.

Method	Accuracy (Discrimination)	F-Measure (Discrimination)	Brier Score (Calibration)	Reduction Ratio (Optimization)
FSSM	88.20%	88.50%	0.088	68.00%
BSEM	88.40%	88.70%	0.085	48.00%
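
The reduction ratios in Table 6 are consistent with reading the ratio as the share of candidate variables removed by the stepwise procedure. The short sketch below works through that arithmetic; the formula and the function name are assumptions for illustration. Under this reading, retaining 8 of the 25 variables that passed the multicollinearity check gives 68%, and a ratio of 48% would correspond to 13 retained variables, a count that is inferred here rather than reported in the table.

```python
def reduction_ratio(n_candidates, n_selected):
    """Share of candidate variables removed by variable selection (assumed definition)."""
    return (n_candidates - n_selected) / n_candidates

# 25 candidates remain after the multicollinearity screening.
fssm_ratio = reduction_ratio(25, 8)    # 8 variables retained by the FSSM
bsem_ratio = reduction_ratio(25, 13)   # 13 retained is inferred from the 48% ratio
```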

Table 7. Coefficient of the best explanatory variable combination in the standard LR model.

Variable Type	Variable Code	Selected	Coefficient	Wald
Topographic Factors	DEM	Y	0.019	10.5
Topographic Factors	Slope	N	-	-
Social Factors	POP	N	-	-
Social Factors	SLPrice	Y	−0.0001	29
Accessibility Factors: Proximity to various types of building	Dist_Res_Qua	Y	−0.0032	45.5
	Dist_Res_CC	N	-	-
	Dist_Res_Vil	Y	−0.0002	6.7
	Dist_Res_Dor	N	-	-
	Dist_Com_ResB	N	-	-
	Dist_Com_OffB	Y	−0.0013	18.6
Accessibility Factors: Proximity to various types of road	Dist_Road_Pri	N	-	-
	Dist_Road_Sec	Y	−0.0006	9.3
	Dist_Road_Ter	N	-	-
	Dist_Road_Unc	N	-	-
	Dist_Road_Res	N	-	-
	Dist_Road_Spe	N	-	-
	Dist_Road_Path	N	-	-
Accessibility Factors: Proximity to various types of transport	Dist_MetroExit	N	-	-
	Dist_BusStop	Y	−0.0039	28.4
	Dist_ParkingLot	N	-	-
	Dist_WaterArea	N	-	-
Urban development Factors: Density of various types of POI	Dens_ParkingLot	N	-	-
	Dens_BusStop	N	-	-
	Dens_ComB	Y	0.090	20.7
	Dens_Road	N	-	-
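
To make the coefficients and Wald values in Table 7 easier to compare, the sketch below ranks the selected variables by Wald value and converts each distance coefficient into an odds ratio for a 100-unit increase in distance, computed as exp(coefficient × 100). The coefficient and Wald values are copied from Table 7; the 100-unit step and the assumption that the distance rasters are measured in metres are illustrative choices that the table does not state.

```python
import math

# (coefficient, Wald) for the eight selected variables, copied from Table 7.
table7 = {
    "DEM": (0.019, 10.5),
    "SLPrice": (-0.0001, 29.0),
    "Dist_Res_Qua": (-0.0032, 45.5),
    "Dist_Res_Vil": (-0.0002, 6.7),
    "Dist_Com_OffB": (-0.0013, 18.6),
    "Dist_Road_Sec": (-0.0006, 9.3),
    "Dist_BusStop": (-0.0039, 28.4),
    "Dens_ComB": (0.090, 20.7),
}

# Rank variables from the largest to the smallest Wald value.
ranked = sorted(table7, key=lambda name: table7[name][1], reverse=True)

# Wald value of the top variable relative to each variable.
top_wald = table7[ranked[0]][1]
relative_wald = {name: top_wald / table7[name][1] for name in ranked}

# Odds ratio for a 100-unit increase in each distance variable
# (units assumed to be metres; values below 1 mean lower odds farther away).
odds_ratio_100 = {
    name: math.exp(coef * 100)
    for name, (coef, _) in table7.items()
    if name.startswith("Dist_")
}
```

With these values, Dist_Res_Qua ranks first, with a Wald value about 1.6 times that of SLPrice and of Dist_BusStop. All distance coefficients are negative, so the odds of a raster unit being classified as suitable fall as distance increases.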

Table 8. Predicted performance of the best model.

	F-Measure	Brier Score	AUC
Training dataset	89.11%	0.088	0.954
Test dataset	91.69%	0.069	0.963
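
The three indicators in Table 8 can be computed from observed labels and predicted probabilities. The sketch below is a plain NumPy illustration on made-up data rather than the authors' workflow; the 0.5 classification cutoff, the function names, and the six sample points are assumptions.

```python
import numpy as np

def f_measure(y_true, y_prob, cutoff=0.5):
    """Harmonic mean of precision and recall at the given cutoff.

    Assumes at least one predicted positive and one actual positive.
    """
    y_pred = (y_prob >= cutoff).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return np.mean((y_prob - y_true) ** 2)

def auc(y_true, y_prob):
    """Probability that a random positive scores above a random negative
    (ties count as one half)."""
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Made-up example: 1 = PPL presence, 0 = PPL absence.
y_true = np.array([1, 1, 1, 0, 0, 0])
y_prob = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1])

scores = {
    "f_measure": f_measure(y_true, y_prob),
    "brier": brier_score(y_true, y_prob),
    "auc": auc(y_true, y_prob),
}
```

A higher F-measure and AUC and a lower Brier score indicate better performance, which is the direction of the difference between the training and test rows in Table 8.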

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

