
Automatic Positioning of Street Objects Based on Self-Adaptive Constrained Line of Bearing from Street-View Images

Guannan Li, Xiu Lu, Bingxian Lin, Liangchen Zhou and Guonian Lv

1 Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
2 State Key Laboratory Cultivation Base of Geographical Environment Evolution (Jiangsu Province), Nanjing 210023, China
3 Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2022, 11(4), 253; https://doi.org/10.3390/ijgi11040253
Submission received: 19 February 2022 / Revised: 26 March 2022 / Accepted: 7 April 2022 / Published: 12 April 2022

Abstract: In order to realize the management of various street objects in smart cities and smart transportation, it is very important to determine their geolocation. Current positioning methods for street-view images based on mobile mapping systems (MMSs) mainly rely on depth data or image feature matching. However, auxiliary data increase the cost of data acquisition, and image features are difficult to apply to MMS data with low overlap. A positioning method based on threshold-constrained line of bearing (LOB) overcomes these problems, but its threshold selection depends on the specific data and scene and is therefore not universal. In this paper, we apply the idea of divide-and-conquer to LOB-based positioning. The area to be calculated is adaptively divided according to the driving trajectory of the MMS, which constrains the effective range of the LOBs and reduces unnecessary calculation cost. The method achieves reasonable screening of positioning results within each range without introducing other auxiliary data, improving both computing efficiency and geographic positioning accuracy. Yincun Town, Changzhou City, China, was used as the experimental area, and pole-like objects were used as research objects to test the proposed method. The results show that 6104 pole-like objects obtained through deep-learning-based object detection were mapped to LOBs, and high-precision geographic positioning of the pole-like objects was realized through region division and self-adaptive constraints (recall rate, 93%; precision rate, 96%). Compared with existing LOB-based positioning methods, the proposed method achieves higher positioning accuracy, and its thresholds adapt automatically to various road scenes.

1. Introduction

Roads are important parts of cities. Street objects on both sides of the road are an important part of urban infrastructure management, intelligent transportation system construction, and high-precision maps for autonomous driving [1,2]. Fast and accurate collection of street objects has therefore become an important task for the digital construction of cities and traffic, as well as for the realization of automatic driving. The geolocation and attributes of street objects are the key indicators to collect; location in particular is fundamental to a street object and one of its most important properties. The location of street objects can assist in road asset management [3] and in calculating safety risk indices to evaluate road safety [4]. The presence of appropriate street objects in the right places can effectively reduce traffic risks, for example, by placing easy-to-read guide signs on curves [5]. Therefore, it is very important to perform effective geolocation and attribute acquisition of street objects.
In order to obtain geolocation and attribute information of the street objects on both sides of a road, an effective data collection and extraction method is required. Current data collection methods include manual field measurement, aerial remote sensing images, and mobile mapping systems (MMSs). Manual field measurement requires a large number of professionals to conduct field surveys, and the labor cost is relatively high. Aerial remote sensing images observe the road surface from a top-down perspective and can effectively capture large-area objects such as road markings [6]. However, the orthographic projection area of the narrow, vertical objects on both sides of the road is small, so they are difficult to capture and lack local details [7], and imaging is affected by high-rise buildings and trees along the road. An MMS, in contrast, observes street objects on both sides of the road from a side view; the observation results are closer to how people visually perceive these objects and are more easily extracted [8]. MMS-based measurement relies on the system being equipped with a high-precision global navigation satellite system (GNSS) for positioning, a high-frequency inertial measurement unit (IMU) for attitude determination, a high-resolution camera for street-view image capture, and a high-speed laser for distance measurement of street objects [9]. As a measurement device, LIDAR can extract the precise location of street objects from the point clouds it acquires, although the cost is high [10,11]. Compared with images, point clouds are more challenging to segment semantically, especially in complex scenes, where the technology is still immature. Therefore, with a camera mounted on the system and the help of relatively mature image semantic analysis [12] and object-detection methods [13], localization of street objects based on multi-view street-view images has become a lower-cost alternative [14,15].
The traditional object-positioning method based on multi-view street-view images acquired by an MMS relies on visual matching: corresponding points are matched according to image features, and positioning is realized through the geometric constraints generated by the corresponding points. Chang et al. demonstrated the feasibility of object positioning based on multi-view street-view images by manually matching corresponding points [16]. Nassar et al. took the camera locations, camera spacing, and the heading angle of the target obtained by the MMS as input parameters, applied geometric soft constraints in a Siamese convolutional neural network, and relied on matching objects across multiple views to achieve triangulation [17]. Ogawa et al. proposed using a map and the locations of buildings in images to correct the camera-position parameters of the captured image, thereby improving object-recognition accuracy and the geographic positioning accuracy of the image [18]. Zhu et al. performed street-to-aerial image matching based on an improved Siamese convolutional neural network to estimate the geolocation and orientation of targets in street-view images [19]. When attempting to match corresponding points automatically, the similarity of objects and backgrounds in multi-view images makes it difficult to distinguish similar objects against the same background [20]. Multi-view images acquired by an MMS have small overlapping areas, so visual matching using image key points or descriptors struggles to achieve satisfactory results [21,22]. In MMS images, different instances of the same object type have similar visual features, while the same instance seen from different perspectives has different visual features, which makes multi-view visual matching difficult.
To resolve the difficulty of visual matching of multi-view images acquired by an MMS, scholars have tried to transform the visual-matching positioning problem into a passive positioning problem [23,24,25]. First, the object of interest is detected in the multi-view images; then the orientation of the object relative to the shooting position is calculated from the pose parameters and represented as a line of bearing (LOB); finally, the possible location of the object is calculated by LOB intersection. Chu et al. proposed a deep-learning-based method for determining the orientation of objects in images [26], providing a new way to acquire and correct LOB orientations and improving the positioning accuracy of LOB-based methods. Hazelhoff et al. used the aggregation centers of LOB intersections across multiple views as the positioning result, although the result may contain a large number of ghost nodes [27,28]. Based on Markov random field optimization, Krylov et al. introduced object distance data into the decision making to reduce ghost nodes, although the positioning results often showed considerable randomness [21]. Nassar et al. integrated object detection and depth estimation into an end-to-end graph neural network, relying on the estimated depth to bound the operating distance of the LOBs generated from the detection results [20]. Similarly, Lumnitz et al. applied a monocular depth-estimation algorithm and triangulation to Google Street View and Mapillary images, gathering adjacent LOB intersections into clusters to realize meter-level geographic positioning of urban trees [29]. These methods all introduce depth information, and their positioning accuracy depends on the accuracy of that depth information. Zhang et al. proposed a modified brute-force-based LOB measurement, which reduces the influence of ghost nodes on the positioning of utility poles without introducing other data and obtains stable positioning results [30]. Khan et al. realized the geographic positioning of eucalyptus trees on both sides of a road based on the modified brute-force-based LOB measurement, verifying the feasibility of this method once again [31]. However, most current LOB-based positioning methods require a large number of threshold constraints, and appropriate threshold values are difficult to select. For example, the modified brute-force-based LOB measurement must consider the width of the road and other factors, which makes it difficult to apply to object extraction along roads that span a wide area with varying widths.
To address the applicability of threshold selection across different road scenes in LOB-based object positioning, in this paper, we propose an automatic positioning method for street objects based on LOBs with self-adaptive constraints. The method automatically divides the calculation area into grids based on the calibrated effective collection distance and the driving trajectory, which constrains the effective range of the LOB intersection calculation. According to the relationship between LOBs and their intersection points, two threshold-independent constrained rules are proposed to further eliminate the ghost nodes generated by LOB intersection. The retained calculation results are the automatic positioning results of the street objects. The algorithm is not affected by factors such as road width and is suitable for different road scenes. The self-adaptive selection of the LOB constraint threshold is realized through the effective shooting distance of the MMS images and the driving trajectory, giving the method universal applicability and generalizability.
The remainder of this paper is arranged as follows. Section 2 introduces the implementation of the proposed method and expounds its basic principles. Section 3 introduces Yincun Town, Changzhou City, Jiangsu Province, China, as the experimental area, with street-view images of the area collected by an MMS as the experimental data, and describes the experimental process and results in detail. We then compare the proposed self-adaptive constrained LOB positioning method with the modified brute-force-based LOB positioning method, and different thresholds are illustrated and discussed. Finally, the paper is summarized.

2. Methodology

In this paper, we propose an automatic positioning method for objects in street-view images with self-adaptive constrained LOB, which mainly includes two parts: LOB mapping based on object-detection results and LOB-based geographic positioning. The specific process is shown in Figure 1. LOB mapping based on object-detection results includes object detection marked by bounding boxes and simulation of the line of sight by LOBs. LOB-based geographic positioning includes grid division based on the driving trajectory, acquisition of intersection points from LOBs, and elimination of ghost nodes by constrained rules. In the LOB-based geographic positioning process, the grid division based on the driving trajectory proposed in this paper improves computational efficiency; it also provides self-adaptive constraints for eliminating ghost nodes and works together with the other constrained rules to improve the positioning accuracy of street objects.

2.1. LOB Mapping Based on Object-Detection Results

In order to realize image-based object positioning, objects must first be detected in the images. Relatively mature image object-detection algorithms exist and have achieved good results in street object detection [11,32,33]. In this paper, a cascade region convolutional neural network (Cascade R-CNN) is used for object detection. As shown in Figure 2, the model extracts features from the input image through backbone convolutions and uses a region proposal network (RPN) to obtain a series of rough rectangular proposals of the object. A series of end-to-end subdetectors is cascaded, with the bounding boxes output by each stage fed into the next; the intersection-over-union (IoU) threshold between candidate and ground-truth bounding boxes is gradually increased to improve the detection results [34]. Compared with other R-CNN series models, Cascade R-CNN introduces detection sub-networks trained at different IoU thresholds, overcoming the overfitting of regression at a single IoU threshold and achieving relatively good recognition accuracy [34]. The objects to be detected are marked in the images with bounding boxes, and Cascade R-CNN is trained on these annotations so that objects of the same type in other images can be marked with bounding boxes automatically.
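The paper trains Cascade R-CNN (commonly available through frameworks such as MMDetection); purely as an illustration of this detection step, the Python sketch below uses torchvision's off-the-shelf Faster R-CNN as a stand-in detector, since the downstream LOB pipeline only needs class-labelled bounding boxes. The function name and score threshold are assumptions, not from the paper.

```python
# Illustrative detection step. torchvision's Faster R-CNN stands in for the
# paper's Cascade R-CNN; any detector returning class-labelled boxes works here.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_objects(image_path, score_thresh=0.5):
    """Return [(x_min, y_min, x_max, y_max, label_id, score), ...]."""
    img = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        out = model([img])[0]
    return [(*box.tolist(), int(label), float(score))
            for box, label, score in zip(out["boxes"], out["labels"], out["scores"])
            if score >= score_thresh]
```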
As shown in Figure 3, the image-space coordinates $(x_c, y_c, z_c)$ of the center pixel of the box are obtained from the bounding box of the recognized object and the imaging law [8,35]. Combined with the spatial coordinates recorded by GNSS and the Euler angles recorded by the IMU, the absolute location and attitude of the vehicle are obtained, the mapping relationship between the bounding box and the observation orientation is calculated, and the LOB is constructed [30,31,36].
The projection of the pixel in the world coordinate system, $(x_w, y_w, z_w)$, can be obtained through Equation (1):

$$\begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} = s R \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} + \begin{bmatrix} x_{cam} \\ y_{cam} \\ z_{cam} \end{bmatrix}, \tag{1}$$

where $s$ represents the depth coefficient, which cancels out in subsequent calculations; $R$ represents the rotation matrix from the image-space coordinate system, estimated from the vehicle attitude parameters, to the world coordinate system; and $(x_{cam}, y_{cam}, z_{cam})$ represents the camera world coordinates calculated from the GNSS parameters and calibration parameters.
The orientation, $b$, corresponding to the pixel can be expressed by Equation (2):

$$b = \arctan\left(\frac{y_w - y_{cam}}{x_w - x_{cam}}\right). \tag{2}$$
The LOB is represented by $l$, as shown in Equation (3):

$$l = (x_{cam}, y_{cam}, z_{cam}, b). \tag{3}$$
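A minimal Python sketch of Equations (1)-(3) follows, assuming the image-space ray of the bounding-box centre pixel and the rotation matrix $R$ (from the IMU attitude) are already available from calibration; it is an illustration, not the authors' C# implementation.

```python
# Map a bounding-box centre pixel to an LOB (Equations (1)-(3)).
import math
import numpy as np

def pixel_to_lob(ray_cam, R, cam_xyz):
    """ray_cam: (x_c, y_c, z_c), image-space ray of the box-centre pixel.
    R: 3x3 rotation matrix to the world frame (Equation (1)).
    cam_xyz: (x_cam, y_cam, z_cam), camera world coordinates."""
    d = np.asarray(R) @ np.asarray(ray_cam)  # world-space direction; the depth
                                             # coefficient s cancels in arctan
    b = math.atan2(d[1], d[0])               # Equation (2): bearing b
    x_cam, y_cam, z_cam = cam_xyz
    return (x_cam, y_cam, z_cam, b)          # Equation (3)
```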

2.2. Geographic Positioning Based on Self-Adaptive Constrained LOB

After obtaining the LOBs mapped from the objects detected in the street-view images, association matching of the detected objects across the multi-view images is realized through the spatial aggregation of the LOBs. However, a large number of false associations may be generated, forming ghost nodes. To reduce the influence of ghost nodes, the grid division range is first automatically calculated according to the driving trajectory; this limits the effective distance of the LOBs and preliminarily eliminates ghost nodes. Then, the intersections of the LOBs in each grid are expressed by a relation matrix, and ghost nodes are further eliminated based on the proposed self-adaptive constrained rules. The LOB-based positioning process is described in detail below.

2.2.1. LOB Measurement

As shown in Figure 4, when the same object is captured in multi-view images, the bounding boxes of the object are mapped to LOBs, which generate geometric intersections, that is, LOB intersections. Due to observation errors, these intersections usually do not coincide exactly in space but aggregate into a cluster within a certain range. The centroid of the intersections within this cluster represents the geolocation of the street object.
As shown in Figure 5, the bounding boxes detected in different images of complex scenes do not all belong to the same object. Intersection points can be generated between any two non-parallel LOBs, so they include the real geolocations of the identified objects but also a large number of ghost nodes. Therefore, certain constrained rules are required to eliminate the ghost nodes.
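As an illustration of the intersection step, the sketch below treats each LOB as a 2D ray from its camera position along its bearing and intersects two rays by Cramer's rule. Keeping only intersections lying in front of both cameras is a natural assumption, not a rule stated explicitly in the paper.

```python
# Intersect two LOBs l = (x_cam, y_cam, z_cam, b) as 2D rays.
import math

def lob_intersection(l1, l2, eps=1e-9):
    """Return the (x, y) intersection of two LOBs, or None if they are
    near-parallel or the intersection lies behind either camera."""
    x1, y1, _, b1 = l1
    x2, y2, _, b2 = l2
    d1 = (math.cos(b1), math.sin(b1))
    d2 = (math.cos(b2), math.sin(b2))
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]    # determinant of [d1, -d2]
    if abs(det) < eps:                           # near-parallel LOBs
        return None
    rx, ry = x2 - x1, y2 - y1
    t1 = (rx * (-d2[1]) - (-d2[0]) * ry) / det   # Cramer's rule
    t2 = (d1[0] * ry - rx * d1[1]) / det
    if t1 <= 0 or t2 <= 0:                       # behind a camera: discard
        return None
    return (x1 + t1 * d1[0], y1 + t1 * d1[1])
```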

2.2.2. Division of Grid

Because the driving trajectory covers a large range, each trajectory point within the range can generate several LOBs, but only the LOB intersections generated by trajectory points that are close together can determine the geolocation of an object. The range of the driving trajectory is therefore divided into grids, and the LOB intersection calculation is performed only within adjacent grids each time. This indirectly restricts the effective length of the LOBs, reducing unnecessary calculations and removing ghost nodes outside the effective range of the LOBs.
As shown in Figure 6, a driving trajectory including $n$ records is denoted $\{T_i(x_i, y_i, z_i) \mid i = 1, 2, \ldots, n\}$, and the average baseline length of triangulations of adjacent views, denoted $bl$, can be estimated by Equation (4):

$$bl = \frac{1}{n}\sum_{i=1}^{n-1}\sqrt{(x_i - x_{i+1})^2 + (y_i - y_{i+1})^2 + (z_i - z_{i+1})^2}. \tag{4}$$
According to the two-dimensional extent of the driving trajectory, $(min_x, min_y, max_x, max_y)$, the area is evenly divided into square grids, taking $k$ times $bl$ as the size of the unit grid. The number of columns, $n_{col}$, and the number of rows, $n_{row}$, can be obtained by Equation (5):

$$n_{col} = \mathrm{Ceil}\left(\frac{max_x - min_x}{k \times bl}\right), \quad n_{row} = \mathrm{Ceil}\left(\frac{max_y - min_y}{k \times bl}\right), \tag{5}$$

where Ceil represents the rounding-up function.
In each calculation process, intersection points are computed only for the LOBs mapped from images captured within the range of Equation (6), denoted $\mathrm{Grid}_{calculation}$, and only the geographic positioning results within the range of Equation (7), denoted $\mathrm{Grid}_{recorded}$, are recorded:

$$\mathrm{Grid}_{calculation} = \begin{cases} x \in [\,k \times bl \times (col-1) + min_x,\; k \times bl \times (col+2) + min_x\,] \\ y \in [\,k \times bl \times (row-1) + min_y,\; k \times bl \times (row+2) + min_y\,] \end{cases} \tag{6}$$

$$\mathrm{Grid}_{recorded} = \begin{cases} x \in [\,k \times bl \times col + min_x,\; k \times bl \times (col+1) + min_x\,] \\ y \in [\,k \times bl \times row + min_y,\; k \times bl \times (row+1) + min_y\,] \end{cases} \tag{7}$$

where $col$ represents the column number of the current $\mathrm{Grid}_{recorded}$, an integer within the range $[0, n_{col})$; and $row$ represents the row number of the current $\mathrm{Grid}_{recorded}$, an integer within the range $[0, n_{row})$.
During grid division, the threshold, $k$, is used to ensure that at least two trajectory points ($k > 1$) for mapping LOBs are included in the grid range to be calculated. The grid is a regular square. Because only the results within the central grid are recorded each time, the maximum effective intersection distance of the LOB is $2\sqrt{2}\,k\,bl$. However, distant objects tend to occupy few pixels in the image and yield poor positioning accuracy. Therefore, the effective shooting distance, $V$, of the equipment can be estimated according to the equipment conditions. As shown in Equation (8), imposing $2\sqrt{2}\,k\,bl \le V$ gives the value range of the threshold, $k$:

$$k \in \left(1, \frac{V}{2\sqrt{2}\,bl}\right]. \tag{8}$$
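The following sketch gathers the grid-division quantities of Equations (4)-(8), assuming a metric projected coordinate system; function names are illustrative, not from the paper.

```python
# Grid division: bl (Equation (4)), grid counts (Equation (5)), the
# calculation/recorded windows (Equations (6)-(7)), and the bound on k (Equation (8)).
import math

def grid_parameters(traj, k, V):
    """traj: [(x, y, z), ...] trajectory records; V: effective shooting
    distance (m). Returns (bl, n_col, n_row, k_max)."""
    n = len(traj)
    bl = sum(math.dist(traj[i], traj[i + 1]) for i in range(n - 1)) / n  # Eq. (4)
    xs, ys = [p[0] for p in traj], [p[1] for p in traj]
    cell = k * bl
    n_col = math.ceil((max(xs) - min(xs)) / cell)   # Equation (5)
    n_row = math.ceil((max(ys) - min(ys)) / cell)
    k_max = V / (2 * math.sqrt(2) * bl)             # Equation (8): k in (1, k_max]
    return bl, n_col, n_row, k_max

def grid_ranges(col, row, k, bl, min_x, min_y):
    """Equations (6)-(7): the 3x3-cell calculation window and the central
    recorded cell for grid (col, row); each range is ((x0, x1), (y0, y1))."""
    cell = k * bl
    calculation = ((cell * (col - 1) + min_x, cell * (col + 2) + min_x),
                   (cell * (row - 1) + min_y, cell * (row + 2) + min_y))
    recorded = ((cell * col + min_x, cell * (col + 1) + min_x),
                (cell * row + min_y, cell * (row + 1) + min_y))
    return calculation, recorded
```

With the experimental values reported later ($bl \approx 7.21$ m, $V = 100$ m), `k_max` evaluates to about 4.9, matching the range $(1, 4.9]$ given in Section 3.3.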

2.2.3. Relationship Matrix Construction

A set, $L$, is used to represent $n$ LOBs, where the $i$th LOB is represented by $l_i$:

$$L = \{ l_1, l_2, l_3, \ldots, l_i, \ldots, l_j, \ldots, l_n \}. \tag{9}$$

The intersection points generated by pairwise intersection of the LOBs in $L$ can be represented by an $n \times n$ intersection matrix:

$$M(P_{i,j}) = (p_{i \times j})_{n \times n}, \tag{10}$$

where $M(P_{i,j})$ represents the intersection matrix of the LOBs in $L$, and $p_{i \times j}$ represents the intersection of $l_i$ and $l_j$.
As shown in Figure 7a, Object1, Object2, and Object3 are observed from four observation points: a, b, c, and d. To facilitate understanding, a combination of letters and numbers records the observation position and object of each LOB; for example, la1 is the LOB of Object1 observed from position a. Three different colors of LOBs simulate the lines of sight when observing the three different objects. The object location is where the LOB intersections of the same color aggregate, and LOB intersections of different colors are ghost nodes. As shown in Figure 7b, the intersection matrix describes the intersection relationships between LOBs. Because the matrix is symmetric, only the upper triangular part needs to be recorded; two LOBs with no intersection relationship are recorded as "-".
As shown in Figure 7a, each intersection point, $p_i$, is traversed. If two intersection points are close to each other (within a threshold, $t$), they are classified into one cluster, recorded as $c_k$, whose centroid is recorded as $O(c_k)$. If there are no other intersections nearby, a point is recorded as a cluster on its own. In Figure 7b, matrix elements in the same cluster are marked with the same color, and each cluster can be expressed as Equation (11):

$$c_k = \{\, p_i \mid \mathrm{dist}(p_i, O(c_k)) < t \,\}, \tag{11}$$

where dist represents the function calculating the distance between two points.
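The sketch below illustrates building the intersection matrix and the clustering of Equation (11), reusing the `lob_intersection` helper from the earlier sketch. The greedy single-pass grouping is an assumption; the paper only specifies the distance-to-centroid criterion.

```python
# Relationship matrix (Equation (10)) and clustering (Equation (11)).
import math

def build_matrix(lobs, intersect=None):
    """Return {(i, j): (x, y)} for i < j, omitting non-intersecting pairs."""
    intersect = intersect or lob_intersection
    M = {}
    for i in range(len(lobs)):
        for j in range(i + 1, len(lobs)):
            p = intersect(lobs[i], lobs[j])
            if p is not None:
                M[(i, j)] = p
    return M

def cluster_intersections(M, t):
    """Group intersections within distance t of a cluster centroid; each
    cluster keeps its LOB-pair keys, its points, and its centroid."""
    clusters = []
    for pair, p in M.items():
        for c in clusters:
            if math.dist(p, c["centroid"]) < t:
                c["members"].append(pair)
                c["points"].append(p)
                n = len(c["points"])
                c["centroid"] = (sum(q[0] for q in c["points"]) / n,
                                 sum(q[1] for q in c["points"]) / n)
                break
        else:                                   # no nearby cluster: new one
            clusters.append({"members": [pair], "points": [p], "centroid": p})
    return clusters
```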

2.2.4. Elimination of Ghost Nodes Based on Constrained Rules

In this study, we introduce two constrained rules that require no dynamic thresholds. By recursively executing the constrained rules until the number of clusters no longer changes, ghost nodes are eliminated as far as possible, reducing their impact on the positioning results.
1.
Constrained rules based on the minimum number of intersections in the cluster
When the number of LOBs observing an object is greater than 2, the number of intersections in its cluster should be greater than 1 [27,28,30,31]. The number of intersections contained in each cluster is counted; if a cluster contains only one intersection, that intersection is determined to be a ghost node. As shown in Figure 8a, all clusters determined to be ghost nodes (marked with "×") are deleted, and the intersections within those clusters are disassociated from their LOBs in the intersection matrix, as shown in Figure 8b. At this point, most of the ghost nodes have been eliminated, and only the clusters whose candidate points require further judgment (marked with "?") are retained.
2.
Constrained rules based on the uniqueness of LOB association
Each LOB is a line-of-sight simulation of the observed object. If all of an LOB's intersections lie within a single cluster, that cluster must be the object location. The LOBs associated with this cluster should then be disassociated from all other clusters to ensure the uniqueness of the LOB association.
For example, as shown in Figure 9, la1 intersects only with the intersection points in one cluster, so that cluster must be the object location, that is, Object1. The other LOBs associated with Object1 should then be uniquely associated with Object1. Because ld1 is associated with other clusters in addition to Object1, its associations with the intersections in those other clusters must be removed. The number of intersections in the red cluster then drops to 1, so it will be eliminated as a ghost node in the next iteration.
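The two rules can be applied iteratively until the cluster set stabilises, as in the sketch below, which consumes the clusters produced by the previous sketch. The cluster representation (each cluster storing the (i, j) LOB-pair keys of its intersections) is an assumed data structure, not the paper's.

```python
def eliminate_ghosts(clusters):
    """Apply the two constrained rules until the clusters no longer change."""
    changed = True
    while changed:
        changed = False
        # Rule 1: an object observed by more than two LOBs yields more than one
        # intersection, so single-intersection clusters are ghost nodes.
        kept = [c for c in clusters if len(c["members"]) > 1]
        changed |= len(kept) != len(clusters)
        clusters = kept
        # Index each LOB to the clusters its intersections belong to.
        lob_clusters = {}
        for ci, c in enumerate(clusters):
            for (i, j) in c["members"]:
                for lob in (i, j):
                    lob_clusters.setdefault(lob, set()).add(ci)
        # Rule 2: an LOB whose intersections all lie in one cluster confirms it
        # as a real object; the cluster's LOBs are detached from other clusters.
        confirmed = {next(iter(cs)) for cs in lob_clusters.values() if len(cs) == 1}
        for ci in confirmed:
            owned = {lob for pair in clusters[ci]["members"] for lob in pair}
            for cj, c in enumerate(clusters):
                if cj == ci:
                    continue
                pruned = [pair for pair in c["members"]
                          if pair[0] not in owned and pair[1] not in owned]
                changed |= len(pruned) != len(c["members"])
                c["members"] = pruned
    return clusters
```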

3. Experimental Results and Discussions

3.1. Data Collection and Selection of Research Area

In this study, we used data collected by the Alpha 3D vehicle-mounted laser-scanning measurement system produced by CHC NAVIGATION for method validation. The system is equipped with a Ladybug panoramic camera, GNSS, an IMU, and LIDAR. The original data obtained by the system are binary stream data that cannot be used directly, so a series of preprocessing operations is required. Using the CoPre software developed by CHC NAVIGATION, the image stream output by the Ladybug panoramic camera is read and stitched into 360° panoramic images stored in a common picture format at 8192 × 4096-pixel resolution. Using Inertial Explorer software, the IMU and GNSS data are jointly processed to obtain the high-precision driving trajectory (latitude and longitude coordinates, regional projection coordinates, and elevation), speed, attitude (roll, pitch, and heading), and other information in the specified coordinate system, output record-by-record as structured, readable text. The output trajectory data have a horizontal accuracy of 0.010 m and a vertical accuracy of 0.020 m; for the attitude data, the roll/pitch accuracy is 0.005° and the heading accuracy is 0.017° [37]. The high-resolution imagery provides geometric texture and semantic information of street-side objects for the experiments, and the high-precision position and attitude data provide sufficient measurement precision for method verification.
As shown in Figure 10, the Alpha 3D vehicle-mounted laser-scanning measurement system was used to collect data from the Yanziji and Mufushan areas of Nanjing City, Jiangsu Province, China, and the Yincun Town area of Changzhou City, Jiangsu Province, China. Nanjing and Changzhou are located in the same province, and the Yincun area is only about 120 km from the Mufushan and Yanziji areas, with similar street-layout styles.

3.2. Object Detection and LOB Mapping

In this study, 6367 street-view images collected from the Yanziji area and 6920 street-view images collected from the Mufushan area, each with a resolution of 8192 × 4096 pixels, were used as annotation data. Three classifications of pole-like objects—utility poles, street lamps, and signboards, which are widely distributed and numerous—were used as acquisition objects, and a workstation equipped with an Intel Xeon E5-2698 V4 CPU and a Tesla V100 GPU was used for training.
As shown in Figure 11, because most of the labeled objects are distributed in the vertical center area of the image, where image distortion is relatively small, the original images were cut, for convenience of training, into 53,148 subimages of 2048 × 2048 pixels containing only the middle area for object labeling. The pole-like objects were divided into rod parts and top parts for labeling, avoiding overlapping bounding boxes as much as possible. The labeling results include 48,162 rod parts, 7435 top parts of utility poles, 26,751 top parts of street lamps, and 5695 top parts of signboards. Taking these as the sample data, the samples were randomly divided into a training set and a test set at a 7:3 ratio and put into the Cascade R-CNN classifier [34] for training. The average precision of training was 0.880, and the recall rate was 0.929 (IoU > 0.5).
A total of 4892 street-view images collected in Yincun Town were used for object detection. Because the location of the rod part is used as the object geolocation during positioning, the rod parts, top parts of utility poles, top parts of street lamps, and top parts of signboards were detected separately. The classification of the pole-like object to which each rod part belongs is given by the classification of the closest top part, as sketched below. Relying on the orientation of the rod parts to map LOBs provides more accurate orientation parameters for the subsequent geographic positioning of the pole-like objects. In order to reduce the impact of recognition errors on the subsequent matching process, the classification results were manually checked, and a total of 6104 LOBs of pole-like objects were mapped, including 3325 utility poles, 1814 street lamps, and 965 signboards.
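A hedged sketch of the class-assignment step: each rod part inherits the label of the nearest top part. Measuring the distance between bounding-box centres in image coordinates is an assumption; the paper only states "closest".

```python
# Assign each rod part the class of the nearest top part in the same image.
import math

def centre(box):
    """box: (x_min, y_min, x_max, y_max) in pixel coordinates."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def classify_rods(rod_boxes, top_parts):
    """top_parts: [(box, label), ...] with labels such as 'utility_pole',
    'street_lamp', 'signboard'. Returns [(rod_box, label), ...]."""
    labelled = []
    for rod in rod_boxes:
        nearest = min(top_parts,
                      key=lambda tp: math.dist(centre(rod), centre(tp[0])))
        labelled.append((rod, nearest[1]))
    return labelled
```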

3.3. Geographic Positioning Based on Self-Adaptive Constrained LOB

The region-division and geographic positioning algorithms were implemented in C# and run under the Windows 10 operating system on a personal computer with an Intel Core i7-7700 CPU and 8 GB of RAM.
The calculation results of the proposed method were compared with the 1409 pole-like objects collected manually in this area and evaluated using three indicators: time consumption, recall rate, and precision rate. The closer the recall and precision rates are to 1, the better the positioning effect; the lower the time consumption, the more efficient the algorithm. Time consumption is the actual running time of the LOB-based geographic positioning step. The recall and precision rates were calculated using Equation (12):

$$\mathrm{Recall\ rate} = \frac{N_{rc}}{N_{rf}}, \qquad \mathrm{Precision\ rate} = \frac{N_{co}}{N_{cal}}, \tag{12}$$

where $N_{rf}$ represents the number of reference points, $N_{cal}$ the number of calculated results, $N_{rc}$ the number of reference points within a 1 m buffer of any calculation result, and $N_{co}$ the number of calculation results within a 1 m buffer of any reference point.
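Equation (12) with the 1 m buffer can be computed as in the following sketch, assuming coordinates in a metric projection; names are illustrative.

```python
# Recall and precision with a buffer match (Equation (12)).
import math

def evaluate(calculated, reference, buffer_m=1.0):
    """calculated, reference: [(x, y), ...] positions in a metric projection."""
    def near(p, pts):
        return any(math.dist(p, q) <= buffer_m for q in pts)
    n_rc = sum(near(r, calculated) for r in reference)   # matched references
    n_co = sum(near(c, reference) for c in calculated)   # matched results
    return n_rc / len(reference), n_co / len(calculated)  # recall, precision
```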
The LOB-based positioning method proposed in this study is affected by two thresholds: the enlargement coefficient, $k$, for dividing the grid and the distance, $t$, for aggregating the cluster points. In order to study the influence of these threshold parameters, grid division was carried out for each parameter setting, and LOB intersection and ghost-node elimination were performed grid-by-grid and classification-by-classification. From the 4892 trajectory points corresponding to the street-view images, the average baseline length of triangulation of adjacent views was calculated as 7.21 m. The device can effectively capture pole-like objects within 100 m, so the effective shooting distance, $V$, was set to 100 m. According to Equation (8), the value range of $k$ is (1, 4.9]; for convenience of calculation, integers within or near this range were used. Taking into account the recording errors of the acquisition parameters, the distortion of street-view images, and the influence of the earth's curvature on geodetic triangulation, the clustering distance, $t$, was varied from 0.1 m to 1 m in steps of 0.1 m. Table 1 shows the evaluation results of the geographic positioning method based on self-adaptive constrained LOB under different threshold combinations.
Within the estimated 100 m maximum effective range of the device, $k$ was set to 4, giving a maximum line-of-sight range of 81.6 m, and $t$ was set to 0.2 m according to the output accuracy of the device. The calculation results obtained with this combination of thresholds are shown in Figure 12. The results demonstrate the effectiveness of the proposed method for positioning large numbers of pole-like objects. The positioning results have high accuracy and are associated with the corresponding object images, so they can be imported into a database as final results.

3.4. Comparative Analysis and Discussions with Existing Methods

In order to compare the method proposed in this study with existing methods, we reproduced the modified brute-force-based LOB algorithm proposed by Zhang [30,31]. That algorithm is affected by three thresholds: the number of views, the angle, and the distance to the center of the selected road. The thresholds were set according to the parameters provided in those papers and the actual characteristics of our data. The evaluation results are shown in Table 2.
With the modified brute-force-based LOB algorithm, there is considerable uncertainty in threshold selection: empirical values must be found by repeated trials on the data, and when the road width is unknown or varies greatly across a large area, the threshold of the distance to the center of the selected road is often difficult to determine. As Table 2 shows, the modified brute-force-based LOB algorithm relies on expanding the threshold ranges to increase the number of candidate points, which takes more time; although this can slightly improve the recall rate, it often reduces the precision of the results.
Geographic positioning based on the self-adaptive constrained LOB proposed in this study automatically calculates the range of the effective threshold, $k$, from the driving trajectory, and the threshold, $t$, is a fixed value as long as the equipment is unchanged. Values of $k$ within the effective shooting range achieved good recall and precision rates with short calculation times. As $k$ increases, the unit grid becomes larger, the calculation time gradually increases, and the number of ghost nodes generated when the effective distance of the LOB exceeds the actual distance also increases; the recall and precision rates both decrease but remain high. Because of the limitations of the acquisition equipment, the calculated positions are subject to offsets. If the cluster aggregation distance, $t$, is too small, intersection points near the object location cannot be aggregated into one cluster and are output as duplicate acquisitions. If $t$ is too large, adjacent objects of the same type merge into one cluster, causing missed positioning and lowering the recall rate. Because the spacing between street lamps and between utility poles is large, a high value of $t$ has little impact on them, although it considerably affects signboards, which are often close to each other.
Compared with other LOB-based positioning methods, for which threshold selection is difficult, the method proposed in this study is adaptive in threshold selection: the appropriate range of $k$ can be calculated automatically from the driving trajectory, and $t$ can take a fixed value according to the output accuracy of the equipment. Compared with the modified brute-force-based LOB algorithm, the proposed method limits the effective range of the LOB by dividing the grid and does not rely on thresholds, such as the number of adjacent viewpoints and the distance to the center of the selected road, that are affected by changes in road scenes.

4. Conclusions

In this study, we proposed a method for automatic positioning of objects in street-view images based on MMS. To avoid the difficulty of image feature matching caused by the long baselines of street-view imagery, a geographic positioning method based on self-adaptive constrained LOB was implemented, following object-matching approaches that combine object detection with LOB positioning. To overcome the time consumption and the difficulty of threshold selection in LOB-based positioning, the idea of divide-and-conquer was introduced, and the calculation area was divided into grids according to the driving trajectory. The calculations in each grid are independent and do not interfere with each other, which greatly improves computing efficiency. To make the algorithm universal, a ghost-node elimination algorithm based on self-adaptive constrained rules was proposed according to the line-of-sight rules that hold when observing an object, which realizes matching of the same object across multi-view images without image features.
Taking signboards, utility poles, and street lamps of multiple road sections in Yincun Town, Changzhou City, Jiangsu Province, as the experimental objects, experiments were carried out with multiple thresholds and compared with previous LOB-based object-positioning methods. The results show that the proposed method is more efficient and accurate than previous methods, and its threshold selection range is clear and easy to generalize. The method can perform automatic and accurate geographic positioning and image acquisition for street objects over a large area based on high-precision MMS, verifying its feasibility.
The results of this study are applicable in the acquisition of geolocation information for street objects, which can be used to draw high-precision maps required for autonomous driving and to provide data support for autonomous driving positioning, path planning, and traffic warning. Geolocation information on street-side objects can also assist in road safety detection and can help government departments to better manage and maintain urban living facilities and transportation facilities.
Street-side objects are easily occluded by vehicles, resulting in missed detections. When the same object is detected in fewer than three multi-view images, the LOB-based positioning method cannot position it effectively, and the road section in question must be re-acquired. The data collection and positioning methods presented in this paper are not synchronized. With suitable software and hardware support, the data stream of an MMS can be converted into image and driving-trajectory data in real time. If the object-detection model used in this study were replaced with a lightweight model with higher detection efficiency, combined with trajectory information obtained over short distances, real-time image-based street-object positioning would be possible. This suggests the possibility of real-time online updating and sharing of high-precision maps in the future.

Author Contributions

Conceptualization, Guannan Li; Formal analysis, Guannan Li; Funding acquisition, Liangchen Zhou and Guonian Lv; Investigation, Guannan Li; Methodology, Guannan Li and Liangchen Zhou; Software, Guannan Li; Supervision, Bingxian Lin, Liangchen Zhou and Guonian Lv; Visualization, Guannan Li; Writing—original draft, Guannan Li; Writing—review & editing, Xiu Lu. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 42076203.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Escalera, A.D.L.; Armingol, J.M.; Mata, M. Traffic sign recognition and analysis for intelligent vehicles. Image Vis. Comput. 2003, 21, 247–258.
2. Guo, C.Z.; Kidono, K.; Meguro, J.; Kojima, Y.; Ogawa, M.; Naito, T. A low-cost solution for automatic lane-level map generation using conventional in-car sensors. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2355–2366.
3. Farhadmanesh, M.; Cross, C.; Mashhadi, A.H.; Rashidi, A.; Wempen, J. Highway asset and pavement condition management using mobile photogrammetry. Transp. Res. Record 2021, 2675, 296–307.
4. Cafiso, S.; Di Graziano, A.; Pappalardo, G. Safety inspection and management tools for low-volume road network. Transp. Res. Record 2015, 2472, 134–141.
5. Zhao, X.; Ding, Y.; Yao, Y.; Zhang, Y.; Bi, C.; Su, Y. A multinomial logit model: Safety risk analysis of interchange area based on aggregate driving behavior data. J. Saf. Res. 2022, 80, 27–38.
6. Kim, J.G.; Han, D.Y.; Yu, K.Y.; Kim, Y.; Rhee, S.M. Efficient extraction of road information for car navigation applications using road pavement markings obtained from aerial images. Can. J. Civ. Eng. 2006, 33, 1320–1331.
7. Abdalla, M.; Easa, S.M. Extracting streetlight poles from orthophotos: Methodology and case study in Ontario, Canada. J. Surv. Eng. 2007, 133, 184–187.
8. Kim, G.H.; Sohn, H.G.; Song, Y.S. Road infrastructure data acquisition using a vehicle-based mobile mapping system. Comput. Aided Civ. Infrastruct. Eng. 2010, 21, 346–356.
9. Puente, I.; González-Jorge, H.; Martínez-Sánchez, J.; Arias, P. Review of mobile mapping and surveying technologies. Measurement 2013, 46, 2127–2145.
10. Wegner, J.D.; Branson, S.; Hall, D.; Schindler, K.; Perona, P. Cataloging public objects using aerial and street-level images—Urban trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
11. Liu, L.; Tang, X.; Xie, J.; Gao, X.; Zhao, W.; Mo, F.; Zhang, G. Deep-learning and depth-map based approach for detection and 3D localization of small traffic signs. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2020, 13, 2096–2111.
12. Dong, G.; Yan, Y.; Shen, C.; Wang, H. Real-time high-performance semantic image segmentation of urban street scenes. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3258–3274.
13. Gao, Y.; Xiao, G. Real-time Chinese traffic warning signs recognition based on cascade and CNN. J. Real-Time Image Process. 2020, 18, 669–680.
14. Timofte, R.; Van Gool, L. Multi-view manhole detection, recognition, and 3D localisation. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011.
15. Bayoudh, K.; Hamdaoui, F.; Mtibaa, A. Transfer learning based hybrid 2D-3D CNN for traffic sign recognition and semantic road detection applied in advanced driver assistance systems. Appl. Intell. 2021, 51, 124–142.
16. Tsai, V.J.D.; Chang, C.-T. Feature position on Google street view panoramas. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. 2012, 4, 305–309.
17. Nassar, A.S.; Lang, N.; Lefèvre, S.; Wegner, J.D. Learning geometric soft constraints for multi-view instance matching across street-level panoramas. In Proceedings of the 2019 Joint Urban Remote Sensing Event (JURSE), Vannes, France, 22–24 May 2019.
18. Ogawa, M.; Aizawa, K. Identification of buildings in street images using map information. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, China, 22–25 September 2019.
19. Zhu, S.; Yang, T.; Chen, C. Revisiting street-to-aerial view image geo-localization and orientation estimation. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021.
20. Nassar, A.S.; D’Aronco, S.; Lefèvre, S.; Wegner, J.D. GeoGraph: Graph-based multi-view object detection with geometric cues end-to-end. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020.
21. Krylov, V.A.; Kenny, E.; Dahyot, R. Automatic discovery and geotagging of objects from street view imagery. Remote Sens. 2018, 10, 661–671.
22. Hebbalaguppe, R.; Garg, G.; Hassan, E.; Ghosh, H.; Verma, A. Telecom inventory management via object recognition and localisation on Google street view images. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA, 24–31 March 2017.
23. Pattipati, K.R.; Deb, S.; Bar-Shalom, Y.; Washburn, R.B. A new relaxation algorithm and passive sensor data association. IEEE Trans. Autom. Control 1992, 37, 198–213.
24. Bishop, A.N.; Pathirana, P.N. Localization of emitters via the intersection of bearing lines: A ghost elimination approach. IEEE Trans. Veh. Technol. 2007, 56, 3106–3110.
25. Reed, J.D.; Silva, C.R.C.M.D.; Buehrer, R.M. Multiple-source localization using line-of-bearing measurements: Approaches to the data association problem. In Proceedings of the MILCOM 2008—2008 IEEE Military Communications Conference, San Diego, CA, USA, 16–19 November 2008.
26. Chu, L.; Chen, W. Correction of mobile positioning and direction via CNNs based on street view images. In Proceedings of the 2018 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), Taichung, Taiwan, China, 19–21 May 2018.
27. Hazelhoff, L.; Creusen, I.; de With, P.H. System for semi-automated surveying of street-lighting poles from street-level panoramic images. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Steamboat Springs, CO, USA, 24–26 March 2014.
28. Hazelhoff, L.; Creusen, I.; de With, P.H. Robust detection, classification and positioning of traffic signs from street-level panoramic images for inventory purposes. In Proceedings of the 2012 IEEE Workshop on the Applications of Computer Vision (WACV), Breckenridge, CO, USA, 9–11 January 2012.
29. Lumnitz, S.; Devisscher, T.; Mayaud, J.R.; Radic, V.; Coops, N.C.; Griess, V.C. Mapping trees along urban street networks with deep learning and street-level imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 144–157.
30. Zhang, W.; Witharana, C.; Li, W.; Zhang, C.; Li, X.; Parent, J. Using deep learning to identify utility poles with crossarms and estimate their locations from Google street view images. Sensors 2018, 18, 2484.
31. Khan, A.; Asim, W.; Ulhaq, A.; Ghazi, B.; Robinson, R.W. Health assessment of eucalyptus trees using Siamese network from Google street and ground truth images. Remote Sens. 2021, 13, 2194.
32. Soheilian, B.; Paparoditis, N.; Vallet, B. Detection and 3D reconstruction of traffic signs from multiple view color images. ISPRS J. Photogramm. Remote Sens. 2013, 77, 1–20.
33. Timofte, R.; Zimmermann, K.; Van Gool, L. Multi-view traffic sign detection, recognition, and 3D localisation. Mach. Vis. Appl. 2014, 25, 633–647.
34. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
35. Bruno, N.; Roncella, R. Accuracy assessment of 3D models generated from Google Street View imagery. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-2/W9, 181–188.
36. Tsai, V.J.D.; Chang, C.T. Three-dimensional positioning from Google street view panoramas. IET Image Process. 2013, 7, 229–239.
37. Alpha 3D Vehicle-Mounted Laser Scanning Measurement System. Available online: http://www.huace.cn/product/product_show1/431 (accessed on 19 February 2022).
Figure 1. Flow chart of this study.
Figure 2. Cascade R-CNN schematic: “Input” represents the input image, “Conv” the backbone convolutions, “RPN” the region proposal network, “Pool” region-wise feature extraction, “H” a network head, “C” classification, and “B” a bounding box.
Figure 3. Acquisition of bounding box and LOB: (a) bounding box in the street-view image and (b) LOB in the world coordinate system.
Figure 4. Schematic diagram of bounding box, LOB, and cluster: letters (a) to (e) represent different views.
Figure 5. Schematic diagram showing reduction of the influence of ghost nodes on the positioning results based on constrained rules: letters a to d represent different views.
Figure 6. Schematic diagram of grid division and grid-by-grid calculation.
Figure 7. Intersection and cluster based on LOBs: (a) schematic diagram of LOBs and cluster points; (b) intersection relationship matrix of the LOBs, showing the association between LOBs and intersections and the inclusion of intersections in clusters.
Figure 8. Elimination of ghost nodes based on the rule of the minimum number of intersections in the cluster: (a) deletion of clusters that do not satisfy this rule; (b) disassociation of relationships that do not satisfy this rule.
Figure 9. Elimination of ghost nodes based on the uniqueness of LOB association: (a) clusters filtered by this rule; (b) associations filtered by this rule.
Figure 10. Schematic diagram of data collection and research area: (a) administrative border region of Jiangsu Province; (b) driving trajectory in the Yincun area; (c) driving trajectory in the Mufu Mountain and Yanziji area.
Figure 11. Schematic diagram of cutting and labeling of a street-view image: letters (a) to (d) indicate that the street-view image is cut into four subimages.
Figure 12. Calculation results: (a) distribution of calculation results of pole-like objects in the study area, (b) scaling of positioning results in the local area, (c) superposition of positioning results and high-resolution remote-sensing images, and (d) object images corresponding to positioning results.
Table 1. Evaluation results of the geographic positioning method based on self-adaptive constrained LOB.

| k–Effective Viewing Distance (m) | Threshold of Cluster Distance (m) | Time Consumption (s) | Estimated Number of Poles | Recall Rate | Precision Rate |
|---|---|---|---|---|---|
| 2–40.81 | 0.1 | 6.36 | 1509 | 0.88 | 0.97 |
| | 0.2 | 5.66 | 1393 | 0.92 | 0.97 |
| | 0.3 | 5.18 | 1374 | 0.92 | 0.97 |
| | 0.4 | 5.14 | 1357 | 0.91 | 0.97 |
| | 0.5 | 4.99 | 1344 | 0.90 | 0.97 |
| | 0.6 | 5.05 | 1342 | 0.90 | 0.96 |
| | 0.7 | 5.58 | 1321 | 0.89 | 0.96 |
| | 0.8 | 5.25 | 1328 | 0.89 | 0.96 |
| | 0.9 | 5.65 | 1319 | 0.88 | 0.96 |
| | 1.0 | 5.36 | 1316 | 0.88 | 0.95 |
| 3–61.22 | 0.1 | 5.79 | 1608 | 0.92 | 0.97 |
| | 0.2 | 5.57 | 1420 | 0.93 | 0.97 |
| | 0.3 | 6.08 | 1400 | 0.93 | 0.97 |
| | 0.4 | 5.31 | 1388 | 0.93 | 0.96 |
| | 0.5 | 5.22 | 1367 | 0.92 | 0.96 |
| | 0.6 | 5.18 | 1369 | 0.91 | 0.96 |
| | 0.7 | 5.12 | 1343 | 0.90 | 0.96 |
| | 0.8 | 5.09 | 1350 | 0.90 | 0.95 |
| | 0.9 | 5.47 | 1346 | 0.89 | 0.95 |
| | 1.0 | 5.19 | 1337 | 0.89 | 0.95 |
| 4–81.63 | 0.1 | 8.51 | 1617 | 0.92 | 0.96 |
| | 0.2 | 8.03 | 1422 | 0.93 | 0.96 |
| | 0.3 | 8.05 | 1401 | 0.93 | 0.96 |
| | 0.4 | 7.50 | 1393 | 0.93 | 0.96 |
| | 0.5 | 7.55 | 1364 | 0.91 | 0.96 |
| | 0.6 | 7.39 | 1373 | 0.91 | 0.95 |
| | 0.7 | 7.19 | 1356 | 0.90 | 0.95 |
| | 0.8 | 7.38 | 1365 | 0.90 | 0.94 |
| | 0.9 | 6.95 | 1360 | 0.89 | 0.94 |
| | 1.0 | 6.76 | 1346 | 0.89 | 0.94 |
| 5–102.04 | 0.1 | 13.01 | 1618 | 0.91 | 0.96 |
| | 0.2 | 11.86 | 1424 | 0.93 | 0.96 |
| | 0.3 | 11.43 | 1402 | 0.93 | 0.96 |
| | 0.4 | 11.28 | 1384 | 0.92 | 0.96 |
| | 0.5 | 11.18 | 1360 | 0.91 | 0.96 |
| | 0.6 | 10.74 | 1355 | 0.90 | 0.96 |
| | 0.7 | 10.44 | 1345 | 0.90 | 0.95 |
| | 0.8 | 10.04 | 1356 | 0.89 | 0.94 |
| | 0.9 | 9.95 | 1354 | 0.88 | 0.93 |
| | 1.0 | 9.74 | 1359 | 0.88 | 0.92 |
Table 2. Evaluation results of the geographic positioning method based on modified brute-force-based LOB.

| Number of Views | Threshold of Angle (°) | Threshold of Distance to Center of Selected Road (m) | Time Consumption (s) | Estimated Number of Poles | Recall Rate | Precision Rate |
|---|---|---|---|---|---|---|
| 3 | 1 | 10 | 5.20 | 1318 | 0.83 | 0.90 |
| | | 15 | 5.20 | 1442 | 0.90 | 0.89 |
| | | 20 | 5.28 | 1465 | 0.91 | 0.88 |
| | 2 | 10 | 5.82 | 1452 | 0.85 | 0.84 |
| | | 15 | 5.79 | 1604 | 0.92 | 0.83 |
| | | 20 | 5.57 | 1637 | 0.92 | 0.81 |
| | 3 | 10 | 5.68 | 1579 | 0.85 | 0.78 |
| | | 15 | 5.85 | 1760 | 0.92 | 0.76 |
| | | 20 | 7.19 | 1808 | 0.93 | 0.75 |
| 4 | 1 | 10 | 6.65 | 1358 | 0.83 | 0.87 |
| | | 15 | 7.27 | 1498 | 0.90 | 0.86 |
| | | 20 | 7.15 | 1538 | 0.91 | 0.84 |
| | 2 | 10 | 6.82 | 1522 | 0.85 | 0.81 |
| | | 15 | 7.84 | 1699 | 0.92 | 0.78 |
| | | 20 | 7.34 | 1755 | 0.93 | 0.76 |
| | 3 | 10 | 7.33 | 1696 | 0.85 | 0.74 |
| | | 15 | 8.36 | 1917 | 0.93 | 0.71 |
| | | 20 | 8.61 | 1994 | 0.93 | 0.69 |
| 5 | 1 | 10 | 6.86 | 1387 | 0.84 | 0.86 |
| | | 15 | 8.08 | 1547 | 0.91 | 0.84 |
| | | 20 | 8.03 | 1593 | 0.91 | 0.82 |
| | 2 | 10 | 8.27 | 1575 | 0.85 | 0.79 |
| | | 15 | 9.78 | 1788 | 0.92 | 0.75 |
| | | 20 | 9.62 | 1862 | 0.93 | 0.72 |
| | 3 | 10 | 9.30 | 1773 | 0.86 | 0.71 |
| | | 15 | 10.83 | 2051 | 0.93 | 0.67 |
| | | 20 | 11.53 | 2161 | 0.93 | 0.64 |
| 6 | 1 | 10 | 8.35 | 1416 | 0.84 | 0.85 |
| | | 15 | 9.56 | 1590 | 0.91 | 0.82 |
| | | 20 | 9.50 | 1651 | 0.91 | 0.79 |
| | 2 | 10 | 9.87 | 1625 | 0.85 | 0.76 |
| | | 15 | 11.16 | 1872 | 0.92 | 0.72 |
| | | 20 | 11.69 | 1981 | 0.93 | 0.68 |
| | 3 | 10 | 11.03 | 1844 | 0.86 | 0.69 |
| | | 15 | 13.78 | 2182 | 0.93 | 0.63 |
| | | 20 | 14.29 | 2342 | 0.94 | 0.59 |
| 7 | 1 | 10 | 8.94 | 1447 | 0.84 | 0.83 |
| | | 15 | 10.33 | 1649 | 0.91 | 0.79 |
| | | 20 | 11.45 | 1738 | 0.91 | 0.75 |
| | 2 | 10 | 11.59 | 1670 | 0.85 | 0.74 |
| | | 15 | 14.13 | 1964 | 0.92 | 0.68 |
| | | 20 | 15.01 | 2131 | 0.93 | 0.63 |
| | 3 | 10 | 13.89 | 1914 | 0.86 | 0.66 |
| | | 15 | 17.95 | 2321 | 0.93 | 0.59 |
| | | 20 | 20.04 | 2564 | 0.94 | 0.54 |
| 8 | 1 | 10 | 10.70 | 1467 | 0.84 | 0.82 |
| | | 15 | 12.41 | 1709 | 0.91 | 0.76 |
| | | 20 | 13.46 | 1838 | 0.91 | 0.71 |
| | 2 | 10 | 13.14 | 1704 | 0.85 | 0.73 |
| | | 15 | 17.11 | 2064 | 0.92 | 0.65 |
| | | 20 | 18.54 | 2290 | 0.93 | 0.59 |
| | 3 | 10 | 15.86 | 1966 | 0.86 | 0.64 |
| | | 15 | 22.78 | 2465 | 0.93 | 0.56 |
| | | 20 | 24.29 | 2804 | 0.94 | 0.49 |
| 9 | 1 | 10 | 11.65 | 1487 | 0.84 | 0.81 |
| | | 15 | 14.55 | 1761 | 0.91 | 0.74 |
| | | 20 | 16.09 | 1919 | 0.91 | 0.68 |
| | 2 | 10 | 15.05 | 1736 | 0.85 | 0.71 |
| | | 15 | 20.37 | 2150 | 0.92 | 0.62 |
| | | 20 | 22.99 | 2422 | 0.93 | 0.56 |
| | 3 | 10 | 19.08 | 2016 | 0.86 | 0.63 |
| | | 15 | 26.94 | 2608 | 0.93 | 0.53 |
| | | 20 | 31.65 | 3026 | 0.94 | 0.46 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
