1. Introduction
As urbanization increases, highway networks have been expanded to meet growing transportation demands. Because highways form part of the critical infrastructure for public transportation, advancing analysis and assessment technology for highway systems is an important component of an intelligent transportation system (ITS). In the past decade, an increasing number of highways have been damaged by harsh environmental conditions, vehicle overloading, material aging, and other factors [1]. Pavement distress detection, identification, and classification are important steps in a pavement management system (PMS); they help the agency determine the appropriate rehabilitation techniques to be performed on the pavement [
2]. Generally, cracks are the earliest signs of pavement distress. The continuous propagation of cracks without proper treatment in the early stages of damage will result in high maintenance costs and severe consequences. Therefore, detecting cracks and repairing them quickly are essential tasks for highway maintenance departments. Normally, highway maintenance workers use sealants to repair cracks. However, when the pavement structure has been damaged, cracks are often more likely to develop around the sealed crack, so it is also necessary to detect sealed cracks. The traditional routine of highway pavement distress inspection relies on manual on-site surveys, which are labor-intensive and time-consuming. In addition, highways are dangerous working environments for road inspection personnel [
3]. It is therefore necessary to develop automatic and efficient methods of detecting highway pavement cracks and sealed cracks.
At present, special road condition inspection devices have been extensively studied. Non-contact devices such as thermal imaging sensors [
4] and ground-penetrating radar [
5] have been utilized to detect cracks and take advantage of the differences in the signals returned from normal and damaged pavements. Embedded fiberoptic sensors [
6] are also an emerging technology used to detect pavement distress. Although the tools mentioned above can accurately locate pavement distress, their high cost and low efficiency are critical drawbacks in real-world scenarios, preventing them from being applied to a wider range of situations. Image acquisition methods based on charge-coupled devices (CCDs) and complementary metal oxide semiconductor (CMOS) sensors are prevalent because of their efficiency and low cost. However, the detection of pavement distress using image processing is still a very challenging task, especially for asphalt pavements. The reasons for this can be summarized as follows:
Image processing methods such as transformation, enhancement, and segmentation have gained the attention of researchers in the field of pavement assessment, but these methods are empirical [
3], i.e., they require constant adjustment of the parameters to achieve an optimum result. Traditional machine learning methods, such as random forest [
8,
9] and AdaBoost [
10], have also been used to detect cracks in pavements. The problem is that such methods can only obtain low-level image information and cannot extract high-level semantic information, which has a significant impact on the robustness of the algorithms. With the increase of parallel computing power and the development of deep learning, data-driven methods are widely used in PMS in various countries [
11,
12]. Convolutional neural network (CNN) [
13] is one of the well-known data-driven methods that can automatically learn high-level information from large amounts of data through a multilayered artificial neural network (ANN), which can be used to classify images or detect objects.
Nevertheless, challenges still exist for data-driven-based pavement crack detection:
Data are the basis of deep learning algorithms, but there are not enough publicly available pavement datasets, and even models trained on publicly available datasets with only a few hundred images are not guaranteed to be effective.
Building datasets is a time- and resource-consuming task.
There is no efficient and practical detection method for pavement cracks and sealed cracks.
The crack detection system proposed in this study was designed to address these problems.
In a real-world scenario, the low practicality and inefficiency of pavement inspection systems are difficult problems for road maintenance departments. Research institutions and companies in many countries have developed automated road condition monitoring vehicles that automatically detect road damage at normal traffic flow speeds while a large number of road images are collected using CCD or CMOS sensors mounted on the rear of the vehicle. The use of an efficient and practical system to process this enormous amount of data and detect cracks and sealed cracks within them was the focus of this study. The contributions of this study are as follows:
A publicly available dataset of pavement cracks and sealed cracks, collected by an automated road condition monitoring vehicle, is created and labeled;
For the features of highway asphalt pavement images, a dense and redundant crack annotation method is proposed, which provides more object instances and more accurate object positioning than traditional big-block annotation;
In order to quickly implement the inspection system and reduce labor costs, a semi-automatic crack annotation method is proposed, which reduces the creation time of the dataset by 80% compared with fully manual annotation;
In our dataset, 13 currently popular object detection models are compared, among which the YOLOv5 family proves to be both efficient and accurate; the results also demonstrate that our proposed annotation method is effective.
The remainder of this paper is organized into five sections following this Introduction.
Section 2 presents related work in the field of crack detection in recent years. Details of our newly developed dataset are presented in
Section 3.
Section 4 describes the specific experimental settings.
Section 5 illustrates the results, including a detailed comparison of different models and a discussion.
Section 6 concludes the study.
3. Proposed Dataset
3.1. Image Acquisition
At present, in the field of pavement distress detection, some researchers have acquired images with cameras or smartphones [
17,
48]. Although this approach allows one to acquire pavement images of different environments, with different illumination levels and shooting angles to improve the diversity of the data, it is quite inefficient. Unmanned aerial vehicles (UAVs) equipped with HD cameras have also been used to collect pavement images [
49,
50]. In order to obtain clear images of the road surface, the flight altitude and speed of UAVs are limited, so a suitable scenario for their application is an urban street with a low traffic volume. An automatic road measurement vehicle can collect road images at normal traffic flow speeds, and the high-resolution CCD sensor and LED illumination system ensure the uniformity of the captured images, so these have been applied by road authorities in several countries [
3].
The images used in this study were obtained by a road measurement vehicle and collected from the Hulunbuir section of the Suiman Expressway in the Inner Mongolia Autonomous Region, China, as shown in
Figure 1. A line scan industrial camera was mounted on the rear of the vehicle. The camera has a CCD sensor resolution of 3024 pixels and scans a road width of 3 m, with 1 pixel representing approximately 1 mm of road. A rotary encoder mounted on the rear axle of the vehicle generates pulses as the wheels move, and this pulse signal triggers the line scan camera to take a picture of the road surface for every 1 mm of vehicle movement. Therefore, 1 mm² of road surface area corresponds to approximately 1 pixel in the image. In the process of road image collection, the vehicle’s driving speed is between 80 km/h and 120 km/h, and the camera can photograph the road surface uniformly at different vehicle speeds. In order to ensure that the images captured in different environments are balanced and uniform, a powerful integrated LED set is used to provide stable illumination conditions, as shown in
Figure 2. Examples of the images are shown in
Figure 3.
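The acquisition geometry above implies specific trigger-rate requirements for the line scan camera. The following back-of-the-envelope sketch (illustrative, not from the vehicle's actual specification; the constants simply restate the figures given above) estimates the scan-line frequency over the stated speed range:

```python
# Sanity check of the acquisition geometry (illustrative only).

MM_PER_LINE = 1.0      # encoder triggers one scan line per 1 mm of travel
SENSOR_PIXELS = 3024   # line scan CCD resolution
SCAN_WIDTH_M = 3.0     # road width covered by one scan line

def line_rate_hz(speed_kmh: float) -> float:
    """Scan-line trigger frequency required at a given vehicle speed."""
    mm_per_second = speed_kmh * 1_000_000 / 3600  # km/h -> mm/s
    return mm_per_second / MM_PER_LINE

def cross_track_resolution_mm() -> float:
    """Road width represented by one pixel across the lane."""
    return SCAN_WIDTH_M * 1000 / SENSOR_PIXELS

print(f"{line_rate_hz(80):.0f} Hz")   # required line rate at 80 km/h
print(f"{line_rate_hz(120):.0f} Hz")  # required line rate at 120 km/h
print(f"{cross_track_resolution_mm():.2f} mm/pixel")
```

At 120 km/h the encoder must trigger roughly 33,000 lines per second, well within the range of industrial line scan cameras, and the cross-track resolution of about 0.99 mm/pixel matches the "1 pixel ≈ 1 mm" figure stated above.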
In this study, we collected 106,792 images with a resolution of 3024 × 1889 on a 15 km length of road. As mentioned in [
17,
51], we decided to crop the original images to a resolution of 600 × 600 pixels, meaning that each image represents a pavement area of approximately 0.36 m².
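A minimal sketch of this cropping step, assuming a plain non-overlapping grid that discards the partial border strips (the paper does not state how the image edges were handled):

```python
import numpy as np

def crop_tiles(image: np.ndarray, tile: int = 600):
    """Split a pavement image into non-overlapping tile x tile crops.

    Partial strips at the right and bottom edges are simply discarded
    in this sketch; the handling of edges in the actual pipeline is
    an assumption.
    """
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(image[y:y + tile, x:x + tile])
    return tiles

full = np.zeros((1889, 3024), dtype=np.uint8)  # one original frame
crops = crop_tiles(full)
print(len(crops))  # 5 columns x 3 rows = 15 crops per frame
```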
3.2. Image Annotation
As mentioned regarding ImageNet [
28], there are two basic requirements in a bounding box annotation system: quality and coverage. Quality means that each bounding box needs to be tight, i.e., all visible parts of the object must be contained by a minimal bounding box. Coverage means that every instance of an object needs to have a bounding box to tell the algorithm which parts of the image are to be focused on. For common objects, individual instances such as people, chairs, and cars are easy to annotate using a properly scaled bounding box, and the annotated results are generally not ambiguous. However, cracks are different because they do not have a particular structure. For example, Chinese standards for evaluating the performance of highways classify cracks as alligator, block, transverse, or longitudinal cracks. Using only a single tight bounding box to label a transverse or longitudinal crack means that this bounding box will have an incongruous aspect ratio, as shown in
Figure 4a. In addition, the study of ImageNet [
28] showed that objects with thin structures have the worst localization accuracy. A huge bounding box will appear in an image when the trend of a crack is sloping, and in which the crack occupies only a tiny portion, while most of the pixels represent the background, as shown in
Figure 4b. This type of annotation can lead to mismatched labels when training the network [
52]. For common items, an object is one entity; however, a whole crack can be considered to be composed of many sub-cracks. Therefore, in this study, we propose a dense and redundant crack labeling method, in which a crack is densely contained by multiple tight small bounding boxes instead of a very long or large bounding box, and the adjacent bounding boxes need to overlap (see
Figure 4). We propose this for the following reasons:
Dense annotations, although still blocky, can show more accurate crack localization and structures relative to patch-level annotations;
Redundant annotations increase the number of instances in the dataset.
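The scheme can be illustrated with a short sketch that places small, overlapping boxes along a crack's centerline; the 100-pixel box size and 50% overlap ratio are assumed values chosen for illustration, not parameters of our annotation protocol:

```python
def dense_boxes(polyline, box=100, overlap=0.5):
    """Cover a crack polyline with overlapping square boxes (x1, y1, x2, y2).

    Illustrative sketch of the dense-and-redundant scheme: instead of one
    long bounding box, small tight boxes are placed along the crack so
    that adjacent boxes overlap. Box size and overlap are assumed values.
    """
    step = box * (1 - overlap)
    boxes = []
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        length = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        n = max(1, int(length // step))
        for i in range(n + 1):
            t = i / n
            cx, cy = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            boxes.append((cx - box / 2, cy - box / 2,
                          cx + box / 2, cy + box / 2))
    return boxes

# A sloping crack: one tight box would span 600 x 300 pixels of mostly
# background, while the dense scheme yields many 100 x 100 boxes
# hugging the crack.
boxes = dense_boxes([(0, 0), (600, 300)])
print(len(boxes))
```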
3.2.1. Manual Annotation
CVAT [
53] is a free online interactive tool for labeling videos and images. Because it supports multiple annotation formats, provides good image and label management, and offers a collaborative workflow, we used it to label the pavement images. In this study, cracks and sealed cracks were the objects to be labeled; however, in many current studies, sealed cracks are not the main detection targets. The reasons why we labeled the sealed cracks are as follows:
Road maintenance workers usually use sealant to repair cracks, but since the asphalt road structure has been damaged, cracks will soon reappear around the sealed cracks.
According to the Chinese highway performance assessment standards [
54] and pavement distress identification manual issued by the U.S. Federal Highway Administration [
55], for asphalt pavements, sealed cracks are also a type of pavement distress and are used to evaluate the highway maintenance quality.
Cracks and sealed cracks are treated differently in PMS [
56,
57].
According to our observations, sealed cracks, similar to cracks, are the dominant damage class for asphalt highways.
The two key challenges in labeling crack images are maintaining consistent labeling standards and ensuring that every crack and sealed crack is labeled. Our approach involved a team of three people who were trained in a standardized lesson before starting formal labeling. Once labeling was complete, they cross-checked each other’s work: each image was labeled by one annotator and checked by the other two to ensure that no objects were missed and no labels were incorrect.
There were about 500,000 sub-images of 600 × 600 pixels, far too many for a team of three people to annotate entirely by hand. We therefore developed a semi-automatic method for crack annotation, described in the next subsubsection.
3.2.2. Semi-Automatic Annotation
As mentioned in COCO [
31], the annotation of all the data took thousands of worker-hours and was an extremely time-consuming task. We implemented a semi-automatic method consisting of six steps.
Step 1: We need to manually label some data to train the initial model. One question is how much data we need at a minimum to train a model to provide largely satisfactory results. For image classification tasks, Arya et al. [
12] argued that at least 5000 labeled images per category are needed. For object detection, Maeda et al. [
17] suggested that at least 1000 images per category are needed, while according to Shahinfar et al. [
58], the rate of improvement in model performance starts to level off when the number of images is greater than 150–500. In this study, we manually labeled 800 images as the training data for training the initial model, considering that the annotation method used here generates more instances of each image.
Step 2: An initial model is trained on the manually labeled dataset in Step 1. The performance of this model will not be particularly good because of the problem of insufficient data, but it is still necessary to make the model relatively optimal by adjusting the hyperparameters. The model we used was YOLOv5s [
59].
Step 3: The trained model is used to detect the unlabeled data and save the results.
Step 4: The results are reviewed, and manual corrections are made in the case of inaccurate results, including missed instances, mislabeling, and labeling that is not accurate enough.
Step 5: The corrected data and the initial data are merged into a new training set, and a new model is trained on this dataset.
Step 6: Steps 3–5 are repeated to continuously update the model until the performance of the model no longer improves significantly.
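The six steps can be summarized in the following sketch, where `train`, `detect`, and `human_review` are placeholder stubs standing in for YOLOv5s training, inference, and the CVAT correction pass described above:

```python
# Sketch of the six-step semi-automatic annotation loop. The three
# helper functions are stubs; in practice they wrap YOLOv5s training,
# YOLOv5s inference, and manual correction in CVAT.

def train(dataset):
    return {"trained_on": len(dataset)}               # stub: returns a "model"

def detect(model, images):
    return [{"image": im, "boxes": []} for im in images]  # stub: pseudo-labels

def human_review(predictions):
    return predictions                                 # stub: corrected labels

def semi_automatic_annotation(seed_labels, unlabeled_batches):
    dataset = list(seed_labels)         # Step 1: manually labeled seed set
    model = train(dataset)              # Step 2: initial model
    for batch in unlabeled_batches:     # Step 6: repeat until converged
        preds = detect(model, batch)    # Step 3: pseudo-label new images
        corrected = human_review(preds) # Step 4: manual correction
        dataset.extend(corrected)       # Step 5: merge ...
        model = train(dataset)          #         ... and retrain
    return dataset, model

seed = [{"image": f"seed_{i}", "boxes": []} for i in range(800)]
batches = [[f"batch{c}_{i}" for i in range(800)] for c in range(12)]
dataset, model = semi_automatic_annotation(seed, batches)
print(len(dataset))  # 800 seed + 12 cycles x 800 pseudo-labeled = 10400
```

The batch size of 800 images per cycle and the 12 cycles match the figures reported in Section 3.3.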
3.3. Data Statistics
The initial 800 images in the dataset were manually annotated by the three-person team; annotation and review took 8 and 4 worker-hours, respectively, an average of 0.9 min per image. In the semi-automatic labeling stage, 800 unlabeled images were fed into the model trained in the previous stage in each cycle; after 12 cycles, 9600 semi-automatically labeled images had been obtained in total. In each cycle, we trained a YOLOv5s model from scratch with a maximum of 50 training epochs and selected the model with the highest mean average precision (mAP) to detect the unlabeled data. All detected instances produced in each cycle were reloaded into CVAT, then checked and corrected by the team. In the end, the 9600 images consumed 26.6 worker-hours, an average of 0.16 min per image, about one-fifth of the time required by fully manual annotation.
We obtained a total of 13 crack detection models in the manual labeling and semi-automatic labeling stages, and the mAP of each model is shown in
Figure 5. The mAP increased rapidly over the first few cycles and then plateaued, with later cycles yielding only marginal improvement.
In the process of reviewing the semi-automatically labeled images, most of the annotation errors we found occurred on the pavement’s white lines: white line damage was mistakenly identified by the model as cracks or sealed cracks. The model also sometimes misidentified pavement stains as cracks or sealed cracks, but this error gradually decreased as the training cycles progressed, and a richer dataset would also help to reduce these issues.
Overall, the dataset we developed contains 10,400 pavement images. Among these images, 202,840 bounding boxes were created, of which 132,012 enclose cracks and the rest enclose sealed cracks. There was a difference between the manually annotated data and the semi-automatically annotated data: in our statistics, the average number of instances per sample was 13 in the first 800 images but 20 in the semi-automatically annotated images. The reason lies in our proposed dense and redundant annotation method. Intuitively, a crack is a single entity, but we divided each whole crack into sub-cracks and labeled each one separately, so each part of a crack has a chance to be detected by the model, generating more detected instances than manual annotation. Although the semi-automatic annotations were denser and more redundant, after checking, we found that they were still good because they satisfied the two criteria of crack annotation: quality and coverage.
Figure 6 shows a comparison of manual annotation with semi-automatic annotation.
4. Experimental Setup
In previous studies on crack detection, we found that lightweight models were often used for subsequent deployment on edge devices [
17,
60]. It is generally agreed that deeper models have better feature extraction capabilities, but this also means more parameters and a longer training time. Therefore, after comprehensive consideration, 13 currently prevailing object detection models were used for experiments on the dataset developed in this study. The models were divided into four groups such that the models within each group have similar numbers of parameters, as shown in
Table 2. All these models are open-source. YOLOv5s was the model used in this study to generate pseudo-labels, and we used it as a benchmark in the experiments.
In our experiments, all these models were based on the PyTorch framework. We used a cloud server as the training platform, and the GPU used was an NVIDIA RTX A5000 with 24 GB memory. We adjusted the batch size to the scale of each model to maximize GPU utilization. All these models were pre-trained on COCO [
31] or ImageNet [
28], but instead of using the pre-trained weights, we trained them from scratch on our dataset.
Precision (Equation (1)), recall (Equation (2)), the F-score (Equation (3)), and mAP are common metrics used to evaluate the performance of object detection models. Precision is the proportion of relevant instances among the retrieved instances, while recall is the proportion of relevant instances that were retrieved. In Equations (1) and (2), true positives (TP) are correct detections of ground truth objects, false negatives (FN) are objects that were not detected, and false positives (FP) are incorrect detections. In the post-processing stage, each detection was evaluated by the intersection over union (IoU), which indicates the degree of overlap between the predicted box and the ground truth annotation; in this study, we required IoU > 0.5 to validate a detected instance. The confidence score is a further free parameter indicating the model’s certainty about a detection, and all detected instances with confidence scores below 0.25 were filtered out. The F-score (Equation (3)) combines precision and recall as their harmonic mean; a weighting parameter in Equation (3) can be adjusted to give more importance to precision over recall, or vice versa, yielding variants such as the F0.5-score and F2-score alongside the standard F1-score. In this study, only the F1-score was considered, as it effectively reflects a model’s overall capability. Among the metrics defined for the Pascal VOC challenge [
30], mAP is calculated at a single IoU threshold of 0.5, whereas the COCO [31] metric is more rigorous, averaging over 10 IoU thresholds (0.50 to 0.95 in steps of 0.05). Both evaluation criteria were considered in this study.
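A hedged sketch of this evaluation protocol follows; greedy highest-confidence-first matching is assumed here, and the exact matching rules of the evaluation code used in the experiments may differ:

```python
# Sketch of per-image detection evaluation at the thresholds stated
# above (IoU > 0.5, confidence >= 0.25). Matching strategy is assumed.

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def evaluate(preds, gts, iou_thr=0.5, conf_thr=0.25):
    """Return (precision, recall, f1). preds are (x1, y1, x2, y2, score)."""
    preds = sorted((p for p in preds if p[4] >= conf_thr),
                   key=lambda p: -p[4])          # highest confidence first
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, iou_thr
        for i, g in enumerate(gts):
            if i not in matched and iou(p[:4], g) > best_iou:
                best, best_iou = i, iou(p[:4], g)
        if best is not None:                     # validated detection
            matched.add(best)
            tp += 1
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gts = [(0, 0, 100, 100), (200, 200, 300, 300)]
preds = [(5, 5, 105, 105, 0.9),     # TP (IoU ~0.82 with first gt)
         (400, 400, 500, 500, 0.8), # FP (no overlap)
         (0, 0, 50, 50, 0.2)]       # filtered by confidence
print(evaluate(preds, gts))  # (0.5, 0.5, 0.5)
```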
Multiply-accumulate operations (MACs) and parameters were used to measure the computational complexity of the ANNs, and these were calculated in the experiments. Training time and inference time are also important factors to be considered in practical applications, so these were recorded and compared. In this study, the training time was measured for every epoch with an NVIDIA RTX A5000, and the inference time was measured for every image with an NVIDIA RTX 2070 SUPER. The inference time included both the pre-processing and post-processing steps. Some images that never appeared in the training and validation sets were used to evaluate the performance of all the models, comprising data from the same measurement vehicle and the same highway.
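For intuition on where MAC and parameter counts come from, the following illustrative helper computes the cost of a single convolution layer; the example layer is hypothetical, and the complexities reported for the models in Table 2 come from profiling the full networks:

```python
# Back-of-the-envelope complexity accounting for one conv layer
# (illustrative; real models are profiled end to end).

def conv2d_cost(c_in, c_out, k, h_out, w_out, bias=True):
    """Parameters and multiply-accumulate operations of one conv layer."""
    params = c_in * c_out * k * k + (c_out if bias else 0)
    # one MAC per weight per output pixel
    macs = c_in * c_out * k * k * h_out * w_out
    return params, macs

# Hypothetical first layer of a detector on a 600 x 600 RGB crop
# (stride 1, 'same' padding):
params, macs = conv2d_cost(3, 16, 3, 600, 600)
print(params)      # 448 parameters
print(macs / 1e6)  # 155.52 million MACs for this single layer
```

Even one early high-resolution layer costs hundreds of millions of MACs, which is why resizing inputs (as the 320 × 320 models do) reduces computational complexity so sharply.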
5. Results and Discussion
In this section, we describe the evaluation of all the models mentioned in
Table 2 on the dataset we developed. In this study, 9600 images were randomly selected as the training set, and the other 800 images were used as the validation set. As mentioned in
Section 4, all the models were divided into four groups depending on their parameters, and models within the same group were used for comparative studies. We chose this method because the application scenarios of the models with different numbers of parameters are different. Lightweight models can be deployed on devices with lower computing power such as smartphones, while large models require the support of GPUs with higher computing power.
The models in Group 1 are fairly lightweight models, and their results are listed in
Table 3. SSDlite320 MobileNetV3-Large is a model based on SSD and MobileNet, and its input images must be resized to 320 × 320 pixels. Its MAC count is much lower than those of YOLOv5n and YOLOv5s, indicating very low computational complexity; nevertheless, it lagged behind on every other metric. Interestingly, despite having the lowest computational complexity, its training and inference times were longer than those of the other two models. From the confusion matrix (see
Figure 7), 4476 detected instances containing only the background were mistakenly considered as cracks and sealed cracks by SSDlite320 MobileNetV3-Large, but for YOLOv5n and YOLOv5s, the numbers of mistaken backgrounds were 3399 and 2771, respectively. In
Table 3, we can see that the results of YOLOv5n and YOLOv5s are very close, but the computational complexity of the former is only about a quarter of the latter. Their PR curves and confusion matrices are shown in
Figure 7.
In the second group, Faster R-CNN MobileNetV3-Large-FPN and Faster R-CNN MobileNetV3-Large-320-FPN are based on Faster R-CNN and MobileNet. The difference is that the input of the latter is resized to 320 × 320 pixels. Although the parameters of the two networks are exactly the same, the computational complexity of the high-resolution model was seven times that of the low-resolution model. It can be seen from the results in
Table 4 that the low-resolution version lagged far behind. Considering the results of SSDlite320 MobileNetV3-Large in the first group, we argue that scaling down the input size leads to poor performance: the dataset developed in this study is effectively a small-object dataset, and reducing the resolution means a loss of information. The parameters of YOLOv5m are close to those of the previous two networks, but its results were better across the board. The PR curves and confusion matrices are shown in
Figure 8.
The networks in the third group are all single-stage models, and their results are shown in
Table 5. The first three models are similar in that their recall is high while their precision is relatively low; as shown in their confusion matrices, all three yielded more FPs than the models in the first two groups. The results of SSD300 VGG16 are more balanced, and it has the lowest computational complexity in this group. The PR curves and confusion matrices are shown in
Figure 9.
In Group 4, the MAC counts of the Faster R-CNN-based models were much larger than that of YOLOv5l; their mAP (Pascal VOC) values were very close to YOLOv5l’s, but their mAP (COCO) values were slightly lower. The imbalance between precision and recall, noted for Group 3, was also the main drawback of the first two models. The results are shown in
Table 6, and the PR curves and confusion matrices are shown in
Figure 10.
Overall, all the models detected sealed cracks more accurately than cracks; we believe this is because sealed cracks have more distinctive visual characteristics. It can also be seen that the low-resolution models performed worse than the high-resolution models. Note that all the PR curves are truncated because detected instances with confidence scores below 0.25 or IoUs below 0.5 were filtered out. All the trained models are publicly available.
Images that never appeared in the training and validation sets were used for testing, in order to understand the capabilities of the different models more intuitively. Some representative results are shown in
Figure 11.
In general, the YOLOv5 series models performed better on our dataset, while the lower-resolution models were not suitable. More importantly, the dense and redundant annotation method proposed in this study proved very effective and can be applied to most currently popular object detection models. In the detection results, cracks and sealed cracks are accurately detected, and the structure of the cracks can be inferred at the same time. Given their short training and inference times, the YOLOv5 series models enable highly efficient detection of cracks and sealed cracks.
6. Conclusions
In this study, we proposed a novel dense and redundant annotation method that captures the structural features of asphalt pavement cracks and sealed cracks. Based on this annotation method, we developed a dataset containing 10,400 pavement images and made it publicly available for future studies. A semi-automatic method of annotating cracks and sealed cracks was used to improve the efficiency of dataset creation; compared with fully manual labeling, it saved 80% of the annotation time, greatly improving the efficiency of the entire crack detection pipeline. Finally, we tested 13 currently popular object detection models, and the results show that the dense and redundant labeling method is effective. The YOLOv5 series models were the best and most balanced performers, with YOLOv5s achieving an F1-score of 86.79% at an inference time of 14.8 ms. To conclude, by combining semi-automatic label generation and dense, redundant object annotation with the YOLOv5 series models, we can achieve efficient pavement crack and sealed crack detection.