Article

Relative-Breakpoint-Based Crack Annotation Method for Lightweight Crack Identification Using Deep Learning Methods

1 Department of Bridge Engineering, School of Transportation, Southeast University, Nanjing 211189, China
2 Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(15), 8950; https://doi.org/10.3390/app13158950
Submission received: 16 July 2023 / Revised: 30 July 2023 / Accepted: 2 August 2023 / Published: 3 August 2023
(This article belongs to the Section Civil Engineering)

Abstract

After years of service, bridges may lose their expected functions. Considering the large number of bridges in service and the adverse inspection environment, timely and efficient inspection solutions, such as computer vision techniques, have become increasingly attractive in recent years, especially for bridge components with poor accessibility. In this paper, a lightweight procedure for bridge apparent-defect detection is proposed, consisting of a crack annotation method and crack detection. First, in order to save computational costs and improve generalization performance, we propose a relative-breakpoint annotation method to build a crack instance segmentation dataset, a critical step for any supervised vision-based crack detection method. Then, models based on the classic Mask RCNN and Yolact are trained and transferred to evaluate the effectiveness of the proposed method. To verify the correctness, universality and generality of the proposed crack-detection framework, approximately 800 images are used for model training, while nearly 100 images are reserved for validation. Results show that the crack instance segmentation model can achieve approximately 90% for both accuracy and recall, even with this limited dataset.

1. Introduction

As cracks are the most typical surface defects of bridges, crack detection and repair are key to bridge maintenance. In the context of increasing pressure on bridge management and maintenance, traditional manual inspection is no longer adequate; its drawbacks include a high missed-detection rate, low efficiency and high labor cost. Therefore, an efficient, accurate and economical solution is urgently needed to replace traditional manual inspection. Digital-image-based nondestructive crack detection technology was initially expected to replace manual inspection, but it still cannot do so completely, because it is highly susceptible to noise and lacks the ability to extract deep semantic information from images. Moreover, its accuracy falls far short of actual engineering requirements when cracks exhibit complex and diverse dimensional characteristics and the detection environment is noisy. A study by Dong et al. [1] pointed out that, although traditional image processing techniques and machine learning can identify various types of structural damage from images, processing the image data is still time-consuming and prone to a high number of false positive detections. Ibrahim et al. [2] reported that a CNN outperformed SVM and KNN in damage-detection accuracy when evaluating the health condition of two simulated four- and eight-story building structures subjected to earthquakes. Hou et al. [3] pointed out that manual recognition methods and traditional machine vision methods are inefficient, and that training samples, as well as specific information about the defects, such as contour and location, are not fully available in different environments.
The rapid development of deep learning in recent years has provided new ideas for the intelligent health monitoring of bridges. Deep learning has been widely used in various fields of computer vision, and its biggest advantage over traditional digital image methods is its ability to extract deep features of images. Consequently, deep learning can resist the noise of various complex detection environments and significantly improve detection accuracy after effective training. Yeum et al. [4] integrated the structure-from-motion (SfM) technique with a pre-trained CNN model to localize and classify regions of interest on a full-size road sign structure, which showed better performance than previous methods. Narazaki et al. [5] used an FCN and a recurrent structure (long short-term memory, LSTM) to automatically segment and label the different structural components of a bridge structure, working with synthetic video datasets. The above studies demonstrate that the application of deep-learning-based computer vision techniques for bridge crack detection can effectively address the shortcomings of traditional digital image methods. Combined with UAV technology, deep learning can achieve nondestructive data collection of bridge component defects, and processing the data collected by the UAV with deep learning enables automated crack identification. More importantly, deep learning can be combined with the GPS positioning module and the sensor information of the UAV to achieve accurate localization and dimension assessment of bridge crack defects.
The application of deep learning techniques in the field of crack detection is divided into object detection and segmentation. Object detection of cracks is performed in the form of rectangular boxes (bbox) used to locate cracks in a given image. The semantic segmentation indicates the category to which each pixel of the original image belongs. The output is a mask that reveals the category to which each pixel belongs. The instance segmentation distinguishes and counts the detected objects based on the semantic segmentation, and outputs a rectangular box (bbox) with the same meaning as the object detection, in addition to the mask. The core idea of the above tasks is to continuously extract the feature information of a given image through various network structures, which is, essentially, to perform matrix operations at multiple levels and finally obtain the coordinate data of bbox or mask data. The widely used algorithmic architectures for object detection include YOLO [6] series, Faster RCNN [7] and DETR [8]; semantic segmentation algorithmic architectures include Unet [9], DeepLab [10], etc.; instance segmentation algorithmic architectures include Mask RCNN [11], Yolact [12], etc. The network models used by different algorithms for extracting features are Convolutional Neural Networks [13] (CNN) and Transformer [14].
Therefore, many scholars have used the above-mentioned deep learning techniques to achieve automated detection of bridge defects. Xu et al. [15] constructed an end-to-end convolutional neural network with an embedded ASPP structure to achieve object detection of bridge cracks, and the F1 score of the model reached 87.71%. Shim et al. [16] generated multiscale feature maps in the encoder of a traditional semantic segmentation network and used adversarial learning to complete the semantic segmentation of bridge cracks, achieving an F1 score of 88.936%. Liu et al. [17] proposed APLCNet, based on Mask RCNN, for sidewalk crack detection, with an F1 score of 93.53% on the CFD dataset [18]. Ren et al. [19] proposed CrackSeg for segmentation of sidewalk cracks. In the same year, Jang et al. [20] improved CrackSeg and applied it to the detection of bridge pier cracks; its accuracy for crack segmentation reached 90.92% and its recall rate reached 97.47%. Liu et al. also improved CrackSeg using the ResNeXt [21] backbone network and proposed PCSN for bridge crack segmentation, reaching an accuracy of 83%. In addition, research on the interpretability of deep learning in bridge defect detection has also made new progress. Cardellicchio et al. [22] proposed DL-based methods for identifying defects in reinforced concrete bridges, offering humanly explainable interpretations using XAI techniques such as CAMs. In terms of datasets, Zou et al. [23] published a high-quality crack dataset called the DeepCrack dataset and proposed DeepCrack, an end-to-end deep convolutional neural network for automatic crack detection; DeepCrack outperforms state-of-the-art methods, achieving an average F-measure of over 0.87 on three challenging datasets. Yang et al. [24] created a pavement crack dataset named CRACK500, with 500 images, and proposed FPHBN, a novel network architecture for automatic pavement crack detection. In terms of crack geometric parameter extraction, Tang et al. [25] developed a novel method to detect and quantify cracks with high accuracy and efficiency; their method uses a U-net network and a thinning algorithm to extract the geometric parameters of cracks, which contributes to crack detection and geometric parameter extraction. Kao et al. [26] first used YOLO to locate bridge cracks, and then used a digital image method for edge detection in the located area to measure crack dimensions. Teng et al. [27] used DeepLab to perform semantic segmentation on crack images, which can better describe key dimensional information, such as the length and width of the cracks.
Although the algorithms and datasets for crack segmentation have been initially effective in bridge crack detection, they are mainly limited to the semantic segmentation of cracks, which cannot satisfy the need to distinguish different cracks. As mentioned above, although the improvements to CrackSeg substantially improve the segmentation accuracy of bridge cracks, its semantic segmentation nature cannot satisfy the localization of the intersecting longitudinal cracks that mainly affect bridge safety. Liu et al. proposed APLCNet [17] on the basis of Mask RCNN; although it completes the segmentation and localization of cracks, its dataset rarely involves complex intersecting cracks. Therefore, further research is urgently needed on the instance segmentation of complex cracks with interlaced vertical and horizontal dimensions, in terms of both data and algorithms. The CRACK500 [24] and DeepCrack [23] datasets are both intended for semantic segmentation. In contrast, there is a relative lack of research on crack instance segmentation, especially regarding datasets. Although Piyathilaka et al. [28] annotated 500 images from other datasets to create a dataset for crack instance segmentation, they did not mention the criteria used to distinguish different cracks at intersections. As can be seen from the published dataset, it simply labels the cracks around the intersection points as distinct objects, which does not reflect the actual situation. Therefore, the instance segmentation of complex cracks that cross longitudinally and horizontally has yet to be studied in depth.
Based on the above research, we can draw the following conclusions: the main difficulties in the instance segmentation of bridge cracks lie in both the data and the algorithms. The difficulty at the data level lies in the lack of calibration methods for describing crack morphology. Because complex cracks intersect longitudinally and horizontally, a simple closed polygon cannot describe their morphology; therefore, a new technical standard for crack annotation is urgently needed. The difficulty at the algorithm level lies in the absence of targeted optimization for cracks in instance segmentation. At the same time, the data and the algorithm influence each other. The crack-counting standard not only affects the difficulty and accuracy of the crack segmentation dataset, but also influences the post-processing of model output and management decisions. The lack of an optimized instance segmentation algorithm runs contrary to the requirements of subsequent evaluation, especially for the detection of long, thin cracks. Spurious breakpoints in detected cracks are one symptom of an unoptimized instance segmentation model, resulting in counting errors and incorrect dimension estimation.
To this end, this paper uses cameras and UAVs to collect high-resolution images of concrete bridge cracks. A criterion for the calibration and counting of concrete cracks under complex conditions, called relative-breakpoint annotation, is proposed. The proposed criterion is used to create a high-quality concrete bridge crack instance segmentation dataset using Photoshop, LabelMe and Eiseg. The dataset was manually calibrated with the true-size information of the crack targets. In order to verify the improvement of crack recognition accuracy with this dataset format, instance-segmentation algorithms with different design architectures (Mask RCNN and Yolact) were trained and tested.
This work aims to establish a normalized crack-counting rule and boundary-determination rule to standardize the annotation of crack instance segmentation datasets. Based on the proposed method, the resulting dataset contains 787 images with 3794 crack objects, which greatly improves the accuracy of crack instance segmentation at the dataset level. On this basis, a set of lightweight crack-recognition methods is implemented.
This paper begins by introducing the two classic deep learning algorithms in Section 2, including Mask RCNN and Yolact, which are employed to validate the proposed crack annotation method. Following this, Section 3 presents the proposed crack annotation method and a comprehensive analysis in detail. The evaluation work is conducted in Section 4, followed by a section of concluding remarks.

2. Crack Segmentation Algorithm

Since the method proposed in this paper is used to make an instance segmentation dataset of bridge cracks, this section introduces the two instance segmentation models used to verify the method. To verify the universality of the crack-counting criteria and dataset format proposed in this paper, two classic instance segmentation models are utilized, i.e., Mask R-CNN [11] and Yolact [12]. Mask R-CNN performs object detection first, generating regions of interest with a region proposal network, and then segments the objects using its segmentation branch. In a different manner, Yolact performs detection and segmentation simultaneously, predicting instance masks for the whole image in a single stage. A “mask” refers to a pixel-level binary representation that identifies the exact location and shape of each individual object instance within an image. The robustness and applicability of the proposed relative-breakpoint-based annotation method, in terms of correctness and efficiency, can be verified using these two models; therefore, we first introduce both models briefly.

2.1. MaskRCNN

Mask R-CNN was first proposed by He et al. in 2017 [11], and has been widely used for crack detection. Tan et al. demonstrated that detectors based on Mask R-CNN provide a robust and feasible ability for detecting, in real time, cracks’ existence and their shapes [12]. Pan et al. demonstrated the satisfactory performance achieved by using Mask RCNN for scratch detection in architectural glass panels [29]. As mentioned above, it is a typical two-stage instance segmentation model, capable of localization and segmentation of detection objects at the same time. The overall flow of Mask RCNN is shown in Figure 1.
Specifically, a three-channel RGB input image is first processed through a ResNet50 [30,31] backbone network for semantic feature extraction.
The multilevel feature layers obtained in the first step are then fed into the FPN [11,32] network for feature fusion. The FPN has been widely used for object detection, on multiple scales, in various segmentation models. For details, see [31]. For the network output of each stage block of ResNet50, the features of the adjacent feature layers are fused. After several operations of convolution and pooling, the scale of the high-level feature layer becomes smaller, and the accuracy of the object location decreases, but its semantic information gets richer. In contrast, the bottom feature layer extracts less semantic information, while the location is more accurate. The FPN structure fuses the semantic information of the upper layer and lower layer, causing the model to achieve a substantial improvement in detecting multi-scale objects.
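The multi-scale fusion described above can be illustrated with torchvision's generic FPN module. The following is a minimal sketch, assuming four ResNet50 stage outputs with their standard channel counts; the tensor shapes are illustrative placeholders rather than values from the paper.

```python
# A minimal FPN fusion sketch (assumed shapes, not the authors' exact configuration).
import torch
from collections import OrderedDict
from torchvision.ops import FeaturePyramidNetwork

# Channel counts of the four ResNet50 stage outputs (C2-C5).
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048], out_channels=256)

features = OrderedDict()
features["c2"] = torch.rand(1, 256, 200, 200)   # high resolution, weaker semantics
features["c3"] = torch.rand(1, 512, 100, 100)
features["c4"] = torch.rand(1, 1024, 50, 50)
features["c5"] = torch.rand(1, 2048, 25, 25)    # low resolution, richer semantics

fused = fpn(features)                            # top-down fusion; every level now has 256 channels
print([(name, tuple(f.shape)) for name, f in fused.items()])
```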
After that, a region proposal network, termed RPN, is employed to generate proposals, which are subsequently corrected and filtered to obtain the final Mask R-CNN rectangular-box output for object localization. The RPN was first proposed in Faster RCNN and subsequently adopted in the Mask RCNN structure [7,11]; it is a hallmark of two-stage instance segmentation.
Since the proposals generated by the RPN have different shapes and sizes, but the subsequent network requires fixed-size feature maps as inputs, ROIAlign is used to resize the proposals to a fixed dimension [11]. The feature map fed to the mask branch is 14 × 14, and its output mask is 28 × 28, considering the higher precision required for segmentation. Finally, the feature maps generated by ROIAlign are passed to the prediction branch and the mask branch to obtain the coordinates of the localization rectangles and the masks, respectively.
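As a concrete illustration of this pipeline, the sketch below builds a Mask R-CNN with a ResNet50 + FPN backbone using torchvision and runs it on a placeholder image. The two-class setting (background and crack) and the 0.5 confidence threshold are our assumptions for illustration; the paper's exact training configuration is described in Section 4.

```python
# A minimal Mask R-CNN inference sketch, assuming a background + crack class setup.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_crack_maskrcnn(num_classes=2):
    # Start from COCO-pretrained weights and replace the box and mask heads
    # so that the model predicts only background vs. crack.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    mask_in = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(mask_in, 256, num_classes)
    return model

model = build_crack_maskrcnn()
model.eval()
image = torch.rand(3, 800, 800)                 # placeholder RGB tensor scaled to [0, 1]
with torch.no_grad():
    output = model([image])[0]                  # dict with "boxes", "labels", "scores", "masks"
keep = output["scores"] > 0.5                   # assumed confidence threshold
boxes = output["boxes"][keep]                   # N x 4 localization rectangles
masks = output["masks"][keep] > 0.5             # N x 1 x H x W binary instance masks
```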

2.2. Yolact

You Only Look At CoefficienTs, termed Yolact, is a real-time instance-segmentation model proposed in 2019 that performs detection and segmentation at the same time; its procedure is shown in Figure 2. Yolact has been shown to achieve excellent results on popular benchmarks, and has been widely used in applications such as object recognition and autonomous driving. For instance, Wang et al. [33] utilized Yolact for quick and accurate detection and segmentation of the drivable area in intelligent driving. Zou et al. applied Yolact to ancient architecture segmentation [34].
As shown in Figure 2 and compared with Figure 1, Yolact is consistent with Mask R-CNN in terms of feature extraction and FPN feature fusion. However, instead of using an RPN, Yolact performs localization and segmentation with a parallel structure. Specifically, the prediction head, the first branch of Yolact, outputs the location parameters of the localization rectangle, the object confidence parameters and a matrix of weight parameters named mask coefficients. The protonet branch generates a set of prototype masks from the fused features. The parameters in the mask coefficient matrix are used as the weights of the prototype masks, and preliminary masks are obtained by summing the prototypes with these weights. Using the localization box generated by the prediction branch, the mask output is cropped, and the final mask and localization box are obtained after binarization by a threshold.
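The prototype-and-coefficient combination can be sketched as follows; the tensor shapes, the simple box cropping and the 0.5 threshold are illustrative assumptions rather than Yolact's exact implementation.

```python
# A minimal sketch of Yolact-style mask assembly, assuming the protonet and prediction
# head have already produced prototype masks and per-instance mask coefficients.
import torch

def assemble_masks(prototypes, coefficients, boxes, threshold=0.5):
    """prototypes: (H, W, k) prototype masks; coefficients: (n, k) per-instance weights;
    boxes: (n, 4) predicted boxes in pixel coordinates (x1, y1, x2, y2)."""
    # Weighted sum of prototypes followed by a sigmoid gives soft instance masks.
    masks = torch.sigmoid(prototypes @ coefficients.t())        # (H, W, n)
    masks = masks.permute(2, 0, 1)                               # (n, H, W)
    # Crop each soft mask to its predicted box, then binarize by threshold.
    final = torch.zeros_like(masks)
    for i, (x1, y1, x2, y2) in enumerate(boxes.round().long()):
        final[i, y1:y2, x1:x2] = masks[i, y1:y2, x1:x2]
    return final > threshold

# Example with random tensors standing in for network outputs.
protos = torch.rand(138, 138, 32)     # 32 prototypes at 138 x 138 resolution
coeffs = torch.randn(5, 32)           # 5 detected crack instances
boxes = torch.tensor([[10, 20, 90, 130]] * 5, dtype=torch.float32)
instance_masks = assemble_masks(protos, coeffs, boxes)
```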

3. Breakpoint-Based Crack Annotation

This section presents the proposed normalized crack-boundary demarcation and technical criteria, which will be used to produce a standard crack instance segmentation dataset; this method focuses on concrete bridge cracks. We refer to the proposed method as relative-breakpoint crack annotation, and compare this proposed method with two previously abandoned methods. The proposed method standardizes the production of crack instance segmentation datasets, which contributes to the accuracy improvement of crack instance segmentation models from the data level.
The counting criteria for cracks have a great influence on the model’s performance. The most difficult problem of counting and calibration is how to judge the boundary of multiple intersecting cracks. Taking Figure 3a as an example, different crack annotation methods, as indicated by Figure 3b–d, can seriously affect the crack-detection model’s training and performance. Therefore, in the following subsections we make a comparative analysis between the continuous crack annotation method, i.e., Figure 3b, and the absolute breakpoint annotation, i.e., Figure 3c. Based on the results of the analysis, a relative-breakpoint-based annotation strategy, as seen in Figure 3d, is proposed to delineate crack boundaries and create a high-quality crack instance segmentation dataset.

3.1. Continuous Crack Annotation

Continuous annotation is shown in Figure 3b, in which all connected cracks in the entire image are considered as a whole and marked as a single object. The advantage of this labelling method is that the boundary standard is simple, and there is no ambiguity or difficulty in determining it.
However, the disadvantages of this approach are obvious. For anchor-based instance-segmentation models (i.e., those in which final localization rectangles are obtained from candidate windows through anchor adjustment, filtering, etc.), this approach relies largely on the model’s ability to detect and segment large objects, and it renders the detection and segmentation of small objects somewhat meaningless. The model uses different feature layers and their corresponding anchor points for crack detection and segmentation, with feature layers of different scales used to detect objects of different scales. As a result, only the proposals and anchor points generated by the large-scale prediction feature layer are useful in back-propagation during model training. By contrast, the proposals generated by the prediction feature layers of other scales are classified as negative samples even though they contain crack objects, which causes the weights of the corresponding network structures to be updated in the wrong direction during back-propagation.

3.2. Absolute Breakpoint Crack Annotation

Absolute breakpoint annotation is defined as breaking crack objects at every intersection point. An example of absolute breakpoint annotation is shown in Figure 3c. Compared with the continuous crack labelling rule, this labelling criterion is conceptually the inverse and is more sensitive to small objects. The advantage of absolute breakpoint labelling is that the boundary is clear, and there is no situation where the boundary is blurred or difficult to determine.
Similar to continuous annotation, the disadvantages of absolute breakpoint annotation include a potential reduction in the model’s performance in detecting and segmenting large objects. However, according to our experience and analysis, its disadvantages are not as pronounced as those of continuous labelling. Continuous labelling enlarges the proposal rectangle, but the mask grows much more slowly than the proposal does: since the width of a crack is very small relative to its length, the crack can be approximated as a curve, so the area of the proposal rectangle for continuous annotation grows at a quadratic rate while the mask area grows approximately linearly. Therefore, the effective pixel ratio of cracks inside the rectangles is higher for absolute breakpoint annotation than for continuous annotation, which facilitates segmentation.
For most instance-segmentation algorithms, segmentation is performed based on the detection results. Generally speaking, the algorithm first completes the task of target detection, and then performs segmentation within the detected target range (i.e., bbox rectangle). Both segmentation and object detection backpropagation will change the weight parameters of each structure of the whole model. Therefore, it is necessary to ensure that the proportion of effective pixels inside the rectangular frame is as high as possible.

3.3. Relative Breakpoint Crack Annotation

According to our analysis, although both of the above methods have clear and unambiguous standards, neither fully conforms to the objective behavior of cracks. Combining the advantages of the two labelling standards, this paper proposes a relative breakpoint labelling method, as indicated in Figure 3d and Figure 4.
Relative breakpoint labels still distinguish the boundary of the crack object at the breakpoint, but the breakpoint here is a relative breakpoint and is specific to each crack. The rule is defined as follows:
Firstly, when there is an order-of-magnitude difference in crack width around the intersection point (i.e., the absolute breakpoint), the wider crack does not break there: the absolute breakpoint is not a relative breakpoint for it, and it does not split into two cracks. On the contrary, the narrower crack forms a relative breakpoint at this point and breaks there.
Secondly, if the crack widths around the intersection point (i.e., the absolute breaking point) are similar and cannot be distinguished by the first criterion, the crack is judged by whether the line is continuous. For continuous line cracks, absolute breakpoints do not constitute relative breakpoints. However, for cracks with discontinuous lines or sharp changes before and after the absolute breakpoint, a relative breakpoint is formed. In short, cracks with discontinuous lines or sharp changes will break at relative breakpoints.
According to the above two rules, we can judge most cracks that cross each other. For example, the red crack shown in Figure 4 is narrower than the blue crack, so a relative breakpoint is formed at the intersection point and the red crack breaks there. The blue crack is much wider, so no relative breakpoint is formed at the intersection. This case also satisfies the second judgment criterion, since the line of the blue crack at the intersection is more continuous than that of the red crack.
As a result, the red crack is broken at the intersection, while the blue crack is kept as one continuous line.
Theoretically, the continuity of a line is characterized by its curvature near the breakpoint: the greater the curvature, the greater the probability that the segments before and after the breakpoint belong to different cracks. The advantage of using relative breakpoint annotation is that the distinction of cracks is more in line with mechanics and human logic. In addition, relative breakpoint annotation makes objects of different scales in the dataset more balanced. Therefore, the prediction feature layers of each scale in the model, together with their generated proposals and anchors, can be fully utilized to enhance the robustness of the model.

3.4. Annotation Method Comparative Analysis

Based on previous analysis, we summarize the features of the aforementioned three annotation methods in Table 1, including the subjective consciousness effect, mechanical rationality and model training efficiency. Since the sample labelling is a human-interacting task, primary judgement should be conducted based on the visual subjective perception of a human. Therefore, an indicator named “human subjective intervention” is proposed to assess the interference of human consciousness in the process of creating samples for model training. Secondly, cracks are the manifestation of structural components under various loading conditions; hence, the labeled crack objects should conform to its mechanical rationality. For instance, two cracks that intersect should not be regarded as one crack because it does not conform to the mechanical analysis. Therefore, the second index proposed and compared is the mechanical rationality of three annotation methods. Thirdly, computation cost is taken into consideration by judging whether the annotation methods are conducive to model training, which is presented as the model training efficiency.
In addition, since the proposed method focuses on the cracks of the concrete bridge beam in the actual engineering detection, the analysis of the three methods will not consider factors other than the actual detection process, and the qualitative analysis is carried out on the normal collected images.
Comparative results are summarized in Table 1, according to our previous discussion of the crack features and deep learning model structures. Generally, continuous crack annotation is the most objective method, with the minimum human intervention effect; however, a high computation cost is expected during model training with continuously annotated crack samples. The relative breakpoint annotation, by contrast, is subjectively influenced by humans, and its annotation results are consistent with both human subjective perception and mechanical analysis. It also provides relatively balanced feature maps at different scales, which should be conducive to model training convergence. To test these conjectures, an experiment is conducted to compare the performance of the three annotation methods in model training, as described in Section 4.

4. Lightweight Crack Detection Model

As mentioned previously, this section first verifies the performance of the proposed relative-breakpoint-based crack annotation method. A dataset containing 120 images and 744 cracks is prepared and annotated using the three methods mentioned above, i.e., continuous crack annotation, absolute-breakpoint crack annotation and relative-breakpoint crack annotation. Mask RCNN is used for model training, and the mAP of the bbox is used as the indicator to judge the best annotation method; the mAP of the mask is used as a more precise indicator to measure the model after the method is chosen. The best-performing crack annotation method is then used to prepare a larger dataset of 787 images for crack detection model training and evaluation, followed by post-processing procedures to generate the dimensional information of the detected cracks. The 787 images contain 3794 crack objects, so each image contains multiple cracks and has high complexity, as the example shown in Figure 5 demonstrates. Thus, the dataset has sufficient complexity to validate the proposed method. Mask RCNN and Yolact are employed to further validate the accuracy of our method; these two models are selected because they represent the two-stage and one-stage instance segmentation approaches, respectively, so validating the proposed method with both demonstrates its versatility. The overall procedure is shown in Figure 5.

4.1. Crack-Annotation Method Evaluation

First, a small trial dataset containing 120 images is prepared to assess these three crack annotation methods, referred to as the trial set, as seen in Figure 6.
Mask RCNN was used for crack detection. The training tests were conducted on an Ubuntu 20.04 operating system with an Intel(R) Xeon(R) W-2133 CPU @ 3.60 GHz, using Python 3.7 as the programming language. An NVIDIA GeForce RTX 2080 Ti GPU was used as the accelerator for the model training tests, and training and testing were accelerated using CUDA and cuDNN. The deep learning framework of choice was PyTorch.
To evaluate the models’ performance, precision and recall are the basic indicators, which are calculated from the numbers of True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN), as seen in Equations (1) and (2).
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ (1)
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (2)
where TP indicates the number of detected objects that are correctly classified, while FP denotes the number of backgrounds in the image that are misidentified as detected objects. Additionally, FN denotes the number of detected objects in the image that are misclassified as backgrounds, and TN denotes the number of backgrounds in the image that are correctly classified as backgrounds.
Average precision, termed AP, is a measure of how well an object detection algorithm performs. To assess the model’s performance, mean average precision (mAP), which means the average precision of different categories, is also taken into consideration. Similarly, average recall, named AR, is a performance metric in object detection that measures the average ratio of correctly detected objects compared to the total number of objects across multiple recall thresholds. In general, the AR, AP and mAP are performance metrics that can be used to evaluate the accuracy of object detection algorithms.
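For clarity, the sketch below computes precision and recall from TP/FP/FN counts and approximates AP as the area under a precision-recall curve; the numbers are made up for illustration, and the AP computation is a simplification of the interpolated procedure used by COCO-style evaluators.

```python
# A minimal sketch of the precision, recall and AP calculations described above,
# assuming detections have already been matched to ground-truth cracks.
import numpy as np

def precision_recall(tp, fp, fn):
    # Equations (1) and (2): Precision = TP / (TP + FP), Recall = TP / (TP + FN).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def average_precision(scores, is_tp, num_gt):
    """scores: detection confidences; is_tp: 1 for a true positive, 0 for a false positive;
    num_gt: number of ground-truth cracks."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, dtype=float)[order])
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Approximate AP as the area under the precision-recall curve (trapezoidal rule).
    ap = 0.0
    for i in range(1, len(recall)):
        ap += (recall[i] - recall[i - 1]) * (precision[i] + precision[i - 1]) / 2.0
    return float(ap)

print(precision_recall(tp=90, fp=10, fn=12))                     # ~ (0.90, 0.88)
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1], 4))  # made-up detections
```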
The results are summarized in Table 2. As can be seen, relative-breakpoint crack annotation suits the model best and performs best, as indicated by the highest AP and AR values. Except for the small-object AP indicator, Rel-break has the highest value on every indicator. Overall, Rel-break has an AP value of 28.4, much higher than Cont’s 12.5 and Abs-break’s 18. In addition, Rel-break has a large advantage in AP and AR for large-scale targets, with an AP value of 55.8. On the other hand, the AP of the Cont method for large-scale targets is very low, only 19, which indicates that slender, large-scale labels are not suitable for model training. Additionally, Abs-break is slightly better than Rel-break in the small-scale object index. This might be because, when intersection points occur within small-scale objects, boundary distinction is difficult to carry out since the scale is too small; thus, Rel-break does not show an advantage in distinguishing different cracks at those intersection points.
Therefore, the superiority of the proposed relative breakpoint annotation is demonstrated, as analyzed previously in Table 1. It was therefore chosen to label more images in order to further examine its feasibility.

4.2. Dataset Preparation for the Lightweight Model

The relatively larger dataset contains 787 images of cracks with different scales, surroundings, quantities and morphologies.
Instance-segmentation datasets such as COCO use polygon vertices to describe a contour. The problem is that manual calibration cannot place enough vertices to describe a crack line smoothly with sufficient precision. In order to ensure the smoothness of the crack outline, we use the lasso tool of Photoshop to annotate the cracks. For instance, as shown in Figure 7a, Photoshop is capable of describing detailed information about the edges of the crack. The labelled information from Photoshop is saved as a binary png image, as shown in Figure 7b. After that, the png is converted into a json file in COCO dataset format using a digital-image gradient (edge extraction) method; the process is shown in Figure 7c,d. Since the COCO format uses polygons to approximate the object contour, annotating map cracking is difficult. As a result, map cracking is eliminated from the dataset and only non-map-cracking images are kept. The trained model is expected to characterize unseen map cracking by learning features from multiple intersecting single cracks.
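A simplified version of this conversion step is sketched below using OpenCV contour extraction; the file names, the single “crack” category id and the one-instance-per-connected-component assumption are illustrative, whereas in the actual workflow each instance mask is separated according to the relative-breakpoint rules before export.

```python
# A minimal sketch of converting a binary crack mask (cf. Figure 7b) into COCO-style
# polygon annotations. Assumes OpenCV 4; "crack_mask.png" is a hypothetical file name.
import json
import cv2

mask = cv2.imread("crack_mask.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
annotations = []
for i, contour in enumerate(contours):
    if cv2.contourArea(contour) < 10:                # skip tiny speckles
        continue
    x, y, w, h = cv2.boundingRect(contour)
    annotations.append({
        "id": i,
        "image_id": 0,
        "category_id": 1,                             # single "crack" category
        "segmentation": [contour.flatten().astype(float).tolist()],  # [x1, y1, x2, y2, ...]
        "bbox": [int(x), int(y), int(w), int(h)],
        "area": float(cv2.contourArea(contour)),
        "iscrowd": 0,
    })

with open("crack_annotations.json", "w") as f:
    json.dump({"annotations": annotations}, f)
```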

4.3. Lightweight Model Training and Validation

Image augmentation approaches, i.e., flipping, rotation, random cropping, Gaussian blurring, etc., are applied to the training dataset. The weights pre-trained on the COCO [35] dataset are transferred to our task; with these pre-trained weights, the backbone network only needs fine-tuning, which greatly reduces the dataset volume required to train the rest of the model. In Mask RCNN and Yolact, an anchor is a predefined bounding box of a given scale and aspect ratio, placed at regular intervals on the image to enable the network to detect objects of varying sizes and shapes. Based on the statistics of the dataset, anchor sizes of 50, 100, 200, 400 and 800 cover most of the annotated instances, and the length-to-width ratios of the anchors are set to 1/4, 1 and 4, respectively. In order to emphasize the importance of crack alignment, the proportions of the five components of the original loss function are set to rpn_class_loss : rpn_bbox_loss : class_loss : bbox_loss : mask_loss = 1 : 1 : 1 : 1 : 3.
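The anchor settings and loss weighting described above can be sketched with torchvision's Mask R-CNN as a stand-in; the parameter and loss names below are torchvision's, not the authors' code, and the training target is a placeholder.

```python
# A minimal sketch of custom anchors and weighted losses (assumed torchvision API, not
# the authors' original training script).
import torch
import torchvision
from torchvision.models.detection.rpn import AnchorGenerator

# Five anchor sizes (one per FPN level) and aspect ratios of 1/4, 1 and 4.
anchor_generator = AnchorGenerator(
    sizes=((50,), (100,), (200,), (400,), (800,)),
    aspect_ratios=((0.25, 1.0, 4.0),) * 5,
)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    weights="DEFAULT", rpn_anchor_generator=anchor_generator
)

# Loss weighting: rpn_class : rpn_bbox : class : bbox : mask = 1 : 1 : 1 : 1 : 3.
loss_weights = {
    "loss_objectness": 1.0, "loss_rpn_box_reg": 1.0,
    "loss_classifier": 1.0, "loss_box_reg": 1.0, "loss_mask": 3.0,
}

model.train()
images = [torch.rand(3, 512, 512)]                       # placeholder training image
targets = [{
    "boxes": torch.tensor([[30.0, 40.0, 200.0, 90.0]]),  # one dummy crack box
    "labels": torch.tensor([1]),
    "masks": torch.zeros(1, 512, 512, dtype=torch.uint8),
}]
loss_dict = model(images, targets)                       # per-component losses
total_loss = sum(loss_weights[k] * v for k, v in loss_dict.items())
```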
As given in Table 3, we first evaluated the performance of the model at different intersection-over-union (IOU) thresholds between 0.5 and 0.95, at intervals of 0.05. Results obtained with a 0.5 threshold are recorded as AP50; similarly, AP75 represents the average precision when the IOU threshold is 0.75. The intersection over union (IOU) is a metric used to evaluate the performance of object detection and image segmentation tasks [36]; it is calculated by dividing the area of intersection between the predicted and ground-truth regions by the area of their union. The formula for IOU is as follows:
$\mathrm{IOU} = \dfrac{\text{Intersection Area}}{\text{Union Area}}$
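As an illustration, the IOU of two axis-aligned boxes can be computed as follows; the box coordinates are hypothetical.

```python
# A minimal box-level IOU sketch for boxes given as (x1, y1, x2, y2).
def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

print(box_iou((0, 0, 100, 100), (50, 50, 150, 150)))   # 2500 / 17500 ~ 0.14
```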
It can be seen from Table 3 that the dataset annotated by relative breakpoint can make the model converge and meet the requirements of industry, for both Mask RCNN and Yolact. Several detection results are shown in Figure 8, which can indicate that the model is able to distinguish the intersecting cracks well and count them effectively. Taking Figure 8c as an example, crack0 intersects crack4. Since crack0 has a more continuous line shape and a larger width, the front and rear of crack0 at the intersection point are considered as a whole, while crack4 is disconnected at the intersection point. Additionally, the model can satisfy the detection requirements of cracks with different lengths, widths and aspect ratios at the same time. For instance, the cracks shown in Figure 8c,h have complex intersections and contain different crack scales. In this case, the model can still achieve localization and segmentation well. Moreover, Figure 8e shows that the model can perform well within weak lighting conditions.
Therefore, the results show that when using the proposed relative breakpoint labelling method, only about 700 training samples are needed, and the model can be trained to satisfactory performance, whether it is Mask RCNN or Yolact.

5. Concluding Remarks

This paper addresses the challenge of limited datasets for instance segmentation models in the field of bridge crack detection. Considering the challenge and impracticality of building a large dataset composed entirely of multiple cracks, we propose a relative-breakpoint crack annotation method to support a lightweight crack detection model that requires only a limited number of training samples, thereby improving the model’s feasibility and generalization. The highlights of the findings are as follows:
A lightweight bridge-crack identification solution is proposed for bridge routine inspection and maintenance, especially for bridges with poor accessibility. The proposed method mainly consists of a crack annotation procedure and a crack detection model training procedure.
In-depth analysis has been conducted on how the crack annotation form affects the trained model’s behavior. The proposed relative-breakpoint-based annotation strategy can improve the performance of an instance segmentation model, especially in complex interlaced-crack scenes, where it can effectively distinguish crisscrossed cracks. Hence, this method enables various instance-segmentation models to effectively distinguish multiple cracks at intersections, and improves the efficiency and accuracy of subsequent analysis.
Classic Mask RCNN and Yolact were both employed to establish crack instance segmentation models to carry out the crack detection, localization and segmentation. With the complex validation set, the model localization accuracy and recall rates can reach 88.4% and 89.2%, respectively. The results demonstrate the effectiveness of the proposed breakpoint-based annotation in distinguishing complex interlaced cracks for both crack instance segmentation models.
In summary, the proposed relative-breakpoint-based annotation has the advantages of distinguishing cracks of different scales and of being friendly to model convergence. In general, our method normalizes the format of the crack instance segmentation dataset and strengthens the accuracy and generalizability of the instance segmentation model from the data level. With the standardized and improved labelling rules, the crack identification model can be trained to its intended ability with fewer samples.
To further verify the robustness and versatility of our method in detecting cracks in concrete structures, validation of the method is planned, specifically, by applying it to specific concrete bridges, evaluating its generalization ability through testing on various types of crack conditions and exploring ways to enhance its accuracy.

Author Contributions

Conceptualization, W.X. (Weidong Xu); Funding acquisition, W.X. (Wen Xiong); Investigation, W.X. (Weidong Xu); Methodology, Y.Z. and W.X. (Weidong Xu); Project administration, Y.Z. and W.X. (Wen Xiong); Software, Y.Z. and W.X. (Weidong Xu); Supervision, Y.Z. and C.S.C.; Validation, Y.Z.; Writing—original draft, W.X. (Weidong Xu); Writing—review and editing, Y.Z. and C.S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Projects 52022021, 51978160, 52108118), the Key Laboratory of Roads and Railway Engineering Safety Control (Shijiazhuang Tiedao University), the Ministry of Education (STDTKF202101), and the Key Research and Development Program of Jiangsu Province of China (Project BE2021089).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dong, C.-Z.; Catbas, F.N. A Review of Computer Vision–Based Structural Health Monitoring at Local and Global Levels. Struct. Health Monit. 2021, 20, 692–743. [Google Scholar] [CrossRef]
  2. Ibrahim, A.; Eltawil, A.; Na, Y.; El-Tawil, S. A Machine Learning Approach for Structural Health Monitoring Using Noisy Data Sets. IEEE Trans. Automat. Sci. Eng. 2020, 17, 900–908. [Google Scholar] [CrossRef]
  3. Hou, S.; Dong, B.; Wang, H.; Wu, G. Inspection of Surface Defects on Stay Cables Using a Robot and Transfer Learning. Autom. Constr. 2020, 119, 103382. [Google Scholar] [CrossRef]
  4. Yeum, C.M.; Choi, J.; Dyke, S.J. Automated Region-of-Interest Localization and Classification for Vision-Based Visual Assessment of Civil Infrastructure. Struct. Health Monit. 2019, 18, 675–689. [Google Scholar] [CrossRef]
  5. Narazaki, Y.; Hoskere, V.; Hoang, T.A.; Spencer, B.F., Jr. Automated Bridge Component Recognition Using Video Data. arXiv 2018, arXiv:1806.06820. [Google Scholar]
  6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection; IEEE Computer Society: Piscataway, NJ, USA, 2016; pp. 779–788. [Google Scholar]
  7. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NeurIPS Proceedings: Advances in Neural Information Processing Systems 28 (NIPS 2015); Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
  8. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
  9. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  10. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  12. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-Time Instance Segmentation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9156–9165. [Google Scholar]
  13. O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458. [Google Scholar]
  14. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In NeurIPS Proceedings: Advances in Neural Information Processing Systems 30 (NIPS 2017); Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  15. Xu, Y.; Wang, Y.; Yuan, J.; Cheng, Q.; Wang, X.; Carson, P.L. Medical Breast Ultrasound Image Segmentation by Machine Learning. Ultrasonics 2019, 91, 1–9. [Google Scholar] [CrossRef]
  16. Fukushima, K. Neocognitron: A Hierarchical Neural Network Capable of Visual Pattern Recognition. Neural Netw. 1988, 1, 119–130. [Google Scholar] [CrossRef]
  17. Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer Vision-Based Concrete Crack Detection Using U-Net Fully Convolutional Networks. Autom. Constr. 2019, 104, 129–139. [Google Scholar] [CrossRef]
  18. Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic Road Crack Detection Using Random Structured Forests. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3434–3445. [Google Scholar] [CrossRef]
  19. Ren, Y.; Huang, J.; Hong, Z.; Lu, W.; Yin, J.; Zou, L.; Shen, X. Image-Based Concrete Crack Detection in Tunnels Using Deep Fully Convolutional Networks. Constr. Build. Mater. 2020, 234, 117367. [Google Scholar] [CrossRef]
  20. Jang, K.; An, Y.-K.; Kim, B.; Cho, S. Automated Crack Evaluation of a High-Rise Bridge Pier Using a Ring-Type Climbing Robot. Comput.-Aided Civ. Infrastruct. Eng. 2021, 36, 14–29. [Google Scholar] [CrossRef]
  21. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995. [Google Scholar]
  22. Cardellicchio, A.; Ruggieri, S.; Nettis, A.; Renò, V.; Uva, G. Physical Interpretation of Machine Learning-Based Recognition of Defects for the Risk Management of Existing Bridge Heritage. Eng. Fail. Anal. 2023, 149, 107237. [Google Scholar] [CrossRef]
  23. Zou, Q.; Zhang, Z.; Li, Q.; Qi, X.; Wang, Q.; Wang, S. DeepCrack: Learning Hierarchical Convolutional Features for Crack Detection. IEEE Trans. Image Process. 2019, 28, 1498–1512. [Google Scholar] [CrossRef] [PubMed]
  24. Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection. IEEE Trans. Intell. Transp. Syst. 2020, 21, 1525–1535. [Google Scholar] [CrossRef] [Green Version]
  25. Tang, Y.; Huang, Z.; Chen, Z.; Chen, M.; Zhou, H.; Zhang, H.; Sun, J. Novel Visual Crack Width Measurement Based on Backbone Double-Scale Features for Improved Detection Automation. Eng. Struct. 2023, 274, 115158. [Google Scholar] [CrossRef]
  26. Kao, S.-P.; Chang, Y.-C.; Wang, F.-L. Combining the YOLOv4 Deep Learning Model with UAV Imagery Processing Technology in the Extraction and Quantization of Cracks in Bridges. Sensors 2023, 23, 2572. [Google Scholar] [CrossRef] [PubMed]
  27. Teng, S.; Chen, G. Deep Convolution Neural Network-Based Crack Feature Extraction, Detection and Quantification. J. Fail. Anal. Prev. 2022, 22, 1308–1321. [Google Scholar] [CrossRef]
  28. Piyathilaka, L.; Preethichandra, D.M.G.; Izhar, U.; Kahandawa, G. Real-Time Concrete Crack Detection and Instance Segmentation Using Deep Transfer Learning. Eng. Proc. 2020, 2, 91. [Google Scholar]
  29. Pan, Z.; Yang, J.; Wang, X.; Wang, F.; Azim, I.; Wang, C. Image-Based Surface Scratch Detection on Architectural Glass Panels Using Deep Learning Approach. Constr. Build. Mater. 2021, 282, 122717. [Google Scholar] [CrossRef]
  30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  31. Guo, F.; Qian, Y.; Wu, Y.; Leng, Z.; Yu, H. Automatic Railroad Track Components Inspection Using Real-Time Instance Segmentation. Comput. -Aided Civ. Infrastruct. Eng. 2021, 36, 362–377. [Google Scholar] [CrossRef]
  32. Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
  33. Wang, G.; Zhang, B.; Wang, H.; Xu, L.; Li, Y.; Liu, Z. Detection of the Drivable Area on High-Speed Road via YOLACT. SIViP 2022, 16, 1623–1630. [Google Scholar] [CrossRef]
  34. Zou, Z.; Zhao, P.; Zhao, X. Automatic Segmentation, Inpainting, and Classification of Defective Patterns on Ancient Architecture Using Multiple Deep Learning Algorithms. Struct. Control Health Monit. 2021, 28, e2742. [Google Scholar] [CrossRef]
  35. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
  36. Rahman, M.A.; Wang, Y. Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation. In Proceedings of the Advances in Visual Computing: 12th International Symposium, ISVC 2016, Las Vegas, NV, USA, 12–14 December 2016; Bebis, G., Boyle, R., Parvin, B., Koracin, D., Porikli, F., Skaff, S., Entezari, A., Min, J., Iwai, D., Sadagic, A., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 234–244. [Google Scholar]
Figure 1. Overall procedure of Mask RCNN.
Figure 2. Overall procedure of Yolact.
Figure 3. Description of three crack annotation methods.
Figure 4. Description of relative breakpoint annotation.
Figure 5. Overview of crack-detection model training and model evaluation.
Figure 6. Part of the trial datasets, using different crack annotation methods.
Figure 7. Crack labelling and sample preparation. (a) Labelling edges; (b) Saving annotation information as a binary graph; (c) Extracting edges using threshold method; (d) Visualization results of COCO format datasets using LabelMe.
Figure 8. Crack detection results.
Table 1. Assessment of three crack annotation methods.

Annotation Methods | Human Subjective Intervention | Mechanical Rationality | Model Training Efficiency
Continuous crack annotation | Minimum | / | Inefficient
Absolute breakpoint crack annotation | Medium | Not considered | Generally efficient
Relative breakpoint crack annotation | Maximum | Considered | Highly efficient
Table 2. Comparison of three crack annotation methods.

Index | IOU | Area | Cont | ABS-BREAK | REL-BREAK
AP | 0.5:0.95 | all | 12.5 | 18 | 28.4
AP50 | 0.5 | all | 30.9 | 40 | 67.1
AP75 | 0.75 | all | 7.1 | 12.9 | 18.5
AP | 0.5:0.95 | small | 9.2 | 13.9 | 13.7
AP | 0.5:0.95 | medium | 26.8 | 30 | 35.2
AP | 0.5:0.95 | large | 19 | 21.6 | 55.8
AR | 0.5:0.95 | all | 34 | 27.9 | 37.9
AR | 0.5:0.95 | small | 20.1 | 17.7 | 25.4
AR | 0.5:0.95 | medium | 42.6 | 33.3 | 44.7
AR | 0.5:0.95 | large | 50.8 | 44.4 | 56.8
Table 3. Crack-detection results using Mask RCNN and Yolact.

Index | IOU | Area * | Mask RCNN Bbox | Mask RCNN Mask | Yolact Bbox | Yolact Mask
AP | 0.5:0.95 | all | 46.5 | 14.7 | 30.1 | 21.1
AP50 | 0.5 | all | 64.6 | 34.9 | 60.1 | 48.2
AP75 | 0.75 | all | 50.6 | 11.4 | 30.4 | 13.9
AP | 0.5:0.95 | small | 32.7 | 6.8 | 22.3 | 6.5
AP | 0.5:0.95 | medium | 61.8 | 19.8 | 45.8 | 30.9
AP | 0.5:0.95 | large | 59.5 | 24.2 | 17.4 | 38.5
AR | 0.5:0.95 | all | 52.5 | 22.2 | 38.4 | 24.8
AR | 0.5:0.95 | small | 33.8 | 14.8 | 24.3 | 9.9
AR | 0.5:0.95 | medium | 65.4 | 25.5 | 48.8 | 34
AR | 0.5:0.95 | large | 69.2 | 29.2 | 46.2 | 45.4
Precision (IOU = 0.5, threshold = 0.5) | | | 76.9 | 89.2 | 88.4 | 263

* Area: the size or scale of the object instances within an image, including small, medium, and large.


