Article

A Highway Pavement Crack Identification Method Based on an Improved U-Net Model

School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(12), 7227; https://doi.org/10.3390/app13127227
Submission received: 9 May 2023 / Revised: 5 June 2023 / Accepted: 15 June 2023 / Published: 16 June 2023

Abstract
Crack identification plays a vital role in preventive maintenance strategies for highway pavements; accurate identification of cracks in pavement images is therefore the key to highway maintenance work. In this paper, an improved U-Net network adopting multi-scale feature prediction fusion and an improved parallel attention module is put forward to better identify concrete cracks. Multi-scale feature prediction fusion combines the features generated by several intermediate layers of the U-Net for an aggregated prediction, thereby exploiting global information from different scales. The improved parallel attention module processes the decoded output of the multi-scale feature prediction fusion U-Net; it gives more weight to the target region in the image and further captures the global contextual information of the image to improve recognition accuracy. The bottleneck layer is improved to increase the robustness of the model and prevent overfitting. Experiments show that the improved U-Net network significantly outperforms the original U-Net network. The performance of the proposed method was investigated on two publicly available datasets (Crack500 and CFD) and compared with competing methods from the literature. On the Crack500 dataset, the method achieved the highest scores in precision (89.60%), recall (95.83%), mIOU (83.80%), and F1-score (92.61%). Similarly, on the CFD dataset, the method achieved high values for precision (93.29%), mIOU (82.07%), recall (86.26%), and F1-score (89.64%). The method thus has several advantages for identifying cracks in highway pavements and is a practical tool for field work. In future work, identifying more crack types and making the model lightweight are the key objectives. This paper also provides a new approach to road crack identification.

1. Introduction

China is a vast and populous country with a total population of 1.4 billion and a land area of 9.6 million square kilometers [1]. The development of the transportation industry is therefore closely tied to the development of the national economy. For this reason, the Chinese government has always treated transportation as a focus of economic construction and has insisted on the development idea of “economic development, transportation first”. Although China’s road infrastructure work started late compared with that of developed countries, the speed of road construction has increased rapidly over time, and road mileage has kept rising. With the continuous expansion of the road network, road hazards are also increasing; among them, pavement cracks [2] seriously threaten road safety. If pavement cracks are not identified in a timely manner so that the level of damage can be assessed and maintenance performed, the result will be pavement deterioration, road surface collapse, an increase in traffic accidents, and other hazards that directly threaten lives and health. It is therefore necessary to identify pavement cracks promptly.
Highway pavement crack identification has been an important issue in the field of transportation construction [3]. Traditional pavement crack identification methods require a large number of manual inspections, which are time-consuming and labor-intensive, and the results are not accurate and stable enough. In order to solve this problem, more and more researchers have started to explore the use of deep learning techniques for automated pavement crack recognition.
Before deep learning techniques were applied to pavement crack recognition, researchers usually used traditional image processing methods to identify cracks. Edge extraction operators [4,5,6] and threshold segmentation algorithms [7,8] are representative of traditional pavement crack recognition. Nnolim [9] proposed a crack feature segmentation algorithm based on adaptive threshold edge detection. In practical engineering, however, shadows, road marking lines, and stains on the road surface degrade the recognition accuracy of traditional methods. In addition to edge detection and threshold segmentation, there are crack recognition methods based on the wavelet transform, but these methods place high demands on environmental conditions and have poor robustness [10,11,12].
In recent years, deep learning has been increasingly used in the field of computer vision and is widely applied in image processing, object detection, segmentation, and other tasks thanks to its excellent feature learning capability and good classification performance. In pavement crack detection, deep learning can automatically learn feature representations from road images and thereby effectively identify pavement cracks. Scholars at home and abroad have therefore proposed many deep learning-based methods for pavement crack recognition, such as transfer learning-based methods [13,14,15], machine learning [16,17,18], generative adversarial network (GAN)-based methods [19,20], convolutional-recurrent neural network (CNN-RNN)-based methods [21,22], and so on. Meanwhile, some research institutions and companies have begun to invest heavily in road image acquisition and data annotation to better support the training and optimization of deep learning algorithms. A mask-region-based convolutional neural network (Mask R-CNN) has been used in crack detection to automatically detect and segment small cracks in asphalt pavements at the pixel level; simulations performed with gprMax software (version V3.1.6), combined with field inspection, determined the characteristics of cracks in ground-penetrating radar images of asphalt pavements and the relationship between vertical crack width and crack area in those images [23].
Zhang et al. [24] trained a deep convolutional neural network based on transfer learning to classify pavement images and proposed a block-based threshold segmentation method to segment cracks at the pixel level; however, the method’s pixel-level recognition accuracy needed improvement. Chen, J. [25] proposed a new neural network model based on an encoder–decoder network and attention mechanisms to detect and evaluate road cracks, which could extract crack pixels more accurately and efficiently; however, the model placed high demands on equipment, which hindered practical engineering applications, and its accuracy also needed improvement. Fan, L. [26] proposed a road image crack detection network, RAO-UNet, which used an encoder–decoder and an image frequency relationship based on a residual attention model to detect cracks; however, the method is time-consuming, and the cracks it identifies are prone to breaks. Sun, X. [27] proposed an improved DeepLabv3+ segmentation network for crack recognition, with a multi-scale attention module able to assign different weights to different feature mappings; this method consumed more computational resources at runtime, had a larger weight file, and its accuracy needed improvement. The deep residual convolutional neural network (ParallelResNet) proposed by Fan, Z. [28] was able to identify complex cracks, but the network was tested only on still images rather than video streams and depended heavily on the accuracy of dataset labeling. Yang, F. et al. [29] proposed a new crack identification network combining feature pyramids and hierarchical boosting networks. Huyan, J. et al. [30] proposed CrackU-net for pixel-level crack recognition; experimental results showed that the network outperformed traditional methods and fully convolutional networks (FCN) for crack identification, but it relied on large datasets, could not be trained on smaller ones, and did not handle segmentation details well. Ren, Y. [31] proposed a crack segmentation network called CrackSegNet that generalized better than previous segmentation methods; however, its recognition accuracy was not high, and edge details were blurred. Ma, D. [32] proposed a crack recognition network based on a multi-feature-layer convolutional neural network; although this method improved recognition accuracy, it did not make good use of the contextual features of cracks. Liu, Z. [33] proposed a method to detect concrete cracks using the U-Net network, which can identify crack locations from raw input images under various conditions with high efficiency and robustness.
To address the problems of low pavement crack recognition accuracy and blurred crack edges, this paper proposes an improved U-Net model to identify road pavement cracks. The model combines multi-scale feature prediction fusion and a parallel attention mechanism for the segmentation of crack images. The experimental results demonstrate the effectiveness and superiority of the method.
The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 presents the improvements made to the U-Net model. Section 4 summarizes the proposed model. Section 5 presents the experimental results and analysis, and Section 6 concludes. Figure 1 shows a schematic diagram of the improvements made to the original U-Net model in this paper.

2. Related Works

Original U-Net Model

The architecture of the U-Net semantic segmentation model is shown in Figure 2. It is called the U-Net model because its structure resembles the shape of a “U” [34].
The U-Net network consists of two parts: an encoder and a decoder. The encoder contains multiple convolutional and pooling layers that progressively reduce the spatial resolution of the input image and extract features. The decoder contains multiple up-sampling and convolutional layers that generate segmentation masks from the feature maps extracted by the encoder. Because the pooling layers in the down-sampling path reduce the resolution of the image, much useful feature information of the input image is lost, and the decoder cannot recover it by up-sampling alone. Therefore, a key feature of the U-Net network is the skip connection, which links feature maps between the encoder and the decoder and allows the decoder to utilize higher-resolution features for segmentation. Skip connections also help avoid information loss and distortion in the network.
With the skip connections of the encoder–decoder structure, the U-Net network shows great advantages in pixel-level segmentation tasks. The structure learns feature information at different levels through multi-level abstraction of the input image and combines high-level feature information with low-level information through skip connections, further improving the network’s ability to capture detail and thus achieving more accurate segmentation results. Moreover, the U-Net network makes up for the shortcomings of the fully convolutional network and outperforms it in segmentation.
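As a point of reference, the sketch below shows one encoder/decoder stage with a skip connection in tf.keras. It is a minimal illustration only; the layer sizes are assumptions and do not reproduce the exact configuration of the networks discussed in this paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions, as in a standard U-Net stage
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

inputs = tf.keras.Input(shape=(256, 256, 3))

# Encoder: convolve, keep the feature map for the skip connection, then pool
e1 = conv_block(inputs, 64)
p1 = layers.MaxPooling2D(2)(e1)

# Deepest stage
b = conv_block(p1, 128)

# Decoder: up-sample, then concatenate the stored encoder features (skip connection)
u1 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(b)
d1 = conv_block(layers.Concatenate()([u1, e1]), 64)

outputs = layers.Conv2D(1, 1, activation="sigmoid")(d1)  # binary crack mask
model = tf.keras.Model(inputs, outputs)
```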

3. Methods

3.1. Multi-Scale Feature Prediction Fusion

In deep learning, multi-scale networks refer to the use of different scales within the same neural network to process the input data. This approach enlarges the receptive field of the network so that it can better process the objects in the input image, allowing it to handle objects of different sizes. The receptive field (RF) [35] is the area of the input image whose pixels influence the output of the neural network; in a convolutional neural network, the receptive field of each neuron can be understood as the input area of that neuron. Knowing the receptive field size of a neuron helps one understand how the neural network interprets and represents the input image. The size of the receptive field has a significant impact on the performance of a neural network: smaller receptive fields typically recognize smaller local features (e.g., edges and corner points), while larger receptive fields allow the recognition of larger features (e.g., objects and scenes). In addition, a larger receptive field helps the network understand contextual information better, improving its performance. Usually, multi-scale networks involve two aspects: multi-scale input and multi-scale feature extraction. By using multi-scale networks, more comprehensive information can therefore be extracted, including both global overall information and local detailed information.
Multi-scale feature prediction fusion (MFPF) [36] is a commonly used multi-scale neural network structure that is widely used in computer vision fields such as image classification, target detection, semantic segmentation, etc. The core idea is to predict the output of the features in the model at different scales, and then fuse the predicted results.
The main purpose of this section is to combine the multi-scale feature prediction fusion idea with the ordinary U-Net network, turning the original U-Net into a multi-scale feature fusion U-Net, and to apply it to pavement crack segmentation. The improved U-Net network retains the original convolution and pooling parts; the difference is that prediction fusion is deliberately applied to the features of the intermediate layers of the U-Net network. Because the feature maps output at each scale have different sizes after the encoder down-samples the target image, the output of each intermediate layer is first restored to the size of the original target image by bilinear interpolation before feature fusion prediction. Finally, the feature maps of the different scales are channel-concatenated into a tensor of larger dimension, which is then processed by a convolution operation to obtain the final multi-scale feature prediction fusion output. This preserves the information at different scales and fuses it efficiently to improve the performance and generalization of the model. The approach enlarges the receptive field of the network, obtains contextual information at different scales, and enhances the network’s ability to understand and judge the target features. The output of this multi-scale feature prediction fusion can be expressed mathematically as:
$$Z(x) = \mathrm{Conv}(Z_0, Z_1, Z_2, Z_3, Z_4) \qquad (1)$$
where $Z(x)$ denotes the output after feature prediction fusion, and $Z_0, Z_1, Z_2, Z_3, Z_4$ denote the up-sampled output feature maps at the different scales.
The structure of the U-Net network combined with multi-scale feature prediction fusion is shown in Figure 3. The pre-processed pavement crack images enter at the input of the network model, and feature maps at different scales are obtained through the multi-layer convolution and down-sampling of the encoder network; these maps have different resolutions and semantic content. Usually, the shallower feature maps contain more local information, while the deeper feature maps contain more global information. These feature maps are partly sent to higher levels and partly sent directly, through skip connections, to the corresponding decoder network as input. After the up-sampling operation, they are restored to the original size, and each produces a prediction branch; finally, the feature images of the prediction branches at the different scales are fused into the final output, achieving a combination of global information across scales.
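A minimal sketch of this fusion step in tf.keras is given below. The fused channel width (64) and the helper name are illustrative assumptions, and the Conv in Equation (1) is realized here as a single 3 × 3 convolution.

```python
import tensorflow as tf
from tensorflow.keras import layers

def fuse_multiscale_predictions(decoder_maps, target_size=(256, 256)):
    """Sketch of Equation (1): bilinearly restore each intermediate decoder
    feature map Z_i to the input resolution, channel-concatenate them, and
    fuse with a convolution. `decoder_maps` is a list of tensors at
    different scales (e.g., Z_0 ... Z_4)."""
    upsampled = [
        layers.Resizing(target_size[0], target_size[1],
                        interpolation="bilinear")(z)
        for z in decoder_maps
    ]
    fused = layers.Concatenate(axis=-1)(upsampled)  # channel stitching
    return layers.Conv2D(64, 3, padding="same", activation="relu")(fused)
```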

3.2. Improving the U-NET Model Bottleneck Layer

The encoder and decoder parts of the U-Net network are connected by a bottleneck layer. The purpose of the bottleneck layer is to extract deeper image features and provide a smooth transition between the down-sampling and up-sampling parts. In the original (traditional) U-Net network, the bottleneck layer is composed of two convolutional layers. The bottleneck layer [37] contains the deep feature information obtained in the encoder, and since these deep features are highly representative of the input image, passing them to the decoder through the bottleneck structure directly shapes the deep abstract features in the feature map. The bottleneck layer structure therefore plays an important role in the segmentation quality of the final prediction. The U-Net network bottleneck layer structure is shown in Figure 4.
This paper improved the bottleneck layer of the original U-Net network to further improve the segmentation of pavement crack images. A normalized bottleneck layer design was proposed and used to replace the original bottleneck layer of the traditional U-Net network. The new design adds a batch normalization (BN) layer [38] and a dropout layer [39] after the original bottleneck convolutions, in order to batch-normalize and regularize the convolutional output. The structure diagram is shown in Figure 5.
The new bottleneck layer contains two convolutional layers, both with 1024 convolutional kernels, which expand the dimensionality of the input feature map. As can be seen from the figure, a batch normalization layer was added after each of the two convolutional layers to normalize the activation values and prevent the network model from overfitting. A regularization (dropout) layer was added after the second batch normalization layer; that is, the activation of a neuron is dropped with a certain probability, P, during training, improving the generalization ability and stability of the network model.
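A sketch of this normalized bottleneck in tf.keras follows; the kernel size (3 × 3) and the drop rate are assumptions, since only the kernel count (1024) and the BN/dropout placement are specified above.

```python
import tensorflow as tf
from tensorflow.keras import layers

def improved_bottleneck(x, drop_rate=0.5):
    """Normalized bottleneck: two 1024-kernel convolutions, each followed by
    batch normalization, with dropout (probability P = drop_rate) after the
    second BN layer."""
    for _ in range(2):
        x = layers.Conv2D(1024, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    x = layers.Dropout(drop_rate)(x)  # activations dropped with probability P
    return x
```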

3.3. New Parallel Attention Module

Although the U-Net network used in this paper is based on the encoder–decoder structure, which has great advantages in image semantic segmentation, this structure focuses too much on local information when extracting features from the target image and thus cannot take the global information of the image well into account, resulting in insufficient accuracy of the final segmentation results.
This paper proposed an attention mechanism module connected to the output of the multi-scale feature prediction fusion U-Net network; its role is to obtain richer global contextual information about the target image. The module was improved from the parallel attention module in the dual-attention network (DANet) [40], whose structure is shown in Figure 6. That module is composed of a spatial attention module and a channel attention module arranged in parallel: feature maps passing through it are processed by the spatial and channel attention mechanisms separately, and the two resulting sets of channel descriptions are then summed.
The parallel attention module consists of a channel attention module (CA) and a position attention module (PA) in parallel. The PA module emphasizes the positional dependency between any two different position features in the image, while the CA module focuses on the dependencies between different channels of the image. By summing the outputs of the two attention modules, similar features of subtle objects are selectively aggregated to highlight their feature representations while reducing the influence of salient objects on image segmentation, and similar features at any scale are adaptively integrated from a global perspective to improve the segmentation accuracy of the images.
To enable the network model to associate more global features and to relate contextual relationships fully, so that more global context can be encoded into local features, this paper modified the parallel attention module by keeping the original position attention module and replacing the CA module with the gated channel transformation module (GCT block). A new parallel attention module was thus constructed for processing the output of the multi-scale feature prediction fusion U-Net, further exploiting the global contextual information of images. The GCT module is also a channel attention module; it enables more efficient global context modeling of the fused feature maps and is more lightweight, with fewer parameters, than the common channel attention module. The structure of the proposed new parallel attention module is shown in Figure 7.
The CA module adjusts the channel features of the feature map by calculating an importance weight for each channel, improving the perception of key features. Specifically, each channel of the feature map is first reduced by a global pooling layer, then passed through two fully connected layers to obtain the channel weight coefficients, and finally the weight coefficients are applied to the original feature map to obtain the channel-attention-adjusted feature map. The structure of the gated channel transformation attention module is shown in Figure 8.
Here, $X \in \mathbb{R}^{C \times H \times W}$ is an activation feature in the convolutional network, where $H$ and $W$ are the spatial height and width and $C$ is the number of channels.
$$\hat{X} = F(X \mid \alpha, \gamma, \beta), \quad \alpha, \gamma, \beta \in \mathbb{R}^{C} \qquad (2)$$
The module introduces three weights: $\alpha$, $\gamma$, and $\beta$. The weight $\alpha$ is responsible for the adaptive embedding output, while the gating weight $\gamma$ and the bias $\beta$ control the activation of the gate.
Here, $X = [x_1, x_2, \ldots, x_C]$, with $x_c = \big[x_c^{i,j}\big]_{H \times W} \in \mathbb{R}^{H \times W}$ and $c \in \{1, 2, \ldots, C\}$, so that $x_c$ corresponds to each channel of $X$.
The channel attention module in the new parallel attention mechanism was built on the GCT module, whose three main components are global context embedding, channel normalization, and gated adaptation. The GCT module uses a normalization method to establish competitive or cooperative relationships between the channels; the normalization operation itself is parameter-free. To make the GCT module learnable, a global context embedding operator was added, which embeds the global context and controls the weight of each channel before normalization; a gated adaptation operator was also added, which adjusts the input features per channel according to the normalized output. The global context embedding operator can encode broader global contextual semantic information of the image into local features, thus improving feature representation.
The global context part of this module differs from the SE module [41] in that the gated channel transformation attention module does not use global average pooling (GAP), because GAP can fail in some cases. For example, when instance normalization is used, the mean value of each channel is fixed, so the resulting vector becomes constant. Therefore, this paper used the L2 norm for global context embedding (GCE). The embedding is defined as:
$$s_c = \alpha_c \left\| x_c \right\|_p = \alpha_c \Bigg[ \bigg( \sum_{i=1}^{H} \sum_{j=1}^{W} \big( x_c^{i,j} \big)^p \bigg) + \varepsilon \Bigg]^{\frac{1}{p}} \qquad (3)$$
where the parameter $\alpha = [\alpha_1, \ldots, \alpha_C]$; a channel does not participate in channel normalization when its $\alpha_c$ tends to zero, and $\varepsilon$ is a very small constant that avoids derivative problems at the zero point.
The normalization method establishes a competitive relationship between the channels, so that channels with larger responses become relatively larger while channels with smaller feedback are suppressed. The $\ell_2$ norm is used in this module to perform channel normalization:
$$\hat{s}_c = \frac{\sqrt{C}\, s_c}{\left\| s \right\|_2} = \frac{\sqrt{C}\, s_c}{\Big[ \big( \sum_{c=1}^{C} s_c^2 \big) + \varepsilon \Big]^{\frac{1}{2}}} \qquad (4)$$
The gated adaptation part controls the activation of channel features through a gating mechanism, so that the GCT module both competes and cooperates during training: the weight $\gamma$ and the bias $\beta$ determine whether a channel’s features are activated. When the feature weight $\gamma_c$ of a channel is activated positively, the GCT module promotes “competition” between this channel’s features and those of other channels; when $\gamma_c$ is activated negatively, the GCT promotes “cooperation” between them. Gated adaptation is defined as follows:
$$\hat{x}_c = x_c \left[ 1 + \tanh\left( \gamma_c \hat{s}_c + \beta_c \right) \right] \qquad (5)$$
where $\gamma = [\gamma_1, \ldots, \gamma_C]$ and $\beta = [\beta_1, \ldots, \beta_C]$.
In addition, when the gating weights and gating bias are 0, the original features are allowed to be passed to the next layer.
$$\hat{X} = F(X \mid \alpha, 0, 0) = X \qquad (6)$$
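Putting Equations (3)–(6) together, a sketch of the GCT module as a custom tf.keras layer might look as follows; the channels-last layout and the initializer choices are assumptions, and initializing $\gamma$ and $\beta$ to zero reproduces the identity behavior of Equation (6) at the start of training.

```python
import tensorflow as tf

class GCT(tf.keras.layers.Layer):
    """Sketch of gated channel transformation: L2 global context embedding
    (Eq. 3), channel normalization (Eq. 4), and tanh gating (Eq. 5)."""

    def __init__(self, epsilon=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon

    def build(self, input_shape):
        c = input_shape[-1]
        self.alpha = self.add_weight("alpha", shape=(1, 1, 1, c), initializer="ones")
        self.gamma = self.add_weight("gamma", shape=(1, 1, 1, c), initializer="zeros")
        self.beta = self.add_weight("beta", shape=(1, 1, 1, c), initializer="zeros")

    def call(self, x):
        # Eq. (3): per-channel L2-norm embedding scaled by alpha (p = 2)
        s = self.alpha * tf.sqrt(
            tf.reduce_sum(tf.square(x), axis=[1, 2], keepdims=True) + self.epsilon)
        # Eq. (4): sqrt(C)-scaled channel normalization over the channel axis
        c = tf.cast(tf.shape(x)[-1], x.dtype)
        s_hat = tf.sqrt(c) * s / tf.sqrt(
            tf.reduce_sum(tf.square(s), axis=-1, keepdims=True) + self.epsilon)
        # Eq. (5): gating; gamma = beta = 0 gives the identity of Eq. (6)
        return x * (1.0 + tf.tanh(self.gamma * s_hat + self.beta))
```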
The PA module, in contrast, focuses on the correlations between different positions in the feature map so as to better capture the contextual information of the target object. The position features of the feature map are adjusted by calculating an importance weight for each spatial position, enhancing the target information in the feature map. Specifically, the feature map passes through two branches that calculate the horizontal and vertical position attention coefficients; the two coefficients are multiplied together to obtain the final position attention coefficients, which are then applied to the original feature map to obtain the spatially attention-adjusted feature map. The structure of the PA module is shown in Figure 9.
As Figure 9 shows, the input feature map A has shape C × H × W. It passes through three convolutional layers to obtain three new feature maps, B, C, and D. A reshape operation is then applied, giving shape C × N, where N = H × W; feature map B is additionally transposed to N × C. Multiplying these yields a relation matrix S of shape N × N, formed only from the N spatial points on H and W without considering the channel dimension; S is then converted to a probability distribution by a normalization operation (SoftMax). The matrix S thus captures the relationship between every pair of pixel positions. It is multiplied with the feature map D to obtain a C × N matrix, which is reshaped back to C × H × W (the original shape of the input feature map), scaled by the factor α, and summed with the input feature map to obtain the output. Here, the relation matrix S is generated by the formula:
$$s_{ji} = \frac{\exp\left( B_i \cdot C_j \right)}{\sum_{i=1}^{N} \exp\left( B_i \cdot C_j \right)} \qquad (7)$$
The output $E$ is generated by the formula:
$$E_j = \alpha \sum_{i=1}^{N} \left( s_{ji} D_i \right) + A_j \qquad (8)$$
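A sketch of this position attention computation as a tf.keras layer is given below. The channel reduction factor for the B and C branches is an assumption (DANet commonly uses C/8), and the learnable scale α is initialized to zero, matching the residual form of Equation (8).

```python
import tensorflow as tf

class PositionAttention(tf.keras.layers.Layer):
    """Sketch of Equations (7)-(8): S = softmax over all position pairs,
    output E = alpha * (S D) + A, with channels-last tensors."""

    def __init__(self, reduction=8, **kwargs):
        super().__init__(**kwargs)
        self.reduction = reduction

    def build(self, input_shape):
        c = input_shape[-1]
        self.conv_b = tf.keras.layers.Conv2D(c // self.reduction, 1)  # branch B
        self.conv_c = tf.keras.layers.Conv2D(c // self.reduction, 1)  # branch C
        self.conv_d = tf.keras.layers.Conv2D(c, 1)                    # branch D
        self.alpha = self.add_weight("alpha", shape=(), initializer="zeros")

    def call(self, x):
        shape = tf.shape(x)
        n = shape[1] * shape[2]                   # N = H * W spatial positions
        c = x.shape[-1]
        b = tf.reshape(self.conv_b(x), [shape[0], n, c // self.reduction])
        k = tf.reshape(self.conv_c(x), [shape[0], n, c // self.reduction])
        d = tf.reshape(self.conv_d(x), [shape[0], n, c])
        # Eq. (7): N x N relation matrix S, softmax-normalized
        s = tf.nn.softmax(tf.matmul(b, k, transpose_b=True), axis=-1)
        out = tf.reshape(tf.matmul(s, d), shape)  # aggregate D over positions
        return self.alpha * out + x               # Eq. (8): scaled residual sum
```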

3.4. Improved Loss Function

So that the improved model does not suffer from loss non-convergence during training, the loss function was also improved, as described in this section.
The cross-entropy loss function is often used as the loss function for segmentation networks and can be applied to both binary and multi-class classification tasks. The output of this paper contains only two classes, crack regions and background regions, so the task is binary classification. Assume the probability of predicting a pixel as a crack is $q$ and the probability of predicting it as background is $1 - q$. The cross-entropy loss $L$ is then defined as shown in Equation (9):
$$L = \frac{1}{N} \sum_i L_i = -\frac{1}{N} \sum_i \left[ z_i \log(q_i) + (1 - z_i) \log(1 - q_i) \right] \qquad (9)$$
In the formula, $N$ represents the total number of samples, $z_i$ represents the class of sample $i$, and $q_i$ is the probability that the sample is predicted to be a crack. The cross-entropy loss function gives the same weight to crack and background samples. However, the crack-occupied area in a pavement crack image is usually small, resulting in a serious imbalance between positive and negative samples. For this problem, this section used the weighted loss function Focal Loss as the loss function of the multi-scale U-Net network.
The weighted loss function is mainly an improvement on the cross-entropy loss function; it increases the model’s focus on hard-to-classify samples and positive samples.
The weighted loss function introduces a factor $\alpha \in [0, 1]$. Adjusting $\alpha$ controls the relative weight of positive and negative samples in the overall loss; a smaller $\alpha$ reduces the weight of negative samples. The improved cross-entropy loss $CE(q_t)$ is shown in Equation (10):
$$CE(q_t) = -\alpha_t \log(q_t) \qquad (10)$$
$$\frac{\alpha}{1 - \alpha} = \frac{n}{m} \qquad (11)$$
where $m$ and $n$ represent the numbers of positive and negative samples, respectively; the value of $\alpha$ is determined by their ratio.
The weighted loss function also incorporates a modulating factor, $(1 - q_t)^\gamma$, which adjusts the weights of easy and hard samples. When a sample is misclassified, it is considered hard to classify: the predicted probability of the correct class, $q_t$, tends to 0, and the modulating factor $(1 - q_t)^\gamma$ tends to 1. Conversely, when $q_t$ tends to 1, the sample is considered easy to classify and $(1 - q_t)^\gamma$ tends to 0.
$$FL(q_t) = -\alpha_t (1 - q_t)^\gamma \log(q_t) \qquad (12)$$
By doing so, the effect of the loss value of the easily classified samples on the overall loss value was attenuated.
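A sketch of Equation (12) as a tf.keras-compatible loss follows; the defaults α = 0.25 and γ = 2 are commonly used assumed values, whereas the paper sets α from the positive/negative pixel ratio of Equation (11).

```python
import tensorflow as tf

def focal_loss(alpha=0.25, gamma=2.0):
    """Binary focal loss per Equation (12): FL(q_t) = -alpha_t (1-q_t)^gamma log(q_t)."""
    def loss(y_true, y_pred):
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # q_t: predicted probability of the true class at each pixel
        q_t = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
        alpha_t = tf.where(tf.equal(y_true, 1.0), alpha, 1.0 - alpha)
        # (1 - q_t)^gamma down-weights easy pixels; alpha_t balances classes
        return -tf.reduce_mean(alpha_t * tf.pow(1.0 - q_t, gamma) * tf.math.log(q_t))
    return loss
```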

4. A Crack Segmentation Model for Road Pavements Based on Multi-Scale Structure and Attention Mechanism U-Net

The proposed multi-scale feature prediction fusion U-Net model was combined with the new parallel attention module to construct a multi-scale attention mechanism U-Net pavement crack identification model, whose structure is shown in Figure 10. The improved structure of the bottleneck layer is shown in Figure 11.
The pavement crack image is input to this network model. It first passes through the encoder, where features are further extracted at the bottleneck layer; the image then passes through the decoder, the up-sampled outputs are fused by multi-scale feature prediction, and the aggregated output is processed by the new parallel attention module to further refine the crack features. Finally, the output of the new parallel attention module passes through a sigmoid layer that classifies each pixel in the image as crack or background, achieving the recognition of pavement cracks. Table 1 shows the feature fusion process.

5. Experiments and Analysis of Results

5.1. Experimental Environment

The experiments were conducted with the following hardware and software: the model was built with the TensorFlow (version 2.12.0) and Keras (version 2.3.1) frameworks, the operating system was Windows 10, the graphics card was an NVIDIA RTX 3080 Ti, and the processor was an Intel Core i7-12700H. The Crack500 and CFD datasets were used for the experiments. The Crack500 dataset was augmented to 3868 images, with the image size unified to 256 × 256, and was divided into training and test sets in a 7:3 ratio. The CFD dataset was augmented to 1260 images, also unified to 256 × 256. The Adam optimizer [42], with its adaptive capability, was used for this experimental model. The initial learning rate was 0.001, the decay value was 0.00001, the momentum coefficient was 0.9, the batch size was 8, and the number of epochs was 100. Table 2 shows the experimental environment configuration, and Table 3 presents the datasets used.
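For concreteness, a sketch of this training configuration in tf.keras follows; `model` and `focal_loss` are assumed to be defined as in the earlier sketches, `train_ds` and `val_ds` are assumed tf.data pipelines already batched with batch size 8, and the stated decay of 0.00001 would in practice be applied through a learning-rate schedule.

```python
import tensorflow as tf

# Adam with the stated initial learning rate and momentum coefficient (beta_1)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9)

model.compile(optimizer=optimizer, loss=focal_loss(), metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=100)
```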

5.2. Evaluation Metrics

Since the task of this paper was mainly the binary classification of each pixel in the crack image, precision, recall, and F1-score were used as evaluation metrics. Because the paper also involves a segmentation task, the mean intersection over union (mIOU), commonly used in segmentation, was adopted as an additional evaluation index [43,44,45].
In the crack segmentation task, TP denotes the number of pixels in which the crack class was predicted as the crack class, FP is the number of pixels in which the background class was predicted as the crack class, and FN denotes the number of pixels in which the crack class was predicted as the background class. From these parameters, the precision rate, P, and the recall rate, R, can be calculated:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (13)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (14)$$
$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (15)$$
The mean intersection over union is used in crack segmentation to calculate the ratio of the intersection to the union of the predicted and labeled areas for each class, with the average then taken over classes. Assume there are $k + 1$ classes of segmentation targets ($k$ is 1 in the task of this paper); then $p_{ii}$ is the number of pixels belonging to class $i$ predicted as class $i$, $p_{ij}$ is the number of pixels belonging to class $i$ predicted as class $j$, and $p_{ji}$ is the number of pixels belonging to class $j$ predicted as class $i$. The calculation is shown in Equation (16):
$$mIOU = \frac{1}{k + 1} \sum_{i=0}^{k} \frac{p_{ii}}{\sum_{j=0}^{k} p_{ij} + \sum_{j=0}^{k} p_{ji} - p_{ii}} \qquad (16)$$
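As a minimal illustration, the sketch below computes Equations (13)–(16) from binary prediction and label masks with NumPy; the function name is ours, and no smoothing term is added for empty classes.

```python
import numpy as np

def crack_metrics(pred, label):
    """Pixel-level Precision, Recall, F1-score, and mIOU (Eqs. 13-16)
    for binary 0/1 masks of identical shape."""
    tp = np.sum((pred == 1) & (label == 1))  # crack predicted as crack
    fp = np.sum((pred == 1) & (label == 0))  # background predicted as crack
    fn = np.sum((pred == 0) & (label == 1))  # crack predicted as background
    tn = np.sum((pred == 0) & (label == 0))  # background predicted as background

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # mIOU averages the IoU of the crack class and the background class
    miou = (tp / (tp + fp + fn) + tn / (tn + fp + fn)) / 2
    return precision, recall, f1, miou
```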

5.3. Improved Attention Module Performance Validation

This section verifies that the improved parallel attention module proposed in this paper outperforms the original parallel attention module. The two modules were separately connected to the improved U-Net model for the experiments, and the resulting models are denoted the new model and the original model, respectively. The loss curves are shown in Figure 12 and Figure 13, and the quantitative results in Table 4.
From Table 4, it can be seen that the improved parallel attention module scored higher than the original parallel attention module on all four indicators. It can be concluded that the improved parallel attention module is more effective than the original one.

5.4. Ablation Experiments

In this paper, ablation experiments were performed on the Crack500 dataset to study the effect of multi-scale feature prediction fusion, the improved bottleneck layer, and the improved parallel attention module on pavement crack segmentation performance. The improved U-Net model was trained with different combinations of these components, with the remaining settings kept consistent. The original U-Net model is denoted RU-Net; the U-Net model with multi-scale feature prediction fusion is denoted RNU-Net; the U-Net model with the improved parallel attention module is denoted RPU-Net; the U-Net model with the improved bottleneck layer and multi-scale feature prediction fusion is denoted RGU-Net; and the U-Net model with multi-scale feature prediction fusion, the improved bottleneck layer, and the improved parallel attention module is denoted RNGU-Net. The results of the ablation experiments are shown in Table 5.
It can be seen from Table 5 that RNU-Net improved on RU-Net in all four indicators, indicating that multi-scale feature prediction fusion enhances segmentation performance. RGU-Net improved on both RU-Net and RNU-Net, showing that the improved bottleneck layer enhances the segmentation effect. RNGU-Net improved on the other three models in all four indicators, indicating that the complete improvement scheme proposed in this paper helps segmentation. The table also shows that the gain from the improved parallel attention module was greater than that from multi-scale feature prediction fusion.

5.5. Comparison of the Proposed Model with Existing Models

Comparative experiments were conducted on the Crack500 dataset to evaluate the superiority of the improved U-Net model for pavement crack segmentation. RAO-UNet [26], Crack-UNet [30], and DeepLabV3+ [27] were used as comparison methods. Precision, recall, F1-score, and mIOU were used as evaluation indicators to illustrate the advancement of the proposed method. The loss curves of the proposed model and the comparison models are shown in Figure 14.
In the loss graph, the horizontal axis represents the number of iterations and the vertical axis the network loss value. Comparing the loss curves shows that the loss of the method in this paper was the lowest among these networks.
The experimental accuracies of the model in this paper and the comparison models are shown in Figure 15.
The accuracy was recorded once every two iterations; the horizontal axis represents the number of iterations, and the vertical axis the accuracy of each model. The diagram shows that the accuracy of the model in this paper was the highest, reaching 96.01%, followed by the RAO-UNet model at 94.46%, the DeepLabV3+ model at 93.16%, and the Crack-UNet model at 90.91%.
The results of different models in highway pavement crack segmentation are shown in Figure 16, Figure 17 and Figure 18.
The segmentation results of the above methods are summarized in Table 6.
From Table 6, it can be seen that the method in this paper was significantly better than the other segmentation models in the experiment. The mean intersection over union of the proposed model reached 83.80%, and its precision reached 89.60%, which was 2.78% higher than the Crack-UNet model and 9.59% higher than the DeepLabV3+ model. The proposed model achieved an F1-score of 92.61% and performed especially well in recall (95.83%), exceeding the other methods: 12.87% higher than the Crack-UNet model, 4.29% higher than the DeepLabV3+ model, and 5.59% higher than the RAO-UNet model. To verify the performance of the networks more intuitively, the segmentation results of these four methods are shown in Figure 19.
From Figure 19, it can be seen that among these methods, the Crack-UNet network had the worst segmentation effect, with large areas of missed detection and poor handling of details, differing greatly from the ground-truth labels. Taking the second original image as an example, all three comparison methods showed different degrees of missed detection, with Crack-UNet missing the most; the missed detections are marked with red boxes in the figure. Although the DeepLabV3+ and RAO-UNet networks were better than Crack-UNet, their handling of details was still unsatisfactory, and missed segmentation tended to occur at crack junctions. The segmentation of the proposed method was better than the other three methods in details and at junctions and was closest to the ground-truth label map.
To verify the generalization and superiority of the proposed method, experiments were conducted again on the CFD dataset. The CFD dataset used in this experiment was not used for training, and it contains images taken under different conditions. The segmentation results are shown in Figure 20.
The experimental results are as follows:
It can be seen from Table 7 that the mean intersection over union, precision, F1-score, and recall of the proposed model were 82.07%, 93.29%, 89.64%, and 86.26%, respectively, better than the other models. Compared with the Crack500 results, however, recall was considerably lower, and the mean intersection over union and F1-score decreased by 1.73% and 2.97%, respectively; the segmentation performance of all four models was lower than on the Crack500 dataset. Crack-UNet could generally delineate the outline of the cracks, but its performance on crack details was very poor, losing much of the detail. The DeepLabV3+ model handled details better than Crack-UNet but was still unsatisfactory, differing substantially from the ground-truth labels, and produced mis-segmentations in which background was segmented as crack. The RAO-UNet model produced breaks when splitting the cracks and could not segment them completely. The model proposed in this paper was closest to the labeled images in segmentation effect, and its overall performance exceeded that of the other three models, which did not perform as well as expected on the CFD dataset. This also shows that the proposed model adapts well to different situations. We can therefore say that the method in this paper is more accurate and robust than the other three methods and adapts more widely to situations that did not arise during training and validation.

6. Conclusions

This paper proposed an improved U-Net segmentation model for highway pavement crack identification. Multi-scale feature prediction fusion and the improved parallel attention module were applied to the original U-Net network, and the bottleneck layer of the original U-Net was improved, achieving higher accuracy and better segmentation results. Publicly available datasets (Crack500 and CFD) were used to validate the methodology. In detecting road pavement cracks, the improved U-Net surpassed the original U-Net in all four metrics (with both models trained for 100 epochs). Compared experimentally with other competing methods from the literature, the method achieved the highest scores on the Crack500 dataset: precision (89.60%), recall (95.83%), mIOU (83.80%), and F1-score (92.61%). High values for precision (93.29%), mIOU (82.07%), recall (86.26%), and F1-score (89.64%) were also obtained on the CFD dataset. The results show that this method performed best among the four methods. The improved network thus shows clear advantages in identifying cracks in highway pavements and is a practical tool for engineering use.

Author Contributions

Methodology, Q.W., Z.S. and H.C.; writing—original draft, Q.W. and Z.S.; writing—review and editing, Q.W., H.C., Y.L. and L.Z.; funding acquisition, Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

Qinge Wu received funding for this work through the Key Science and Technology Program of Henan Province (number 222102210084) and the Key Science and Technology Project of Henan Province University (number 23A413007).

Data Availability Statement

The datasets used in the experiment are publicly available online at: https://aistudio.baidu.com/aistudio/datasetdetail/74646, accessed on 17 March 2023, and at: https://github.com/cuilimeng/CrackForest-dataset, accessed on 20 March 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hou, Y.; Liu, S.; Cao, D.; Peng, B.; Liu, Z.; Sun, W.; Chen, N. A Deep Learning Method for Pavement Crack Identification Based on Limited Field Images. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22156–22165. [Google Scholar] [CrossRef]
  2. Parrany, A.M.; Mirzaei, M. A new image processing strategy for surface crack identification in building structures under non-uniform illumination. IET Image Process. 2021, 16, 407–415. [Google Scholar] [CrossRef]
  3. Wang, D.; Liu, Z.; Gu, X.; Wu, W.; Chen, Y.; Wang, L. Automatic detection of pothole distress in asphalt pavement using improved convolutional neural networks. Remote Sens. 2022, 14, 3892. [Google Scholar] [CrossRef]
  4. Xiao, Y.; Li, J. Crack detection algorithm based on the fusion of percolation theory and adaptive canny operator. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 4295–4299. [Google Scholar]
  5. Zakeri, H.; Nejad, F.M.; Fahimifar, A. Image Based Techniques for Crack Detection, Classification and Quantification in Asphalt Pavement: A Review. Arch. Comput. Methods Eng. 2016, 24, 935–977. [Google Scholar] [CrossRef]
  6. Arya, D.; Ghosh, S.K.; Toshniwal, D. Automatic Recognition of Road Cracks Using Sobel Components in Digital Images. In Proceedings of the Sixth International Conference of Transportation Research Group of India: CTRG 2021, Singapore, 14–17 December 2021; Volume 1, pp. 139–149. [Google Scholar]
  7. Peng, C.; Yang, M.; Zheng, Q.; Zhang, J.; Wang, D.; Yan, R.; Wang, J.; Li, B. A triple-thresholds pavement crack detection method leveraging random structured forest. Constr. Build. Mater. 2020, 263, 120080. [Google Scholar] [CrossRef]
  8. Kheradmandi, N.; Mehranfar, V. A critical review and comparative study on image segmentation-based techniques for pavement crack detection. Constr. Build. Mater. 2022, 321, 126162. [Google Scholar] [CrossRef]
  9. Nnolim, U.A. Automated crack segmentation via saturation channel thresholding, area classification and fusion of modified level set segmentation with Canny edge detection. Heliyon 2020, 6, e05748. [Google Scholar] [CrossRef]
  10. Kumar, R.; Singh, S.K. Crack detection near the ends of a beam using wavelet transform and high resolution beam deflection measurement. Eur. J. Mech. -A/Solids 2021, 88, 104259. [Google Scholar] [CrossRef]
  11. Lakshmi, K. Detection and quantification of damage in bridges using a hybrid algorithm with spatial filters under environmental and operational variability. Structures 2021, 32, 617–631. [Google Scholar] [CrossRef]
  12. Akbari, J.; Ahmadifarid, M.; Amiri, A.K. Multiple Crack Detection using Wavelet Transforms and Energy Signal Techniques. Frat. Ed Integrita Strutt. 2020, 14, 269–280. [Google Scholar] [CrossRef]
  13. Su, C.; Wang, W. Concrete Cracks Detection Using Convolutional Neural Network Based on Transfer Learning. Math. Probl. Eng. 2020, 2020, 1–10. [Google Scholar] [CrossRef]
  14. Yang, Q.; Ji, X. Automatic Pixel-Level Crack Detection for Civil Infrastructure Using Unet++ and Deep Transfer Learning. IEEE Sens. J. 2021, 21, 19165–19175. [Google Scholar] [CrossRef]
  15. Nie, M.; Wang, K. Pavement distress detection based on transfer learning. In Proceedings of the 2018 5th International conference on systems and informatics (ICSAI), Nanjing, China, 10–12 November 2018; pp. 435–439. [Google Scholar]
  16. Motwani, A.; Shukla, P.K.; Pawar, M. Novel Machine Learning Model with Wrapper-Based Dimensionality Reduction for Predicting Chronic Kidney Disease Risk. Soft Comput. Signal Process. 2020, 1, 29. [Google Scholar]
  17. Shukla, P.K.; Shukla, P.K.; Bhatele, M.; Chaturvedi, A.K.; Sharma, P.; Rizvi, M.A.; Pathak, Y. A novel machine learning model to predict the staying time of international migrants. Int. J. Artif. Intell. Tools 2021, 30, 2150002. [Google Scholar] [CrossRef]
  18. Santosh, K.C.; Pradeep, N.; Goel, V.; Ranjan, R.; Pandey, E.; Shukla, P.K.; Nuagah, S.J. Machine Learning Techniques for Human Age and Gender Identification Based on Teeth X-Ray Images. J. Healthc. Eng. 2022, 2022, 1–14. [Google Scholar] [CrossRef] [PubMed]
  19. Tian, L.; Wang, Z.; Liu, W.; Cheng, Y.; Alsaadi, F.E.; Liu, X. A New GAN-Based Approach to Data Augmentation and Image Segmentation for Crack Detection in Thermal Imaging Tests. Cogn. Comput. 2021, 13, 1263–1273. [Google Scholar] [CrossRef]
  20. Zhang, K.; Zhang, Y.; Cheng, H.D. CrackGAN: Pavement crack detection using partially accurate ground truths based on generative adversarial learning. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1306–1319. [Google Scholar] [CrossRef]
  21. Ahmed, T.U.; Hossain, M.S.; Alam, M.J.; Andersson, K. An integrated CNN-RNN framework to assess road crack. In Proceedings of the 2019 22nd International Conference on Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 18–20 December 2019; pp. 1–6. [Google Scholar]
  22. Jiang, H.; Hu, Q.; Zhi, Z.; Gao, J.; Gao, Z.; Wang, R.; He, S.; Li, H. Convolution neural network model with improved pooling strategy and feature selection for weld defect recognition. Weld. World 2020, 65, 731–744. [Google Scholar] [CrossRef]
  23. Liu, Z.; Yeoh, J.K.; Gu, X.; Dong, Q.; Chen, Y.; Wu, W.; Wang, L.; Wang, D. Automatic pixel-level detection of vertical cracks in asphalt pavement based on GPR investigation and improved mask R-CNN. Autom. Constr. 2023, 146, 104689. [Google Scholar] [CrossRef]
  24. Zhang, K.; Cheng, H.D.; Zhang, B. Unified Approach to Pavement Crack and Sealed Crack Detection Using Preclassification Based on Transfer Learning. J. Comput. Civ. Eng. 2018, 32, 04018001. [Google Scholar] [CrossRef]
  25. Chen, J.; He, Y. A novel U-shaped encoder–decoder network with attention mechanism for detection and evaluation of road cracks at pixel level. Comput. Civ. Infrastruct. Eng. 2022, 37, 1721–1736. [Google Scholar] [CrossRef]
  26. Fan, L.; Zhao, H.; Li, Y.; Li, S.; Zhou, R.; Chu, W. RAO-UNet: A residual attention and octave UNet for road crack detection via balance loss. IET Intell. Transp. Syst. 2021, 16, 332–343. [Google Scholar] [CrossRef]
  27. Sun, X.; Xie, Y.; Jiang, L.; Cao, Y.; Liu, B. DMA-Net: DeepLab with Multi-Scale Attention for Pavement Crack Segmentation. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18392–18403. [Google Scholar] [CrossRef]
  28. Fan, Z.; Lin, H.; Li, C.; Su, J.; Bruno, S.; Loprencipe, G. Use of Parallel ResNet for High-Performance Pavement Crack Detection and Measurement. Sustainability 2022, 14, 1825. [Google Scholar] [CrossRef]
  29. Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1525–1535. [Google Scholar] [CrossRef]
  30. Huyan, J.; Li, W.; Tighe, S.; Xu, Z.; Zhai, J. CrackU-net: A novel deep convolutional neural network for pixelwise pavement crack detection. Struct. Control. Health Monit. 2020, 27, e2551. [Google Scholar] [CrossRef]
  31. Ren, Y.; Huang, J.; Hong, Z.; Lu, W.; Yin, J.; Zou, L.; Shen, X. Image-based concrete crack detection in tunnels using deep fully convolutional networks. Constr. Build. Mater. 2020, 234, 117367. [Google Scholar] [CrossRef]
  32. Ma, D.; Fang, H.; Wang, N.; Xue, B.; Dong, J.; Wang, F. A real-time crack detection algorithm for pavement based on CNN with multiple feature layers. Road Mater. Pavement Des. 2021, 23, 2115–2131. [Google Scholar] [CrossRef]
  33. Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Autom. Constr. 2019, 104, 129–139. [Google Scholar] [CrossRef]
  34. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
  35. Zhang, Y.; Li, W.; Zhang, L.; Ning, X.; Sun, L.; Lu, Y. AGCNN: Adaptive Gabor Convolutional Neural Networks with Receptive Fields for Vein Biometric Recognition. Concurr. Comput. Pract. Exp. 2020, 34, e5697. [Google Scholar] [CrossRef]
  36. Chen, Y.; Ma, T.; Yang, X.; Wang, J.; Song, B.; Zeng, X. MUFFIN: Multi-scale feature fusion for drug–drug interaction prediction. Bioinformatics 2021, 37, 2651–2658. [Google Scholar] [CrossRef]
  37. Olimov, B.; Sanjar, K.; Din, S.; Ahmad, A.; Paul, A.; Kim, J. FU-Net: Fast biomedical image segmentation model based on bottleneck convolution layers. Multimedia Syst. 2021, 27, 637–650. [Google Scholar] [CrossRef]
  38. Benz, P.; Zhang, C.; Karjauv, A.; Kweon, I.S. Revisiting batch normalization for improving corruption robustness. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 494–503. [Google Scholar]
  39. Fan, X.; Zhang, S.; Tanwisuth, K.; Qian, X.; Zhou, M. Contextual dropout: An efficient sample-dependent dropout module. arXiv 2021, arXiv:2103.04181. [Google Scholar]
  40. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
  41. Chao, X.; Hu, X.; Feng, J.; Zhang, Z.; Wang, M.; He, D. Construction of Apple Leaf Diseases Identification Networks Based on Xception Fused by SE Module. Appl. Sci. 2021, 11, 4614. [Google Scholar] [CrossRef]
  42. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  43. Chen, T.; Cai, Z.; Zhao, X.; Chen, C.; Liang, X.; Zou, T.; Wang, P. Pavement crack detection and recognition using the architecture of segNet. J. Ind. Inf. Integr. 2020, 18, 100144. [Google Scholar] [CrossRef]
  44. Li, D.; Duan, Z.; Hu, X.; Zhang, D. Pixel-Level Recognition of Pavement Distresses Based on U-Net. Adv. Mater. Sci. Eng. 2021, 2021, 1–11. [Google Scholar] [CrossRef]
  45. Wen, Z.; Wang, H.; Yuan, H.; Liu, M.; Guo, X. A method of pulmonary embolism segmentation from CTPA images based on U-net. In Proceedings of the 2019 IEEE 2nd International Conference on Computer and Communication Engineering Technology (CCET), Beijing, China, 16–18 August 2019; pp. 31–35. [Google Scholar]
Figure 1. The improvements made in this paper.
Figure 2. Principle of the U-Net model.
Figure 3. Multi-scale U-Net network structure diagram.
Figure 4. U-Net network bottleneck layer structure.
Figure 5. Normalized U-Net network bottleneck layer design.
Figure 6. Parallel attention module structure.
Figure 7. New parallel attention module.
Figure 8. Gated channel transformation attention module.
Figure 9. Position attention structure diagram.
Figure 10. Structure of the U-Net model of the multi-scale attention mechanism.
Figure 11. New bottleneck layer network structure diagram.
Figure 12. The new model loss curve graph.
Figure 13. The original model loss curve graph.
Figure 14. Loss curves for the four models.
Figure 15. Accuracy comparison of the four models.
Figure 16. mIOU index diagram of the four models.
Figure 17. Precision indicator diagrams for the four models.
Figure 18. Recall index diagram of the four models.
Figure 19. The segmentation effect diagram of the proposed model and the comparison models.
Figure 20. CFD dataset segmentation effect diagram.
Table 1. Feature fusion process.

| Down-Sampling Section | Up-Sampling Section | Feature Fusion |
|---|---|---|
| 256 × 256 × 64 | 256 × 256 × 64 | 256 × 256 × 128 |
| 128 × 128 × 128 | 128 × 128 × 128 | 128 × 128 × 256 |
| 64 × 64 × 256 | 64 × 64 × 256 | 64 × 64 × 512 |
| 32 × 32 × 512 | 32 × 32 × 512 | 32 × 32 × 1024 |
Table 2. Experimental environment.

| Hardware environment | NVIDIA RTX 3080 Ti, Intel Core i7-12700H |
|---|---|
| Software environment | Windows 10, TensorFlow 2.12.0 |
Table 3. Dataset introduction.

| Dataset | Number of Categories | Number of Images | Number of Labels |
|---|---|---|---|
| Crack500 | 3 | 3868 | 3868 |
| CFD | 3 | 1260 | 1260 |
Table 4. Experimental results of the comparison of the two models.

| Method | mIOU/% | Precision/% | F1-Score/% | Recall/% |
|---|---|---|---|---|
| The new model | 81.62 | 86.28 | 89.30 | 92.54 |
| The original model | 79.33 | 82.93 | 85.84 | 88.96 |
Table 5. Results of ablation experiments.

| Method | mIOU/% | Precision/% | F1-Score/% | Recall/% |
|---|---|---|---|---|
| RU-Net | 75.88 | 80.39 | 83.90 | 86.73 |
| RNU-Net | 76.56 | 82.21 | 84.28 | 88.38 |
| RPU-Net | 79.89 | 84.01 | 86.73 | 89.63 |
| RGU-Net | 80.13 | 85.33 | 88.76 | 90.11 |
| RNGU-Net | 82.90 | 88.10 | 90.25 | 94.61 |
Table 6. Comparison of pavement crack segmentation results.

| Method | mIOU/% | Precision/% | F1-Score/% | Recall/% |
|---|---|---|---|---|
| Crack-UNet | 76.04 | 86.82 | 84.85 | 82.96 |
| DeepLabV3+ | 77.34 | 80.01 | 85.39 | 91.54 |
| RAO-UNet | 82.56 | 89.15 | 89.69 | 90.24 |
| Ours | 83.80 | 89.60 | 92.61 | 95.83 |
Table 7. CFD dataset experimental results.

| Method | mIOU/% | Precision/% | F1-Score/% | Recall/% |
|---|---|---|---|---|
| Crack-UNet | 72.51 | 89.70 | 83.13 | 77.45 |
| DeepLabV3+ | 76.83 | 91.94 | 86.29 | 81.30 |
| RAO-UNet | 80.23 | 90.89 | 88.34 | 85.92 |
| Ours | 82.07 | 93.29 | 89.64 | 86.26 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
