Article

Hybrid-Supervised-Learning-Based Automatic Image Segmentation for Water Leakage in Subway Tunnels

1
School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
2
School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11799; https://doi.org/10.3390/app122211799
Submission received: 3 November 2022 / Revised: 15 November 2022 / Accepted: 18 November 2022 / Published: 20 November 2022
(This article belongs to the Special Issue Urban Underground Engineering: Excavation, Monitoring, and Control)

Abstract
Quickly and accurately identifying water leakage is an important component of the health monitoring of subway tunnels. A mobile vision-measurement system consisting of several high-resolution, industrial, charge-coupled-device (CCD) cameras is placed on trains to implement structural health monitoring in tunnels. Through the image-processing technology proposed in this paper, water leakage areas in subway tunnels can be found and repaired in real time. A lightweight automatic segmentation approach to water leakage based on hybrid-supervised deep learning is proposed. This approach consists of the weakly supervised Water Leakage-CAM and the fully supervised WRDeepLabV3+. Water Leakage-CAM is used for the automatic labeling of data; WRDeepLabV3+ is used for the accurate identification of water leakage areas in subway tunnels. Compared with other end-to-end semantic segmentation networks, the hybrid-supervised approach can more completely segment the water leakage region when dealing with water leakage in complex environments. The proposed approach achieves the highest MIoU of 82.8% on the experimental dataset, 6.4% higher than that of the second-best approach. Its efficiency is also 25% higher than that of the second-best approach, and it significantly outperforms other end-to-end deep learning approaches.

1. Introduction

With the construction and operation of urban subways, various problems related to subway tunnels continue to emerge [1,2], creating serious social and economic problems. The main subway tunnel diseases are shown in Table 1, among which water leakage is one of the most common. In the New York subway in the United States, water leakage caused signal failures that led to train delays [3]. Among the 12 subway lines in Beijing, China, there are seven common tunnel diseases, including segment water leakage, cracks, misaligned lining, empty track bed, concrete deterioration, and section ovalization. Water leakage is one of the main diseases in Beijing subway tunnels [4].
In order to ensure the safe operation of subways, a mobile vision-measurement system is often placed on trains [5]. As shown in Figure 1, 10 high-resolution industrial charge-coupled-device (CCD) cameras are placed on the top, left and right sides, and bottom of the train. With 10 LED lighting apparatuses, the subway tunnel is photographed in a ring, and image data of the entire tunnel are obtained [6,7].
The real-time mileage information of the train is obtained through the odometer, inertial navigation system, communication system, etc., and then the image with position information is obtained [8,9]. Additionally, real-time detection is carried out through a series of image-processing techniques [10,11]. When the water leakage is found, the relevant staff can repair it in time through the location information of the image.
In recent years, the development of deep learning has revolutionized visual measurement. For water leakage detection, semantic segmentation is a better choice. Object detection can be divided into non-automatic and automatic detection. In non-automatic detection, fully supervised learning can segment the water leakage area more accurately [12,13]. Xue and Li [14] adopted a fully convolutional network (FCN), and Dai et al. [15] a region-based FCN (R-FCN), to label, classify, and detect cracks and seepage. Huang et al. [16] adopted an FCN for semantic segmentation of water leakage. However, due to its spatial-invariance characteristics, small water leakage areas may be ignored [17]. Han et al. [18] proposed a multi-spectral water-leakage detection method that integrates visible optics and thermal infrared and detects water leakage through single-modal feature extraction and multi-modal feature fusion with an FPN, but this method was only tested in laboratory simulations and is not suitable for complex environments. Xiong et al. [19] designed a deep learning system based on an image-recognition model to detect water leakage; however, a VIS color camera is greatly affected by the lighting conditions in the tunnel, which reduces detection accuracy. Fully supervised learning requires a large number of high-precision dataset labels, so label production is time-consuming and inefficient. Weakly supervised learning can achieve semantic segmentation of objects with few data labels [20]. Zhu et al. [21] used weakly supervised networks to detect and segment different types of cracks and compared the segmentation effects of different networks but did not evaluate accuracy, precision, or other indicators. Chang et al. [22] proposed a defect segmentation system based on weakly supervised learning, which improves the accuracy of classification and segmentation, but its efficiency is not high, and the problem of time-consuming data labeling remains unsolved. Zhao et al. [23] used Mask R-CNN for instance segmentation of water leakage in tunnel linings; the time-consumption problem was solved without overfitting, but segmentation accuracy was low. Wang et al. [24] proposed a new network framework as the backbone of a weakly supervised network and added the K-means clustering algorithm to improve segmentation accuracy. However, due to the limitations of K-means, the image segmentation effect is not ideal in complex environments.
Automatic detection has become the mainstream trend of current research, as it can ensure detection accuracy while maintaining efficiency. Zhang et al. [25] proposed a new method for automated pixel-level crack detection on 3D asphalt pavement, called CrackNet-R, which achieves high precision in crack detection. However, it can easily create technical isolation for users in practical applications. Li et al. [26], Dung et al. [27], and Yang et al. [28] applied a fully convolutional network (FCN) to perform automatic pixel-level crack detection. This method works well for large-pixel cracks but has limitations for small cracks, where the effect is not ideal. Bang et al. [29] proposed an encoder–decoder network-based method for the automatic pixel-level detection of road cracks. Transfer learning is performed using ResNet152, which achieves the best performance among several convolutional neural networks, but the experimental results do not reach the expected accuracy. Liu et al. [30] proposed the use of U-Net for automatic crack detection. U-Net is more robust than Cha's CNN method and more suitable for crack detection; however, complex environments and interference cause redundant identifications. Song et al. [13] proposed an improved automatic crack detection framework based on DeepLabV3+ and obtained a speed of 23 frames per second, but its robustness is weak and it only applies to small data samples. Therefore, automatic object detection at the pixel level is still a challenge.
At present, automatic detection still suffers from problems such as complex algorithms, long time consumption, and low detection accuracy. This paper proposes a lightweight hybrid supervised learning (HSL) approach that combines efficient weakly supervised learning with high-precision fully supervised learning. Weakly supervised learning is used for the automatic generation of pixel-level water leakage labels, while fully supervised learning is used for water leakage semantic segmentation. In weakly supervised learning, the Water Leakage-CAM (WL-CAM) framework is proposed, which obtains high-precision pixel-level labels through a multi-level fusion strategy. First, an adaptive pixel segmentation clustering (APS) algorithm is used to generate image-level labels with little background information. Second, the WRes2Net deep learning network is used for training, and the class activation maps (CAMs) generated during training are used to produce label files, which are fused with the image-level labels by pixel-value overlap, thereby reducing data-labeling time and labor costs. In fully supervised learning, this paper improves the DeepLabV3+ network by adding multiple learning-rate decay methods, optimizers, and attention mechanisms, so that water leakage can be accurately segmented in various complex environments and the segmentation accuracy is greatly improved.

2. Approach

The HSL combines the advantages of weakly supervised learning and fully supervised learning. The accuracy of data labels is guaranteed without manual labeling, while the accuracy of fully supervised learning semantic segmentation is guaranteed.

2.1. Basic Idea

Figure 2 shows the basic idea of HSL. As a new approach for identifying water leakage in subway tunnels, it can automatically identify water leakage areas without labels.
We take an unlabeled original dataset as an example to illustrate the implementation process of the HSL approach. The original dataset includes RGB images ($I_{RGB}$) and grayscale images ($I_{Gray}$). APS generates image-level labels ($I_{APS}$) in two steps. Firstly, the number K of cluster types is calculated from the gray-gradient distribution and gray peak values of $I_{Gray}$. Secondly, batch clustering is performed on the $I_{RGB}$ dataset to obtain $I_{APS}$. The WRes2Net deep learning network is used to extract the features of $I_{APS}$, generate a CAM feature map, and refine it into a CAM label ($I_{SMD}$). $I_{APS}$ and $I_{SMD}$ are fused and superimposed; based on the $I_{APS}$ category, the region with the largest category weight is retained, and the pixel-level label $I$ is obtained. Through WRDeepLabV3+, water leakage features are extracted from $I$, and the water leakage area is segmented.

2.2. Framework

As shown in Figure 3, the HSL approach adopts a two-stage structure: the weakly supervised WL-CAM and the fully supervised WRDeepLabV3+. WL-CAM has two steps. In Step 1, the APS algorithm is proposed according to the characteristics of water leakage, and segmentation is performed to obtain image-level labels. In Step 2, the WRes2Net semantic segmentation network is trained to obtain a CAM [31]. The generated CAM yields a label file (Mask Data) through the random-walk algorithm (RW), which is combined with the image-level labels to further improve the completeness and accuracy of the dataset labels, producing pixel-level labels. Feature extraction and semantic segmentation of water leakage are then carried out on the pixel-level labels using WRDeepLabV3+. In WRDeepLabV3+, the backbone network is WRes2Net, ASPP is a spatial pyramid pooling module with atrous convolution, and transfer learning is performed using the pixel-level labels and the original dataset. The channel attention mechanism (CA) is used to extract the high-level semantic information of water leakage: average pooling is applied to the high-level semantic information to obtain a feature vector along the channel direction, two nonlinear fully connected (FC) layers capture the correlations between channels while limiting model complexity, and finally a Sigmoid normalizes the channel feature vector. The spatial attention mechanism (SA) focuses on the water leakage target area to extract low-level semantic information; semantic segmentation accuracy is improved through a global convolutional network, and finally a Sigmoid normalizes the feature vector. The high-level and low-level semantic information are then combined to generate the water leakage area.

2.3. Automatic Generation of Water Leakage Labels for Subway Tunnels

The automatic generation of water leakage labels for subway tunnels adopts the WL-CAM based on weakly supervised learning.
Image-level labels are generated through ASP, which is divided into the following steps:
  • Calculate the gray values of the image to find K. Suppose an image $I_{gray}$ consists of N water-basin characteristic areas, $I_{gray} = \{\gamma_n \mid \gamma_n \in \mathbb{R}^d,\ n = 1, 2, \ldots, N\}$, where $\gamma_n$ consists of M data points that characterize it, $\gamma_n = \{\eta_m \mid \eta_m \in \gamma_n,\ m = 1, 2, \ldots, M\}$, and $\eta_m$ is the gray value of the m-th pixel in the feature area. Calculate the grayscale histogram. In the histogram, $U_\nu^{max} = \{U_\nu^{max} \in I_{gray},\ \nu = 1, 2, \ldots, V\}$, where $U_\nu^{max}$ is a peak gray value and $\nu$ is the peak index.
  • To further determine the number of clusters K, compute the mean $\bar{\gamma}_n$ of each $\gamma_n$. The number of clusters is
    $$K = n + v, \qquad \begin{cases} n: & \bar{\gamma}_n < a \\ v: & a < \bar{\gamma}_n < b \ \text{and}\ a < U_\nu^{max} < b \end{cases}$$
    where n counts the regions satisfying the first condition, v counts the peaks satisfying the second, and a and b are grayscale thresholds.
  • To process the RGB image, let $I_{rgb}$ consist of N data points, $I_{rgb} = \{S_n \mid S_n \in \mathbb{R}^d,\ n = 1, 2, \ldots, N\}$. A binary variable $P_{nk} \in \{0, 1\}$ indicates to which cluster $S_k$ (k = 1, 2, …, K) a point $S_n$ belongs: $P_{nk} = 1$ means the point belongs to class k; otherwise it is 0. The loss function is thus defined as:
    $$\Psi = \sum_{n=1}^{N} \sum_{k=1}^{K} P_{nk} \, \lVert S_n - S_k \rVert^2$$
    $$\lVert S_n - S_k \rVert = \sqrt{D_1^2 + D_2^2}$$
    $$D_1 = \sqrt{(R_n - R_k)^2 + (G_n - G_k)^2 + (B_n - B_k)^2}$$
    $$D_2 = \sqrt{(X_n - X_k)^2 + (Y_n - Y_k)^2}$$
    where $D_1$ is the color distance, $D_2$ is the Euclidean spatial distance, and $\lVert S_n - S_k \rVert$ is the clustering distance. APS converges the cluster centers through continuous iteration, and $S_n$ and $S_k$ can be optimized alternately during the iterative process. When K = 2, calculate the pixel areas and keep the largest one.
  • During clustering, the calculation of spatial distance affects the classification of pixels, so independent pixels may remain inside a cluster. The Canny edge-extraction algorithm obtains closed cluster edges to determine each closed cluster. The color distance between the independent pixels inside a closed cluster and the K cluster centers is then calculated, and each pixel is assigned to the cluster with the smallest color distance, as shown in Figure 4.
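The clustering loss above can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: `cluster_distance` and `assign_clusters` are hypothetical helper names, and pixels are represented as (R, G, B, X, Y) tuples so the combined color/spatial distance can be computed directly.

```python
import numpy as np

def cluster_distance(pixel, center):
    """Combined distance sqrt(D1^2 + D2^2): D1 is the RGB color
    distance, D2 the spatial distance. pixel/center: (R, G, B, X, Y)."""
    p, c = np.asarray(pixel, float), np.asarray(center, float)
    d1 = np.sqrt(np.sum((p[:3] - c[:3]) ** 2))   # color distance D1
    d2 = np.sqrt(np.sum((p[3:] - c[3:]) ** 2))   # spatial distance D2
    return np.sqrt(d1 ** 2 + d2 ** 2)

def assign_clusters(pixels, centers):
    """One assignment step of the loss: each pixel goes to the center
    with the smallest combined distance (its P_nk is set to 1)."""
    return [int(np.argmin([cluster_distance(p, c) for c in centers]))
            for p in pixels]
```

An alternating optimization would then recompute each center as the mean of its assigned pixels and repeat until the centers converge.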
After the image-level labels are obtained, deep learning training is performed on the dataset to generate Mask Data, and the segmentation performance is mainly affected by the classifier. Compared with Res2Net (as shown in Figure 5), WRes2Net adds atrous convolution, which can obtain more characteristic information about water leakage without changing the scale, and connects more residual blocks, making the water-leakage feature information richer. After each convolution, the feature maps pass through activation and normalization layers [32,33]. To give the network better accuracy and stability, the ReLU activation function is replaced by the Mish function [34]. Because the non-negativity of the activation function can make the weight-layer updates less ideal, the activation layer is placed before the normalization layer [33].
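Mish is simple to state, so a small sketch may help. This is a generic NumPy implementation of the published activation, Mish(x) = x · tanh(softplus(x)), not code from the paper:

```python
import numpy as np

def softplus(x):
    """Numerically stable softplus: log(1 + e^x)."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def mish(x):
    """Mish activation [34]: x * tanh(softplus(x)). Unlike ReLU, it is
    smooth and lets small negative values through, which helps gradients."""
    return x * np.tanh(softplus(x))
```

For large positive inputs Mish behaves like the identity; for strongly negative inputs it decays smoothly toward zero instead of clipping hard.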
In the CAM, the higher the feature value of a region the model attends to, the darker the color and the higher the corresponding weight:
$$I_{SMD} = \sum_{C} a_S^C F^C$$
where $I_{SMD}$ is the Mask Data image, S is the label type, C is the number of channels, $a_S^C$ is the weight of the C-th channel for type S, and $F^C$ is the feature map of the C-th channel. The weight $a_S^C$ is expanded as:
$$a_S^C = \frac{1}{L} \sum_{i} \sum_{j} \frac{\partial \zeta_S}{\partial F_{i,j}^C}$$
where L is the number of pixels (width times height) of the feature map, $\zeta_S$ is the probability of outputting the target type, and $F_{i,j}^C$ represents the pixel value at (i, j) in the C-th feature map.
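Assuming the channel weights are gradient-based averages as described above, the Mask Data map can be sketched as follows. This is an illustrative NumPy version with a hypothetical helper name (`cam_from_gradients`); in practice the feature maps and gradients would come from the trained WRes2Net:

```python
import numpy as np

def cam_from_gradients(feature_maps, gradients):
    """CAM sketch: per-channel weights a_S^C are the spatial averages of
    the gradients of the class score w.r.t. each feature map; the map
    I_SMD is the weighted channel sum, kept non-negative.
    feature_maps, gradients: arrays of shape (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))              # a_S^C, one per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # sum_C a_S^C * F^C
    return np.maximum(cam, 0.0)                        # keep positive response
```

Channels whose gradients are negative on average suppress the map; the final clipping keeps only regions that support the target class.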
Since the weakly supervised semantic segmentation network only detects the main features of each category in the image, it is difficult to obtain a complete object response map. The Mask Data are therefore combined with the image-level labels:
$$I = \alpha I_{APS} + \beta I_{SMD} + o$$
where $I$ is the pixel-level label, $I_{APS}$ is the image-level label, $\alpha$ and $\beta$ are the weights, and $o$ is the residual.
$I_{APS}$ contains n subclass labels, which are overlapped with $I_{SMD}$; the subclass label with the highest overlap weight is taken as the feature target. The images are corrected and supplemented by the residual network during training.
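Ignoring the residual term, the fusion step can be illustrated with binary masks. The helper below is hypothetical; the default weights 0.8 and 0.2 are the values reported later in Section 3.2:

```python
import numpy as np

def fuse_labels(i_aps, i_smd, alpha=0.8, beta=0.2):
    """Sketch of I = alpha*I_APS + beta*I_SMD (residual term omitted):
    weighted overlap of the two binary masks, then binarised so only
    regions supported by the image-level label survive."""
    fused = alpha * i_aps.astype(float) + beta * i_smd.astype(float)
    return (fused >= 0.5).astype(np.uint8)
```

With these weights, a pixel survives only if the APS label marks it; the CAM label alone contributes 0.2, which falls below the 0.5 cut-off.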

2.4. Semantic Segmentation of Water Leakage in Subway Tunnels

As can be seen from Figure 6, WRDeepLabV3+ consists of two modules. The encoding area (Encoder) is used to extract high-level semantic information; the decoding area (Decoder) is used to extract low-level semantic information. The WRes2Net network proposed in Section 2.3 is used as the backbone of WRDeepLabV3+ to extract water leakage features. The channel attention mechanism (CA) is added to the Encoder [35]. CA assigns larger weights to highly responsive channels after feature extraction using depth-wise separable convolutional layers over different channels. Suppose the high-level semantic information is $F_{ch} \in \mathbb{R}^{W \times H \times C}$, $F_{ch} = [F_1, F_2, \ldots, F_C]$, where W and H are the width and height of the input feature image and C is the number of channels. CA is represented as:
$$f_{CA}(g_c, \Phi_{CA}) = e_1\{ f_{c2}\{ r[ f_{c1}(g_c, \Phi_{CA1}) ], \Phi_{CA2} \} \}$$
In the formula, $g_c$ is the feature map after average pooling of $F_c$, $\Phi_{CA}$ contains the parameters of the channel attention module, $e_1$ is the Sigmoid activation, $f_{c1}$ and $f_{c2}$ are the fully connected layers, and r is the ReLU activation. The CA module outputs $f_{CA}$ and weights the feature map to obtain the output feature map:
$$F'_{ch} = F_{ch} \cdot f_{CA}$$
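The CA formula above has the familiar squeeze-and-excitation shape, which can be sketched without a deep learning framework. The NumPy version below is illustrative: the FC weights `w1` and `w2` are hypothetical stand-ins for the trained layers, and biases are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    """CA sketch: g_c = average pooling of F_ch, then
    f_CA = Sigmoid(FC2(ReLU(FC1(g_c)))) and F'_ch = F_ch * f_CA.
    f: (C, H, W); w1: (C_r, C) and w2: (C, C_r) are the two FC layers."""
    g = f.mean(axis=(1, 2))            # global average pooling -> (C,)
    hidden = np.maximum(w1 @ g, 0.0)   # FC1 + ReLU bottleneck
    scale = sigmoid(w2 @ hidden)       # FC2 + Sigmoid, one weight per channel
    return f * scale[:, None, None]    # reweight each channel
```

The bottleneck dimension C_r < C is what limits the model complexity mentioned in the text.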
Compared with the encoding area, the decoding area can obtain the location, edge, and other information of the target. However, it also contains a lot of background information, which affects segmentation accuracy to a certain extent. The spatial attention mechanism (SA) is therefore introduced in the Decoder. It focuses on the target feature area, adaptively combines high-level features with low-level features, and uses high-level features to filter out background information [36]. To obtain global information without increasing the number of parameters, semantic segmentation is improved by a global convolutional network. A two-layer convolution operation is used, with kernels of 1 × 5 and 5 × 1, respectively, to obtain key feature information. SA is expressed as:
$$A_1 = Con_1[ Con_2(F_{ch}, \Phi_{SA1}), \Phi_{SA2} ]$$
$$A_2 = Con_2[ Con_1(F_{ch}, \Phi_{SA1}), \Phi_{SA2} ]$$
$$f_{SA}(F_{ch}, \Phi_{SA}) = e_2(A_1 + A_2)$$
In the formulas, $A_1$ is the feature map produced by convolution with kernels 1 × 5 and then 5 × 1, $A_2$ is the feature map produced by convolution with kernels 5 × 1 and then 1 × 5, $Con_1$ is the 5 × 1 × C convolution, $Con_2$ is the 1 × 5 × C convolution, $e_2$ is the Sigmoid activation, $F'_{ch}$ is obtained by SA weighting, and $\Phi_{SA}$ contains the parameters of the spatial attention module.
Training a complex deep learning model can take a long time, and the optimizer can improve the training efficiency of the model. At the same time, different optimizers can also improve the performance of the model and achieve better training results. In the training process, the learning-rate decay methods StepLR and CosineAnnealingLR (CosLR) are added, and the two optimizers SGD and Adam are also compared.
In addition, the training is divided into a freezing phase (Freeze) and an unfreezing phase (Unfreeze). At the same time, the Focal Loss function is used to solve the problem of positive and negative sample imbalance [37]. Its formula is shown in (15).
$$Loss = \begin{cases} -\kappa (1 - y)^{\delta} \log(y), & \text{ground truth} = 1 \\ -(1 - \kappa)\, y^{\delta} \log(1 - y), & \text{ground truth} = 0 \end{cases}$$
where y denotes the output after the activation function, κ denotes the balance-factor loss weight of the samples (the loss weights of all categories sum to 1), and δ ≥ 0 is the balance factor controlling the loss of hard and easy samples. When δ = 0, the focal loss degenerates into the ordinary cross-entropy loss weighted by κ. As δ increases, the model pays more attention to hard-to-distinguish samples.
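A NumPy sketch of focal loss may clarify the weighting: confident, well-classified samples contribute little, while hard samples dominate the gradient. The helper name and the defaults κ = 0.25 and δ = 2 are illustrative, not values from the paper:

```python
import numpy as np

def focal_loss(y_pred, y_true, kappa=0.25, delta=2.0):
    """Focal-loss sketch: cross-entropy whose terms are down-weighted
    by (1 - y)^delta for positives and y^delta for negatives, so the
    model focuses on hard, misclassified samples.
    y_pred: probabilities in (0, 1); y_true: 0/1 ground-truth labels."""
    y_pred = np.clip(y_pred, 1e-7, 1.0 - 1e-7)   # avoid log(0)
    pos = -kappa * (1.0 - y_pred) ** delta * np.log(y_pred)
    neg = -(1.0 - kappa) * y_pred ** delta * np.log(1.0 - y_pred)
    return np.where(y_true == 1, pos, neg).mean()
```

A confident correct positive (y_pred = 0.9) contributes far less loss than an uncertain one (y_pred = 0.5), which is the imbalance-handling behavior described above.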

3. Experiment

In this study, the experimental hardware consists of a desktop computer and the mobile vision-measurement system; the software comprises Python 3.7, PyTorch 1.10.2, and Labelme. The computer configuration is as follows. CPU: AMD Ryzen 7 5800X with Radeon Graphics, 4.6 GHz; GPU: NVIDIA GeForce RTX 3060, 6 GB; RAM: 16 GB. The mobile vision-measurement system is placed on the left and right sides and the top of the subway locomotive. The train photographs the subway tunnel in a ring to obtain water leakage data. The system includes CCD industrial cameras, LED lighting equipment, power supplies, an odometer, an inertial navigation system, a communication system, etc. The CCD industrial camera parameters are as follows: resolution, 3.5 megapixels; pixel size, 3.75 μm × 3.75 μm; target size, 1/3″; frame rate, 400~5250 fps.

3.1. Dataset

The experimental dataset consists of two parts, as shown in Table 2. The first part (a) is the water leakage dataset of Shanghai subway tunnels in China [38]. The second part (b) is the dataset of Beijing subway tunnels in China collected by our mobile vision-measurement system.
After data screening, the experimental dataset consists of 6000 images, including the subway tunnel environment, different types of water leakage, and other diseases. The size of each image is 512 × 512. The dataset is divided into the Train dataset and the Test dataset at a ratio of 4:1.
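The 4:1 split can be reproduced with a short, generic helper (hypothetical code, not the paper's tooling); a fixed seed keeps the split repeatable:

```python
import random

def split_dataset(items, ratio=4, seed=0):
    """Shuffle and split a list of image paths into Train and Test
    sets at ratio:1 (4:1 gives 4800/1200 for 6000 images)."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    cut = len(items) * ratio // (ratio + 1)
    return items[:cut], items[cut:]
```

Shuffling before the cut matters here because the dataset mixes two sources (Shanghai and Beijing) and several leakage types.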
In order to compare the segmentation effect of automatically synthesized labels and manual labels, the data are divided into two groups (as shown in Table 3). Group A consists of automatically synthesized labels (AL-Mask images). Group B consists of manual labels (ground truth). The ground truth is labeled with Labelme and saved in the JSON format.

3.2. Experimental Scheme

Firstly, we demonstrate the segmentation ability of APS in weakly supervised learning and the method's effectiveness in automatically generating water leakage labels. The original image dataset is segmented into image-level labels via APS. As subway tunnels mostly have a gray-and-white background, it forms a strong color contrast with the water leakage. The images are converted to grayscale; the water leakage area appears dark in the grayscale images, with low gray values. Through extensive analysis and statistics of the gray-value distribution, it is found that the gray value of the water leakage area is greater than 60 and less than 130, while that of the background area is greater than 130 [39,40]. Through global threshold segmentation, background information with a gray value greater than 130 is filtered out, and the results are compared with those of other image-segmentation algorithms. Accordingly, a is set to 60 and b to 130. Second, the image-level labels are used for multi-class training with WRes2Net. CAM label files are generated and fused with the image-level labels, redundant background information is removed, and the resulting AL-Mask images are compared with the ground truth. After many experiments, the fusion weights α and β are set to 0.8 and 0.2, respectively.
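The global-threshold step described above reduces to a one-line mask. The sketch below is illustrative (hypothetical helper; the bounds 60 and 130 are the a and b values quoted in the text):

```python
import numpy as np

def leakage_mask(gray, a=60, b=130):
    """Keep pixels whose gray value lies strictly between a and b:
    candidate leakage is darker than the background (> b) but not as
    dark as the lowest-value noise (< a)."""
    return ((gray > a) & (gray < b)).astype(np.uint8)
```
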
Secondly, the robustness of fully supervised learning WRDeepLabV3+ is demonstrated. Deep learning training on AL-Mask images is carried out using WRDeepLabV3+. Due to the complex environment of subway tunnels and different noise information from water leakage, the water leakage data are divided into five categories, and the semantic segmentation effect of WRDeepLabV3+ is displayed through different categories.
Finally, the advancement and accuracy of the HSL approach are demonstrated. The performance of the approach combining WL-CAM and WRDeepLabV3+ is compared with other advanced semantic segmentation networks, including SC-CAM [22], U-Net [41], PSPNet [42], HRNet [43], DeepLabV3+ [44], and WRDeepLabV3+. Because of the complex environment of underground tunnels, water leakage has the following three characteristics [45]. First, due to the lateral extrusion and gravity of the lining, as well as the influence of different lining gaps, water leakage occurs on the lining surface, forming strip-shaped leakage (in horizontal and vertical directions) and block-shaped leakage. Second, oil stains and artificial marks on the lining surface are similar to water leakage in color and shape, affecting segmentation accuracy. Third, the water leakage area may be covered by meter boxes, cables, pipes, etc., causing interference. The water leakage data are therefore divided into five categories, and semantic segmentation comparisons on the dataset are also performed between the HSL approach proposed in this paper and other end-to-end semantic segmentation networks, including EM [46], CRF-RNN [47], 1Stage [48], and AA&LR [49].
The experiments are comprehensively evaluated using Precision, Recall, IoU, MIoU, and F1 to judge the performance of the models [50].
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$IoU = \frac{TP}{TP + FP + FN}$$
$$MIoU = \frac{1}{2}\left( \frac{TP}{TP + FP + FN} + \frac{TN}{TN + FN + FP} \right)$$
$$F1 = \frac{2 \times TP}{2 \times TP + FP + FN}$$
where TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative.
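Given binary prediction and ground-truth masks, the five metrics follow directly from the pixel counts. A generic NumPy sketch (hypothetical helper, single foreground class):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute Precision, Recall, IoU, MIoU, and F1 from binary masks
    using pixel-level TP/FP/TN/FN counts, as in the formulas above."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "iou": tp / (tp + fp + fn),
        "miou": (tp / (tp + fp + fn) + tn / (tn + fn + fp)) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```
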

3.3. Experimental Results

The segmentation results of APS are compared with those of several other image-segmentation algorithms (as shown in Table 4). K-means and Otsu retain more background information than global threshold segmentation (threshold = 90). For the water leakage area, the first three segmentation algorithms all exhibit incomplete segmentation, and independent pixels appear. Compared with these three algorithms, APS can better segment the water leakage area: the background information is effectively removed while the water leakage area is retained.
A comparison of the labeling effect of ground truths and AL-Mask images is shown in Table 5. The first column shows the original dataset of water leakage. The second column shows the water leakage dataset (ground truth) manually marked by Labelme. The third column shows the image-level water leakage dataset obtained by APS. The fourth column shows the mask data, and the fifth column shows the final pixel-level water leakage dataset (AL-Mask images). Compared with ground truth, the edge of the AL-Mask images is not smooth, but it can effectively extract the water leakage area.
The subway tunnel’s surface environment is complex, and there are various types of noise, such as the occlusion of objects and the influence of artificial or oil stains, which cause certain difficulties in the segmentation of water leakage. As shown in Table 6, five typical noise images of water leakage of subway tunnels are listed, as well as their corresponding ground truth, WL-CAM, and WRDeepLabV3+ renderings. Under the complex background, WL-CAM has over-segmentation and incomplete segmentation, while WRDeepLabV3+ obtains a better segmentation effect.
As shown in Table 7, the HSL approach proposed in this paper is compared with current advanced semantic segmentation networks. Since WL-CAM adopts two-stage training, a larger learning rate and batch size are used in the freezing stage and smaller ones in the unfreezing stage, which greatly shortens the training time. Compared with SC-CAM, WL-CAM is more time-saving, and all its indicators are better. In fully supervised learning, PSPNet has the highest efficiency, but accuracy is sacrificed. WRDeepLabV3+, based on WRes2Net101, optimizes the learning rate and adds an optimizer so that the first five evaluation indicators lead the other fully supervised networks. It can be seen that the HSL approach proposed in this paper achieves comparable or even better performance than the existing automatic annotation method SC-CAM, and its segmentation accuracy is comparable to that of fully supervised networks. Since the HSL approach includes multiple steps while the other models count only model-training time, the reported times are not directly comparable.
As shown in Table 8, the segmentation effects of the HSL approach and the end-to-end semantic segmentation networks are compared under different water leakage types. For block water leakage, EM, 1Stage, and AA&LR produce good segmentation results and can identify small leakage blocks, while the approach in this paper segments small leakage blocks more completely. For vertical water leakage, the EM segmentation is incomplete, and CRF-RNN overfits. For horizontal water leakage, EM over-segments. For water leakage with stain interference, redundant background information appears in the EM and 1Stage segmentation regions; in the first four approaches, the outline of the segmented area is either incomplete or overfitted, whereas the approach in this paper segments the region with a more refined and complete outline. For occluded water leakage, when the occluded part cannot be known, all five approaches show good results, but EM still suffers from incomplete segmentation.
As can be seen from Table 9, the MIoU of the approach proposed in this paper outperforms the other end-to-end semantic segmentation networks on both the Val dataset and the Test dataset, reaching 81.7% and 82.8%, respectively. HSL is also the most efficient, 25% more efficient than the second-best.

4. Analysis and Discussion

This section mainly discusses in detail the advanced nature of the data label automatic labeling method Water Leakage-CAM and the water leakage semantic segmentation network WRDeepLabV3+ included in the HSL approach.

4.1. Performance Evaluation of Water Leakage-CAM

In WL-CAM, the quality of image-level labels depends on APS, and the quality of the pixel-level label depends on the feature extraction network WRes2Net.
WRes2Net is crucial for pixel-level labeling, which is related to the quality of the dataset and the accuracy of the subsequent training of fully supervised segmentation models. This paper chooses to improve based on Res2Net101, which has a deeper feature extraction convolution layer and stronger multi-scale convolution ability. Through the improvement of this model, WRes2Net101 is obtained. In order to compare the performance impact of the two segmentation models on WL-CAM, they are both trained on the same dataset and judged with the same criteria. The MIoU of WL-CAM based on WRes2Net101 is 6.3% higher than that of WL-CAM based on Res2Net101. The overall segmentation accuracy and performance are better than those of WL-CAM based on Res2Net101, as shown in Table 10 and Figure 7.

4.2. Performance Evaluation of WRDeepLabV3+ Semantic Segmentation Network

WRDeepLabV3+ uses the WRes2Net101 structure as its backbone, trained for 50 epochs with the backbone frozen followed by 50 epochs with it unfrozen. First, the effect of the optimizer on WRDeepLabV3+ (without the attention mechanism) is compared under different learning rate decay methods. As shown in Figure 8, with an initial learning rate of 0.0005, the green line is the MIoU of Adam under CosLR, the blue line is the MIoU of Adam under StepLR, the yellow line is the MIoU of SGD under CosLR, and the orange line is the MIoU of SGD under StepLR. For both optimizers, the MIoU under CosLR is higher than under StepLR. The experiments show that the proposed model converges faster and obtains the best segmentation results when using the Adam optimizer with CosLR decay. Figure 9 shows training with an initial learning rate of 0.0005 under the Adam optimizer and CosLR decay: compared with DeepLabV3+, WRDeepLabV3+ improves MIoU by 3.3%.
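The two decay schedules compared in Figure 8 can be sketched as follows. The initial learning rate and epoch count come from the paper; the step size and decay factor for StepLR are illustrative assumptions, since the paper only states the initial learning rate.

```python
import math

LR0, EPOCHS = 0.0005, 100  # initial learning rate and total epochs from the paper

def cos_lr(epoch, lr0=LR0, t_max=EPOCHS, eta_min=0.0):
    # Cosine annealing: smooth decay from lr0 to eta_min over t_max epochs.
    return eta_min + 0.5 * (lr0 - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

def step_lr(epoch, lr0=LR0, step_size=30, gamma=0.1):
    # Step decay: multiply by gamma every step_size epochs (assumed values).
    return lr0 * gamma ** (epoch // step_size)
```

In PyTorch these correspond to `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.optim.lr_scheduler.StepLR` attached to the Adam or SGD optimizer.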
Secondly, WRDeepLabV3+ is compared with and without the attention mechanism. As shown in Table 11, on the same dataset, adding the channel attention mechanism and spatial attention mechanism reduces the Loss by 7.5%, improves Recall by 2.4%, improves Precision by 3%, and improves IoU by 2.2%.
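The channel and spatial attention evaluated in Table 11 follow the common squeeze-and-gate pattern. The sketch below is a simplified NumPy illustration of that pattern (the learned MLP and convolution layers of a full CBAM-style module are omitted), not the paper's exact module.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Squeeze the spatial dims, then gate each channel.
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    weights = _sigmoid(avg + mx)          # learned shared MLP omitted
    return feat * weights[:, None, None]

def spatial_attention(feat):
    # Pool across channels, then gate each spatial location.
    avg = feat.mean(axis=0)
    mx = feat.max(axis=0)
    weights = _sigmoid(avg + mx)          # learned convolution omitted
    return feat * weights[None, :, :]

def cbam_like(feat):
    # Channel attention first, then spatial attention, as in CBAM.
    return spatial_attention(channel_attention(feat))
```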
Finally, the effect of transfer learning on the performance of WRDeepLabV3+ is discussed. Transfer learning of the backbone network uses the parameters of a trained (pre-trained) model to initialize the new model and assist its training. As shown in Figure 10, this allows the network to improve faster during training, and the converged model performs better.
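The weight transfer described above can be sketched as copying every pre-trained parameter whose name and shape match the new model, mirroring PyTorch's `load_state_dict(..., strict=False)` idiom; the parameter names used below are hypothetical.

```python
import numpy as np

def transfer_weights(pretrained, model):
    # Copy every pre-trained parameter whose name and shape match the
    # new model; parameters without a match keep their initialization.
    transferred = []
    for name, weight in pretrained.items():
        if name in model and model[name].shape == np.shape(weight):
            model[name] = weight
            transferred.append(name)
    return transferred
```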

5. Conclusions

In this paper, an HSL approach for the automatic segmentation of water leakage images is proposed. The water-leakage labels are automatically generated by WL-CAM based on the weakly supervised method. The water leakage is semantically segmented based on the fully supervised method WRDeepLabV3+.
In WL-CAM, an adaptive APS algorithm is proposed according to the characteristics of water leakage; it accurately and completely segments the water leakage area and generates image-level labels. A weakly supervised network, WRes2Net, is then proposed to generate CAM labels. The CAM labels are fused with the image-level labels by overlapping pixel values to generate pixel-level labels, which further improves label accuracy and saves manual labeling cost.
In WRDeepLabV3+, the WRes2Net developed for WL-CAM is reused as the core network. WRDeepLabV3+, a fully supervised network, is proposed, and its parameters and framework are adjusted to improve performance. The robustness of WRDeepLabV3+ is verified by semantic segmentation of water leakage images containing different kinds of complex noise.
WL-CAM and WRDeepLabV3+ were tested and compared with other state-of-the-art semantic segmentation methods on the proposed dataset. The results show that WL-CAM generates labels automatically with better performance than the other methods, outperforming them by up to 12.1% on the evaluation indicators. All performance indicators of WRDeepLabV3+ are likewise ahead of the other advanced fully supervised methods.
The HSL automatic segmentation approach performs on par with WRDeepLabV3+ trained on the manually labeled dataset, validating the feasibility and accuracy of WL-CAM. Owing to the performance of WL-CAM and WRDeepLabV3+, the hybrid-supervision approach achieves an MIoU of 82.8% on the dataset, 6.4% higher than the second-best end-to-end approach, while also being 25% more efficient. It can accurately segment the water leakage area in complex subway tunnel environments.
In the future, more images of water leakage in subway tunnels will be collected as the training dataset to improve the segmentation accuracy and performance of the HSL approach. At the same time, the weakly supervised learning method will be optimized to reduce the complexity of the method and save the training time of the dataset.

Author Contributions

D.Q.: the conception and design of the work, data analysis, problem modeling. H.L.: data acquisition, problem modeling. Z.W.: data analysis, problem modeling. Y.T.: data acquisition, problem modeling. S.W.: the conception and design of the work, methodology, problem modeling, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

The National Natural Science Foundation of China (No. 61902016), the postgraduate education and teaching quality improvement project of BUCEA, China (No. J2022005), the BUCEA Post Graduate Innovation Project (No. PG2022118).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fang, Q.; Wang, G.; Du, J.M.; Liu, Y.; Zhou, M.Z. Prediction of tunnelling induced ground movement in clay using principle of minimum total potential energy. Tunn. Undergr. Sp. Technol. 2023, 131, 104854. [Google Scholar] [CrossRef]
  2. Zheng, H.B.; Li, P.F.; Ma, G.W.; Zhang, Q.B. Experimental investigation of mechanical characteristics for linings of twins tunnels with asymmetric cross-section. Tunn. Undergr. Sp. Technol. 2022, 119, 104209. [Google Scholar] [CrossRef]
  3. Vermeij, D. Flood Risk Reduction Interventions for the New York City Subway System: A Research on the Impact of Storm Surge and Sea Level Rise on the Safety Against Flooding in Urban Delta’s. Master’s Thesis, TU Delft, Delft, The Netherlands, 2016. [Google Scholar]
  4. Liu, Y.J. Research on structural safety and driving dynamic characteristics of Beijing subway shield tunnel under disease. J. Beijing Jiaotong Univ. 2019, 1–122. [Google Scholar]
  5. Yao, Y.; Tung, E.; Glisic, B. Crack detection and characterization techniques—An overview. Struct. Control Health. Monit. 2014, 21, 1387–1413. [Google Scholar] [CrossRef]
  6. Huang, H.; Sun, Y.; Xue, Y. Research progress of machine vision-based disease detecting techniques for the tunnel lining surface. Mod. Tunn. Technol. 2014, 51, 19–31. [Google Scholar]
  7. Xue, Y.; Li, Y. A method of disease recognition for shield tunnel lining based on deep learning. J. Hunan Univ. 2018, 45, 100–109. [Google Scholar]
  8. Qiu, D.W.; Li, S.F.; Wang, T.; Ye, Q.; Li, R.J.; Ding, K.L.; Xu, H. A high-precision calibration approach for Camera-IMU pose parameters with adaptive constraints of multiple error equations. Measurement 2020, 153, 107402. [Google Scholar] [CrossRef]
  9. Hayward, S.J.; Lopik, K.; Hinde, C.; West, A.A. A Survey of Indoor Location Technologies, Techniques and Applications in Industry. Internet Things 2022, 20, 100608. [Google Scholar] [CrossRef]
  10. Wang, X.; Wu, Y.; Cui, J.; Zhu, C.Q.; Wang, X.Z. Shape characteristics of coral sand from South China Sea. J. Mar. Sci. Eng. 2020, 8, 803. [Google Scholar] [CrossRef]
  11. Shen, J.H.; Wang, X.; Liu, W.B.; Zhang, P.Y.; Zhu, C.Q.; Wang, X.Z. Experimental study on mesoscopic shear behavior of calcareous sand material with digital imaging approach. Adv. Civ. Eng. 2020, 2020, 8881264. [Google Scholar] [CrossRef]
  12. Ren, Y.P.; Huang, J.S.; Hong, Z.Y.; Lu, W.; Yin, J.; Zou, L.J.; Shen, X.H. Image-based concrete crack detection in tunnels using deep fully convolutional networks. Constr. Build. Mater. 2020, 234, 117367–117379. [Google Scholar] [CrossRef]
  13. Song, Q.; Wu, Y.Q.; Xin, X.S.; Yang, L.; Yang, M.; Chen, H.M.; Liu, C.; Hu, M.J.; Chai, X.S.; Li, J.C. Real-time tunnel crack analysis system via deep learning. IEEE Access 2019, 7, 64186–64197. [Google Scholar] [CrossRef]
  14. Xue, Y.D.; Li, Y.C. A fast detection method via region-based fully convolutional neural networks for shield tunnel lining defects. Comput-Aided. Civ. Inf. 2018, 33, 638–654. [Google Scholar] [CrossRef]
  15. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2–10 December 2016; pp. 379–387. [Google Scholar]
  16. Huang, H.W.; Li, Q.T.; Zhang, D.M. Deep learning-based image recognition for crack and leakage defects of metro shield tunnel. Tunn. Undergr. Sp. Technol. 2018, 77, 166–176. [Google Scholar] [CrossRef]
  17. Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Garcia-Rodriguez, J. A review on deep learning techniques applied to semantic segmentation. arXiv 2017, arXiv:1704.06857. [Google Scholar]
  18. Han, L.; Chen, J.F.; Li, H.B.; Liu, G.S.; Leng, B.; Ahmed, A. Multispectral water leakage detection based on a one-stage anchor-free modality fusion network for metro tunnels. Automat. Constr. 2022, 140, 104345. [Google Scholar] [CrossRef]
  19. Xiong, L.; Zhang, D.; Zhang, Y. Water leakage image recognition of shield tunnel via learning deep feature representation. J. Vis. Commun. Image Represent. 2020, 71, 102708. [Google Scholar] [CrossRef]
  20. Dong, Z.M.; Wang, J.J.; Cui, B.; Wang, D.; Wang, X.L. Patch-based weakly supervised semantic segmentation network for crack detection. Constr. Build. Mater. 2022, 258, 120291–120305. [Google Scholar] [CrossRef]
  21. Zhu, J.S.; Song, J.B. Weakly supervised network based intelligent identification of cracks in asphalt concrete bridge deck. Alex. Eng. J. 2020, 59, 1307–1317. [Google Scholar] [CrossRef]
  22. Chang, Y.T.; Wang, Q.; Hung, W.C.; Piramuthu, R.; Tsai, Y.H.; Yang, M.H. Weakly-supervised semantic segmentation via sub-category exploration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8991–9000. [Google Scholar]
  23. Zhao, S.; Zhang, D.M.; Huang, H.W. Deep learning–based image instance segmentation for moisture marks of shield tunnel lining. Tunn. Undergr. Sp. Technol. 2020, 95, 103156. [Google Scholar] [CrossRef]
  24. Wang, H.; Li, Y.; Dang, L.M.; Lee, S.; Moon, H. Pixel-level tunnel crack segmentation using a weakly supervised annotation approach. Comput. Ind. 2021, 133, 103545. [Google Scholar] [CrossRef]
  25. Zhang, A.; Wang, K.C.P.; Fei, Y.; Liu, Y.; Chen, C.; Yang, G.; Li, J.Q.; Yang, E.; Qiu, S. Automated pixel-level pavement crack detection on 3D asphalt surfaces with a recurrent neural network. Comput.-Aided. Civ. Inf. 2019, 34, 213–229. [Google Scholar] [CrossRef]
  26. Li, S.; Zhao, X.; Zhou, G. Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput.-Aided. Civ. Inf. 2019, 34, 616–634. [Google Scholar] [CrossRef]
  27. Dung, C.V.; Anh, L.D. Autonomous concrete crack detection using deep fully convolutional neural network. Automat. Constr. 2019, 99, 52–58. [Google Scholar] [CrossRef]
  28. Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic pixel-level crack detection and measurement using fully convolutional network. Comput-Aided. Civ. Inf. 2018, 33, 1090–1109. [Google Scholar] [CrossRef]
  29. Bang, S.; Park, S.; Kim, H. Encoder-decoder network for pixel-level road crack detection in black-box images. Comput.-Aided. Civ. Inf. 2019, 34, 713–727. [Google Scholar] [CrossRef]
  30. Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Automat. Constr. 2019, 104, 129–139. [Google Scholar] [CrossRef]
  31. Božič, J.; Tabernik, D.; Skočaj, D. Mixed supervision for surface- defect detection: From weakly to fully supervised learning. Comput. Ind. 2021, 129, 103459–103470. [Google Scholar] [CrossRef]
  32. Chen, G.Y.; Chen, P.F.; Shi, Y.J.; Hsieh, C.Y.; Liao, B.B.; Zhang, S.Y. Rethinking the usage of batch normalization and dropout in the training of deep neural networks. arXiv 2019, arXiv:1905.05928v1. [Google Scholar]
  33. Dang, L.M.; Kyeong, S.; Li, Y.F.; Wang, H.X.; Nguyen, N.T.; Moon, H. Deep learning-based sewer defect classification for highly imbalanced dataset. Comput. Ind. Eng. 2021, 161, 107630–107646. [Google Scholar] [CrossRef]
  34. Misra, D. Mish: A Self Regularized Non-Monotonic Activation Function. arXiv 2019, arXiv:1908.08681. [Google Scholar]
  35. Zhao, T.; Wu, X.Q. Pyramid feature attention network for saliency detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3080–3089. [Google Scholar]
  36. Peng, C.; Zhang, X.Y.; Yu, G.; Luo, G.M.; Sun, J. Large kernel matters: Improve semantic segmentation by global convolutional network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1743–1751. [Google Scholar]
  37. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 42, 2999–3007. [Google Scholar]
  38. Xue, Y.D.; Cai, X.Y.; Shadabfar, M.; Shao, H.; Zhang, S. Deep learning-based automatic recognition of water leakage area in shield tunnel lining. Tunn. Undergr. Sp. Technol. 2020, 104, 103524. [Google Scholar] [CrossRef]
  39. Zheng, J.F.; Gao, Y.C.; Zhang, H.; Lei, Y.; Zhang, J. OTSU Multi-Threshold Image Segmentation Based on Improved Particle Swarm Algorithm. Appl. Sci. 2022, 12, 11514. [Google Scholar] [CrossRef]
  40. Wu, Y.Y.; Li, Q. The Algorithm of Watershed Color Image Segmentation Based on Morphological Gradient. Sensors 2022, 22, 8202. [Google Scholar] [CrossRef] [PubMed]
  41. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  42. Zhao, H.S.; Shi, J.P.; Qi, X.J.; Wang, X.G.; Jia, J.Y. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
  43. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. arXiv 2019, arXiv:1902.09212v1. [Google Scholar]
  44. Chen, L.C.; Zhu, Y.K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 833–851. [Google Scholar]
  45. Dawood, T.; Zhu, Z.H.; Zayed, T. Computer vision-based model for moisture marks detection and recognition in subway networks. J. Comput. Civ. Eng. 2018, 32, 04017079. [Google Scholar] [CrossRef]
  46. Papandreou, G.; Chen, L.C.; Murphy, K.P.; Yuille, A.L. Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1742–1750. [Google Scholar]
  47. Roy, A.; Todorovic, S. Combining bottom-up, top-down, and smoothness cues for weakly supervised image segmentation. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3529–3538. [Google Scholar]
  48. Araslanov, N.; Roth, S. Single-stage semantic segmentation from image labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4253–4262. [Google Scholar]
  49. Zhang, X.R.; Peng, Z.L.; Zhu, P.; Zhang, T.Y.; Li, C.; Zhou, H.Y.; Jiao, L.C. Adaptive affinity loss and erroneous pseudo-label refinement for weakly supervised semantic segmentation. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021; pp. 5463–5472. [Google Scholar]
  50. Menon, R.R.; Luo, J.; Chen, X.; Zhou, H.; Liu, Z.; Zhou, G.; Zhang, N.; Jin, C. Screening of Fungi for Potential Application of Self-Healing Concrete. Sci. Rep. 2019, 9, 2075. [Google Scholar] [CrossRef]
Figure 1. The mobile vision measurement system. (a) The real-time detection; (b) the schematic diagram of the system.
Figure 2. Basic idea of HSL.
Figure 3. The overall framework of the HSL.
Figure 4. Classification of independent pixels within the cluster. (a) Obtaining the color distance for independent pixels. (b) The classification of independent pixels.
Figure 5. (a) The Res2Net module; (b) the WRes2Net module; (c) the WRes2Net shortcut. Rate is the dilation rate, and Avgpool is the average pooling layer.
Figure 6. WRDeepLabV3+ overall model.
Figure 7. Loss of WL-CAM with different backbone networks.
Figure 8. Effects of different learning rate decay methods on the optimizer.
Figure 9. Comparison of DeepLabV3+ and WRDeepLabV3+ under CosLR and Adam.
Figure 10. The performance effect of transfer learning on WRDeepLabV3+.
Table 1. Main diseases of subway tunnels.
[Image examples of four disease types: Damaged, Missing Corner, Crack, Water Leakage.]
Table 2. The details of the experimental dataset.

| Datasets | Collection Equipment | Number of Images | Resolutions |
|---|---|---|---|
| a | MTI-200a | 3555 | 1600 × 1200 and 1944 × 2592 |
| b | Mobile vision measurement system | 6739 | 2560 × 1440 |
Table 3. Classification of the experimental dataset.

| Groups | Train | Val | Test | Sum |
|---|---|---|---|---|
| A | 4000 | 1000 | 1000 | 6000 |
| B | 4000 | 1000 | 1000 | 6000 |
Table 4. Comparison of image segmentation algorithms.
[Image comparison of K-Means, Otsu, threshold segmentation, and APS: overall effects, water leakage details, and background information details.]
Table 5. Comparison of the manually labeled dataset and the automatically labeled dataset.
[Image columns: original images, ground truths, APS images, mask data, AL-Mask images.]
Table 6. Water leakage segmentation effects of WL-CAM and WRDeepLabV3+ on different noises.
[Image rows (original images, ground truths, WL-CAM, WRDeepLabV3+) for five noise types: iron tubes, cables, iron wires, electricity meters, stains.]
Table 7. Comparison of evaluation results of different semantic segmentation under the proposed dataset.

| Method Categories | Methods | Loss | MIoU (%) | Recall (%) | F1 (%) | Precision (%) | Epoch | Training Time | Testing Time |
|---|---|---|---|---|---|---|---|---|---|
| Weakly supervised learning | SC-CAM | 0.154 | 69.1 | 63.5 | 70.2 | 78.7 | 50 | 18 h 58 m 07 s | 0.087 s/image |
| | WL-CAM | 0.132 | 73.6 | 75.6 | 79.5 | 83.9 | 50 | 15 h 21 m 16 s | 0.061 s/image |
| Fully supervised learning | PSPNet | 0.096 | 80.4 | 82.8 | 80.1 | 77.6 | 100 | 11 h 04 m 46 s | 0.028 s/image |
| | U-Net | 0.077 | 82.0 | 83.8 | 81.9 | 80.1 | 100 | 17 h 55 m 59 s | 0.074 s/image |
| | HRNet | 0.089 | 81.0 | 83.3 | 80.8 | 78.6 | 100 | 19 h 57 m 58 s | 0.095 s/image |
| | DeepLabV3+ | 0.079 | 81.8 | 85.2 | 81.8 | 78.7 | 100 | 11 h 53 m 34 s | 0.033 s/image |
| | WRDeepLabV3+ | 0.053 | 86.2 | 88.6 | 86.7 | 84.9 | 100 | 13 h 22 m 33 s | 0.035 s/image |
| | HSL (proposed) | 0.084 | 82.8 | 85.2 | 82.9 | 80.6 | 100 | | |
Table 8. Comparison of segmentation effects of HSL and the end-to-end semantic segmentation network under different water leakage types.
[Image rows (original images, EM, CRF-RNN, 1Stage, AA&LR, HSL) for five leakage types: blocks, vertical strips, horizontal strips, stains, occlusions.]
Table 9. MIoU and the efficiency comparison between the HSL and the end-to-end semantic segmentation network on the proposed dataset.

| Approaches | Backbone | Val (%) | Test (%) | Testing Time |
|---|---|---|---|---|
| EM | VGG16 | 58.4 | 59.8 | 0.092 s/image |
| CRF-RNN | VGG16 | 61.6 | 62.9 | 0.068 s/image |
| 1Stage | WideResNet38 | 73.4 | 74.1 | 0.044 s/image |
| AA&LR | WideResNet38 | 75.6 | 76.4 | 0.052 s/image |
| HSL | WRes2Net101 | 81.7 | 82.8 | 0.035 s/image |
Table 10. Performance evaluation of WL-CAM with different backbone networks based on the proposed dataset.

| Methods | Precision (%) | Recall (%) | MIoU (%) | F1 (%) |
|---|---|---|---|---|
| WL-CAM (Res2Net101) | 71.4 | 74.3 | 67.3 | 72.8 |
| WL-CAM (WRes2Net101) | 83.9 | 75.6 | 73.6 | 79.5 |
Table 11. Comparison of different WRDeepLabV3+ under the proposed dataset.

| WRDeepLabV3+ | Loss | Recall (%) | Precision (%) | IoU (%) |
|---|---|---|---|---|
| No attention mechanism | 0.057 | 86.2 | 81.9 | 75.4 |
| Add attention mechanism | 0.053 | 88.6 | 84.9 | 77.6 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
