Article

Hybrid-Supervised-Learning-Based Automatic Image Segmentation for Water Leakage in Subway Tunnels

1
School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
2
School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11799; https://doi.org/10.3390/app122211799
Submission received: 3 November 2022 / Revised: 15 November 2022 / Accepted: 18 November 2022 / Published: 20 November 2022
(This article belongs to the Special Issue Urban Underground Engineering: Excavation, Monitoring, and Control)

Abstract
Quickly and accurately identifying water leakage is an important component of the health monitoring of subway tunnels. A mobile vision-measurement system consisting of several high-resolution, industrial, charge-coupled-device (CCD) cameras is placed on trains to implement structural health monitoring in tunnels. Through the image-processing technology proposed in this paper, water leakage areas in subway tunnels can be found and repaired in real time. A lightweight automatic segmentation approach to water leakage based on hybrid-supervised deep learning is proposed. This approach consists of the weakly supervised Water Leakage-CAM and the fully supervised WRDeepLabV3+. Water Leakage-CAM is used for the automatic labeling of data; WRDeepLabV3+ is used for the accurate identification of water leakage areas in subway tunnels. Compared with other end-to-end semantic segmentation networks, the hybrid-supervised approach can more completely segment the water leakage region when dealing with water leakage in complex environments. The proposed approach achieves the highest MIoU of 82.8% on the experimental dataset, 6.4% higher than that of the second-best approach. Its efficiency is also 25% higher than that of the second-best approach, and it significantly outperforms other end-to-end deep learning approaches.

1. Introduction

With the construction and operation of urban subways, various problems related to subway tunnels continue to emerge [1,2], creating serious social and economic problems. The main subway tunnel diseases are shown in Table 1, among which water leakage is one of the most common. In the New York subway in the United States, water leakage caused signal failures that led to train delays [3]. Among the 12 subway lines in Beijing, China, there are seven common tunnel diseases, including segment water leakage, cracks, misaligned lining, empty track bed, concrete deterioration, and section ovalization. Water leakage is one of the main diseases in Beijing subway tunnels [4].
In order to ensure the safe operation of subways, a mobile vision-measurement system is often placed on trains [5]. As shown in Figure 1, 10 high-resolution industrial charge-coupled-device (CCD) cameras are placed on the top, left and right sides, and bottom of the train. With 10 LED lighting apparatuses, the subway tunnel is photographed in a ring, and image data of the entire tunnel are obtained [6,7].
The real-time mileage information of the train is obtained through the odometer, inertial navigation system, communication system, etc., and then the image with position information is obtained [8,9]. Additionally, real-time detection is carried out through a series of image-processing techniques [10,11]. When the water leakage is found, the relevant staff can repair it in time through the location information of the image.
In recent years, the development of deep learning has revolutionized visual measurement. For water leakage detection, semantic segmentation is a better choice. Object detection can be divided into non-automatic and automatic detection. In non-automatic detection, fully supervised learning can segment the water leakage area more accurately [12,13]. Xue and Li [14] adopted a fully convolutional network (FCN), and Dai et al. [15] a region-based FCN (R-FCN), to label, classify, and detect cracks and seepage. Huang et al. [16] adopted an FCN for semantic segmentation of water leakage. However, due to its spatial-invariance characteristics, small water leakage areas may be ignored [17]. Han et al. [18] proposed a multi-spectral water-leakage detection method that integrates visible optics and thermal infrared and detects water leakage through single-modal feature extraction and multi-modal feature fusion with an FPN, but this method was only tested in laboratory simulations and is not suitable for complex environments. Xiong et al. [19] designed a deep learning system based on an image-recognition model to detect water leakage; however, a VIS color camera is greatly affected by the lighting conditions in the tunnel, which reduces detection accuracy. Fully supervised learning requires a large number of high-precision dataset labels, so label production is time-consuming and inefficient. Weakly supervised learning can achieve semantic segmentation of objects with few data labels [20]. Zhu et al. [21] used weakly supervised networks to detect and segment different types of cracks and compared the segmentation effects of different networks but did not evaluate accuracy, precision, or other indicators. Chang et al. [22] proposed a defect segmentation system based on weakly supervised learning, which improves the accuracy of classification and segmentation, but its efficiency is not high, and the problem of time-consuming data labeling remains unsolved. Zhao et al. [23] used Mask R-CNN for instance segmentation of water leakage in tunnel linings; the time-consumption problem was solved without overfitting, but segmentation accuracy was low. Wang et al. [24] proposed a new network framework as the backbone of a weakly supervised network and added the K-means clustering algorithm to improve segmentation accuracy. However, due to the limitations of K-means, the image segmentation effect is not ideal in complex environments.
Automatic detection has become the mainstream trend of current research, as it can ensure detection accuracy while maintaining efficiency. Zhang et al. [25] proposed a new method for automated pixel-level crack detection on 3D asphalt pavement, called CrackNet-R, which achieves high precision in crack detection. However, it can easily create technical isolation for users in practical applications. Li et al. [26], Dung et al. [27], and Yang et al. [28] applied a fully convolutional network (FCN) to perform automatic pixel-level crack detection. This method works well for large-pixel cracks but has limitations for small cracks, where the effect is not ideal. Bang et al. [29] proposed an encoder–decoder network-based method for the automatic pixel-level detection of road cracks. Transfer learning is performed using ResNet152, which achieves the best performance among several convolutional neural networks, but the experimental results do not reach the expected accuracy. Liu et al. [30] proposed the use of U-Net for automatic crack detection. U-Net is more robust than Cha's CNN method and more suitable for crack detection; however, complex environments and interference cause redundant identifications. Song et al. [13] proposed an improved automatic crack detection framework based on DeepLabV3+ and obtained a speed of 23 frames per second, but its robustness is weak and it only applies to small data samples. Therefore, automatic object detection at the pixel level is still a challenge.
At present, automatic detection still suffers from problems such as complex algorithms, long time consumption, and low detection accuracy. This paper proposes a lightweight hybrid supervised learning (HSL) approach that combines efficient weakly supervised learning with high-precision fully supervised learning. Weakly supervised learning is used for the automatic generation of pixel-level water leakage labels, while fully supervised learning is used for water leakage semantic segmentation. In weakly supervised learning, the Water Leakage-CAM (WL-CAM) framework is proposed, which obtains high-precision pixel-level labels through a multi-level fusion strategy. First, an adaptive pixel segmentation clustering (APS) algorithm is used to generate image-level labels with little background information. Second, the WRes2Net deep learning network is used for training, and the class activation maps (CAMs) generated during training are used to produce label files, which are fused with the image-level labels by pixel-value overlap, thereby reducing data-labeling time and labor costs. In fully supervised learning, this paper improves the DeepLabV3+ network by adding multiple learning-rate decay methods, optimizers, and attention mechanisms, so that water leakage can be accurately segmented in various complex environments and the segmentation accuracy is greatly improved.

2. Approach

The HSL combines the advantages of weakly supervised learning and fully supervised learning. The accuracy of data labels is guaranteed without manual labeling, while the accuracy of fully supervised learning semantic segmentation is guaranteed.

2.1. Basic Idea

Figure 2 shows the basic idea of HSL. As a new approach for identifying water leakage in subway tunnels, it can automatically identify water leakage areas without labels.
We take an unlabeled original dataset as an example to illustrate the implementation process of the HSL approach. The original dataset includes RGB images ($I_{RGB}$) and grayscale images ($I_{Gray}$). APS generates image-level labels ($I_{APS}$) in two steps. Firstly, the number K of cluster types is calculated from the gray-gradient distribution and gray peak values of $I_{Gray}$. Secondly, batch clustering is performed on the $I_{RGB}$ dataset to obtain $I_{APS}$. The WRes2Net deep learning network is used to extract the features of $I_{APS}$, generate a CAM feature map, and refine it into a CAM label ($I_{SMD}$). $I_{APS}$ and $I_{SMD}$ are fused and superimposed; based on the $I_{APS}$ category, the region with the largest category weight is retained, and the pixel-level label $I$ is obtained. Through WRDeepLabV3+, water leakage features are extracted from $I$, and the water leakage area is segmented.

2.2. Framework

As shown in Figure 3, the HSL approach adopts a two-stage structure: the weakly supervised WL-CAM and the fully supervised WRDeepLabV3+. WL-CAM has two steps. In Step 1, the APS algorithm is proposed according to the characteristics of water leakage, and segmentation is performed to obtain image-level labels. In Step 2, the WRes2Net semantic segmentation network is trained to obtain a CAM [31]. The generated CAM yields a label file (Mask Data) through the random-walk algorithm (RW), which is combined with the image-level labels to further improve the completeness and accuracy of the dataset labels, producing pixel-level labels. Feature extraction and semantic segmentation of water leakage are then carried out on the pixel-level labels using WRDeepLabV3+. In WRDeepLabV3+, the backbone network is WRes2Net, ASPP is a spatial pyramid pooling module with atrous convolution, and transfer learning is performed using the pixel-level labels and the original dataset. The channel attention mechanism (CA) is used to extract the high-level semantic information of water leakage: average pooling is applied to the high-level semantic information to obtain a feature vector along the channel direction, two nonlinear fully connected (FC) layers capture the correlations between channels while limiting model complexity, and finally a Sigmoid normalizes the channel feature vector. The spatial attention mechanism (SA) focuses on the water leakage target area to extract low-level semantic information; semantic segmentation accuracy is improved through a global convolutional network, and finally a Sigmoid normalizes the feature vector. The high-level and low-level semantic information are then combined to generate the water leakage area.

2.3. Automatic Generation of Water Leakage Labels for Subway Tunnels

The automatic generation of water leakage labels for subway tunnels adopts the WL-CAM based on weakly supervised learning.
Image-level labels are generated through ASP, which is divided into the following steps:
  • Calculate the gray values of the image to find K. Suppose an image $I_{gray}$ consists of N water-basin characteristic areas, $I_{gray} = \{\gamma_n \mid \gamma_n \in \mathbb{R}^d,\ n = 1, 2, \ldots, N\}$, where $\gamma_n$ consists of M data points that characterize it, $\gamma_n = \{\eta_m \mid \eta_m \in \gamma_n,\ m = 1, 2, \ldots, M\}$, and $\eta_m$ is the gray value of the m-th pixel in the feature area. Calculate the grayscale histogram. In the histogram, $U_\nu^{max} = \{U_\nu^{max} \in I_{gray},\ \nu = 1, 2, \ldots, V\}$, where $U_\nu^{max}$ is a peak gray value and $\nu$ is the peak index.
  • To further determine the number of clusters K, compute the mean $\bar{\gamma}_n$ of each $\gamma_n$. The number of clusters is
    $$K = n + v, \qquad \begin{cases} n: & \bar{\gamma}_n < a \\ v: & a < \bar{\gamma}_n < b \ \text{and}\ a < U_\nu^{max} < b \end{cases}$$
    where n counts the regions satisfying the first condition, v counts the peaks satisfying the second, and a and b are grayscale thresholds.
  • To process the RGB image, let $I_{rgb}$ consist of N data points, $I_{rgb} = \{S_n \mid S_n \in \mathbb{R}^d,\ n = 1, 2, \ldots, N\}$. A binary variable $P_{nk} \in \{0, 1\}$ indicates to which cluster $S_k$ (k = 1, 2, …, K) a point $S_n$ belongs: $P_{nk} = 1$ means the point belongs to class k; otherwise it is 0. The loss function is thus defined as:
    $$\Psi = \sum_{n=1}^{N} \sum_{k=1}^{K} P_{nk} \, \lVert S_n - S_k \rVert^2$$
    $$\lVert S_n - S_k \rVert = \sqrt{D_1^2 + D_2^2}$$
    $$D_1 = \sqrt{(R_n - R_k)^2 + (G_n - G_k)^2 + (B_n - B_k)^2}$$
    $$D_2 = \sqrt{(X_n - X_k)^2 + (Y_n - Y_k)^2}$$
    where $D_1$ is the color distance, $D_2$ is the Euclidean spatial distance, and $\lVert S_n - S_k \rVert$ is the clustering distance. APS converges the cluster centers through continuous iteration, and $S_n$ and $S_k$ can be optimized alternately during the iterative process. When K = 2, calculate the pixel areas and keep the largest one.
  • During clustering, the calculation of spatial distance affects the classification of pixels, so independent pixels may remain inside a cluster. The Canny edge-extraction algorithm obtains closed cluster edges to determine each closed cluster. The color distance between the independent pixels inside a closed cluster and the K cluster centers is then calculated, and each pixel is assigned to the cluster with the smallest color distance, as shown in Figure 4.
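The clustering loss above can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: `cluster_distance` and `assign_clusters` are hypothetical helper names, and pixels are represented as (R, G, B, X, Y) tuples so the combined color/spatial distance can be computed directly.

```python
import numpy as np

def cluster_distance(pixel, center):
    """Combined distance sqrt(D1^2 + D2^2): D1 is the RGB color
    distance, D2 the spatial distance. pixel/center: (R, G, B, X, Y)."""
    p, c = np.asarray(pixel, float), np.asarray(center, float)
    d1 = np.sqrt(np.sum((p[:3] - c[:3]) ** 2))   # color distance D1
    d2 = np.sqrt(np.sum((p[3:] - c[3:]) ** 2))   # spatial distance D2
    return np.sqrt(d1 ** 2 + d2 ** 2)

def assign_clusters(pixels, centers):
    """One assignment step of the loss: each pixel goes to the center
    with the smallest combined distance (its P_nk is set to 1)."""
    return [int(np.argmin([cluster_distance(p, c) for c in centers]))
            for p in pixels]
```

An alternating optimization would then recompute each center as the mean of its assigned pixels and repeat until the centers converge.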
After the image-level labels are obtained, deep learning training is performed on the dataset to generate Mask Data, and the segmentation performance is mainly affected by the classifier. Compared with Res2Net (as shown in Figure 5), WRes2Net adds atrous convolution, which can obtain more characteristic information about water leakage without changing the scale, and connects more residual blocks, making the water-leakage feature information richer. After each convolution, the feature maps pass through activation and normalization layers [32,33]. To give the network better accuracy and stability, the ReLU activation function is replaced by the Mish function [34]. Because the non-negativity of the activation function can make the weight-layer updates less ideal, the activation layer is placed before the normalization layer [33].
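Mish is simple to state, so a small sketch may help. This is a generic NumPy implementation of the published activation, Mish(x) = x · tanh(softplus(x)), not code from the paper:

```python
import numpy as np

def softplus(x):
    """Numerically stable softplus: log(1 + e^x)."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def mish(x):
    """Mish activation [34]: x * tanh(softplus(x)). Unlike ReLU, it is
    smooth and lets small negative values through, which helps gradients."""
    return x * np.tanh(softplus(x))
```

For large positive inputs Mish behaves like the identity; for strongly negative inputs it decays smoothly toward zero instead of clipping hard.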
In the CAM, the higher the feature value of a region the model attends to, the darker the color and the higher the corresponding weight:
$$I_{SMD} = \sum_{C} a_S^C F^C$$
where $I_{SMD}$ is the Mask Data image, S is the label type, C is the number of channels, $a_S^C$ is the weight of the C-th channel for type S, and $F^C$ is the feature map of the C-th channel. The weight $a_S^C$ is expanded as:
$$a_S^C = \frac{1}{L} \sum_{i} \sum_{j} \frac{\partial \zeta_S}{\partial F_{i,j}^C}$$
where L is the number of pixels (width times height) of the feature map, $\zeta_S$ is the probability of outputting the target type, and $F_{i,j}^C$ represents the pixel value at (i, j) in the C-th feature map.
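Assuming the channel weights are gradient-based averages as described above, the Mask Data map can be sketched as follows. This is an illustrative NumPy version with a hypothetical helper name (`cam_from_gradients`); in practice the feature maps and gradients would come from the trained WRes2Net:

```python
import numpy as np

def cam_from_gradients(feature_maps, gradients):
    """CAM sketch: per-channel weights a_S^C are the spatial averages of
    the gradients of the class score w.r.t. each feature map; the map
    I_SMD is the weighted channel sum, kept non-negative.
    feature_maps, gradients: arrays of shape (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))              # a_S^C, one per channel
    cam = np.tensordot(weights, feature_maps, axes=1)  # sum_C a_S^C * F^C
    return np.maximum(cam, 0.0)                        # keep positive response
```

Channels whose gradients are negative on average suppress the map; the final clipping keeps only regions that support the target class.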
Since the weakly supervised semantic segmentation network only detects the main features of each category in the image, it is difficult to obtain a complete object response map. The Mask Data are therefore combined with the image-level labels:
$$I = \alpha I_{APS} + \beta I_{SMD} + o$$
where $I$ is the pixel-level label, $I_{APS}$ is the image-level label, $\alpha$ and $\beta$ are the weights, and $o$ is the residual.
$I_{APS}$ contains n subclass labels, which are overlapped with $I_{SMD}$; the subclass label with the highest overlap weight is taken as the feature target. The images are corrected and supplemented by the residual network during training.
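Ignoring the residual term, the fusion step can be illustrated with binary masks. The helper below is hypothetical; the default weights 0.8 and 0.2 are the values reported later in Section 3.2:

```python
import numpy as np

def fuse_labels(i_aps, i_smd, alpha=0.8, beta=0.2):
    """Sketch of I = alpha*I_APS + beta*I_SMD (residual term omitted):
    weighted overlap of the two binary masks, then binarised so only
    regions supported by the image-level label survive."""
    fused = alpha * i_aps.astype(float) + beta * i_smd.astype(float)
    return (fused >= 0.5).astype(np.uint8)
```

With these weights, a pixel survives only if the APS label marks it; the CAM label alone contributes 0.2, which falls below the 0.5 cut-off.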

2.4. Semantic Segmentation of Water Leakage in Subway Tunnels

As can be seen from Figure 6, WRDeepLabV3+ consists of two modules. The encoding area (Encoder) is used to extract high-level semantic information; the decoding area (Decoder) is used to extract low-level semantic information. The WRes2Net network proposed in Section 2.3 is used as the backbone of WRDeepLabV3+ to extract water leakage features. The channel attention mechanism (CA) is added to the Encoder [35]. CA assigns larger weights to highly responsive channels after feature extraction using depth-wise separable convolutional layers over different channels. Suppose the high-level semantic information is $F_{ch} \in \mathbb{R}^{W \times H \times C}$, $F_{ch} = [F_1, F_2, \ldots, F_C]$, where W and H are the width and height of the input feature image and C is the number of channels. CA is represented as:
$$f_{CA}(g_c, \Phi_{CA}) = e_1\{ f_{c2}\{ r[ f_{c1}(g_c, \Phi_{CA1}) ], \Phi_{CA2} \} \}$$
In the formula, $g_c$ is the feature map after average pooling of $F_c$, $\Phi_{CA}$ contains the parameters of the channel attention module, $e_1$ is the Sigmoid activation, $f_{c1}$ and $f_{c2}$ are the fully connected layers, and r is the ReLU activation. The CA module outputs $f_{CA}$ and weights the feature map to obtain the output feature map:
$$F'_{ch} = F_{ch} \cdot f_{CA}$$
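The CA formula above has the familiar squeeze-and-excitation shape, which can be sketched without a deep learning framework. The NumPy version below is illustrative: the FC weights `w1` and `w2` are hypothetical stand-ins for the trained layers, and biases are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    """CA sketch: g_c = average pooling of F_ch, then
    f_CA = Sigmoid(FC2(ReLU(FC1(g_c)))) and F'_ch = F_ch * f_CA.
    f: (C, H, W); w1: (C_r, C) and w2: (C, C_r) are the two FC layers."""
    g = f.mean(axis=(1, 2))            # global average pooling -> (C,)
    hidden = np.maximum(w1 @ g, 0.0)   # FC1 + ReLU bottleneck
    scale = sigmoid(w2 @ hidden)       # FC2 + Sigmoid, one weight per channel
    return f * scale[:, None, None]    # reweight each channel
```

The bottleneck dimension C_r < C is what limits the model complexity mentioned in the text.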
Compared with the encoding area, the decoding area can obtain the location, edge, and other information of the target. However, it also contains a lot of background information, which affects segmentation accuracy to a certain extent. The spatial attention mechanism (SA) is therefore introduced in the Decoder. It focuses on the target feature area, adaptively combines high-level features with low-level features, and uses high-level features to filter out background information [36]. To obtain global information without increasing the number of parameters, semantic segmentation is improved by a global convolutional network. A two-layer convolution operation is used, with kernels of 1 × 5 and 5 × 1, respectively, to obtain key feature information. SA is expressed as:
$$A_1 = Con_1[ Con_2(F_{ch}, \Phi_{SA1}), \Phi_{SA2} ]$$
$$A_2 = Con_2[ Con_1(F_{ch}, \Phi_{SA1}), \Phi_{SA2} ]$$
$$f_{SA}(F_{ch}, \Phi_{SA}) = e_2(A_1 + A_2)$$
In the formulas, $A_1$ is the feature map produced by convolution with kernels 1 × 5 and then 5 × 1, $A_2$ is the feature map produced by convolution with kernels 5 × 1 and then 1 × 5, $Con_1$ is the 5 × 1 × C convolution, $Con_2$ is the 1 × 5 × C convolution, $e_2$ is the Sigmoid activation, $F'_{ch}$ is obtained by SA weighting, and $\Phi_{SA}$ contains the parameters of the spatial attention module.
Training a complex deep learning model can take a long time, and the optimizer can improve the training efficiency of the model. At the same time, different optimizers can also improve the performance of the model and achieve better training results. In the training process, the learning-rate decay methods StepLR and CosineAnnealingLR (CosLR) are added, and the two optimizers SGD and Adam are also compared.
In addition, the training is divided into a freezing phase (Freeze) and an unfreezing phase (Unfreeze). At the same time, the Focal Loss function is used to solve the problem of positive and negative sample imbalance [37]. Its formula is shown in (15).
$$Loss = \begin{cases} -\kappa (1 - y)^{\delta} \log(y), & \text{ground truth} = 1 \\ -(1 - \kappa)\, y^{\delta} \log(1 - y), & \text{ground truth} = 0 \end{cases}$$
where y denotes the output after the activation function, κ denotes the balance-factor loss weight of the samples (the loss weights of all categories sum to 1), and δ ≥ 0 is the balance factor controlling the loss of hard and easy samples. When δ = 0, the focal loss degenerates into the ordinary cross-entropy loss weighted by κ. As δ increases, the model pays more attention to hard-to-distinguish samples.
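A NumPy sketch of focal loss may clarify the weighting: confident, well-classified samples contribute little, while hard samples dominate the gradient. The helper name and the defaults κ = 0.25 and δ = 2 are illustrative, not values from the paper:

```python
import numpy as np

def focal_loss(y_pred, y_true, kappa=0.25, delta=2.0):
    """Focal-loss sketch: cross-entropy whose terms are down-weighted
    by (1 - y)^delta for positives and y^delta for negatives, so the
    model focuses on hard, misclassified samples.
    y_pred: probabilities in (0, 1); y_true: 0/1 ground-truth labels."""
    y_pred = np.clip(y_pred, 1e-7, 1.0 - 1e-7)   # avoid log(0)
    pos = -kappa * (1.0 - y_pred) ** delta * np.log(y_pred)
    neg = -(1.0 - kappa) * y_pred ** delta * np.log(1.0 - y_pred)
    return np.where(y_true == 1, pos, neg).mean()
```

A confident correct positive (y_pred = 0.9) contributes far less loss than an uncertain one (y_pred = 0.5), which is the imbalance-handling behavior described above.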

3. Experiment

In this study, the experimental hardware consists of a desktop computer and the mobile vision-measurement system; the software comprises Python 3.7, PyTorch 1.10.2, and Labelme. The computer configuration is as follows. CPU: AMD Ryzen 7 5800X with Radeon Graphics, 4.6 GHz; GPU: NVIDIA GeForce RTX 3060, 6 GB; RAM: 16 GB. The mobile vision-measurement system is placed on the left and right sides and the top of the subway locomotive. The train photographs the subway tunnel in a ring to obtain water leakage data. The system includes CCD industrial cameras, LED lighting equipment, power supplies, an odometer, an inertial navigation system, a communication system, etc. The CCD industrial camera parameters are as follows: resolution, 3.5 megapixels; pixel size, 3.75 μm × 3.75 μm; target size, 1/3″; frame rate, 400~5250 fps.

3.1. Dataset

The experimental dataset consists of two parts, as shown in Table 2. The first part (a) is the water leakage dataset of Shanghai subway tunnels in China [38]. The second part (b) is the dataset of Beijing subway tunnels in China collected by our mobile vision-measurement system.
After data screening, the experimental dataset consists of 6000 images, including the subway tunnel environment, different types of water leakage, and other diseases. The size of each image is 512 × 512. The dataset is divided into the Train dataset and the Test dataset at a ratio of 4:1.
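The 4:1 split can be reproduced with a short, generic helper (hypothetical code, not the paper's tooling); a fixed seed keeps the split repeatable:

```python
import random

def split_dataset(items, ratio=4, seed=0):
    """Shuffle and split a list of image paths into Train and Test
    sets at ratio:1 (4:1 gives 4800/1200 for 6000 images)."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    cut = len(items) * ratio // (ratio + 1)
    return items[:cut], items[cut:]
```

Shuffling before the cut matters here because the dataset mixes two sources (Shanghai and Beijing) and several leakage types.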
In order to compare the segmentation effect of automatically synthesized labels and manual labels, the data are divided into two groups (as shown in Table 3). Group A consists of automatically synthesized labels (AL-Mask images). Group B consists of manual labels (ground truth). The ground truth is labeled with Labelme and saved in the JSON format.

3.2. Experimental Scheme

Firstly, we demonstrate the segmentation ability of APS in weakly supervised learning and the method's effectiveness in automatically generating water leakage labels. The original image dataset is segmented into image-level labels via APS. As subway tunnels mostly have a gray-and-white background, it forms a strong color contrast with the water leakage. The images are converted to grayscale; the water leakage area appears dark in the grayscale images, with low gray values. Through extensive analysis and statistics of the gray-value distribution, it is found that the gray value of the water leakage area is greater than 60 and less than 130, while that of the background area is greater than 130 [39,40]. Through global threshold segmentation, background information with a gray value greater than 130 is filtered out, and the results are compared with those of other image-segmentation algorithms. Accordingly, a is set to 60 and b to 130. Second, the image-level labels are used for multi-class training with WRes2Net. CAM label files are generated and fused with the image-level labels, redundant background information is removed, and the resulting AL-Mask images are compared with the ground truth. After many experiments, the fusion weights α and β are set to 0.8 and 0.2, respectively.
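The global-threshold step described above reduces to a one-line mask. The sketch below is illustrative (hypothetical helper; the bounds 60 and 130 are the a and b values quoted in the text):

```python
import numpy as np

def leakage_mask(gray, a=60, b=130):
    """Keep pixels whose gray value lies strictly between a and b:
    candidate leakage is darker than the background (> b) but not as
    dark as the lowest-value noise (< a)."""
    return ((gray > a) & (gray < b)).astype(np.uint8)
```
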
Secondly, the robustness of fully supervised learning WRDeepLabV3+ is demonstrated. Deep learning training on AL-Mask images is carried out using WRDeepLabV3+. Due to the complex environment of subway tunnels and different noise information from water leakage, the water leakage data are divided into five categories, and the semantic segmentation effect of WRDeepLabV3+ is displayed through different categories.
Finally, the advancement and accuracy of the HSL approach are demonstrated. The performance of the approach combining WL-CAM and WRDeepLabV3+ is compared with other advanced semantic segmentation networks, including SC-CAM [22], U-Net [41], PSPNet [42], HRNet [43], DeepLabV3+ [44], and WRDeepLabV3+. Because of the complex environment of underground tunnels, water leakage has the following three characteristics [45]. First, due to the lateral extrusion and gravity of the lining, as well as the influence of different lining gaps, water leakage occurs on the lining surface, forming strip-shaped leakage (in horizontal and vertical directions) and block-shaped leakage. Second, oil stains and artificial marks on the lining surface are similar to water leakage in color and shape, affecting segmentation accuracy. Third, the water leakage area may be covered by meter boxes, cables, pipes, etc., causing interference. The water leakage data are therefore divided into five categories, and semantic segmentation comparisons on the dataset are also performed between the HSL approach proposed in this paper and other end-to-end semantic segmentation networks, including EM [46], CRF-RNN [47], 1Stage [48], and AA&LR [49].
The experiments are comprehensively evaluated using Precision, Recall, IoU, MIoU, and F1 to judge the performance of the models [50].
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
$$IoU = \frac{TP}{TP + FP + FN}$$
$$MIoU = \frac{1}{2}\left( \frac{TP}{TP + FP + FN} + \frac{TN}{TN + FN + FP} \right)$$
$$F1 = \frac{2 \times TP}{2 \times TP + FP + FN}$$
where TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative.
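Given binary prediction and ground-truth masks, the five metrics follow directly from the pixel counts. A generic NumPy sketch (hypothetical helper, single foreground class):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute Precision, Recall, IoU, MIoU, and F1 from binary masks
    using pixel-level TP/FP/TN/FN counts, as in the formulas above."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "iou": tp / (tp + fp + fn),
        "miou": (tp / (tp + fp + fn) + tn / (tn + fn + fp)) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```
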

3.3. Experimental Results

The segmentation results of APS are compared with those of several other image-segmentation algorithms (as shown in Table 4). K-means and Otsu retain more background information than global threshold segmentation (threshold = 90). For the water leakage area, the first three segmentation algorithms all exhibit incomplete segmentation, and independent pixels appear. Compared with these three algorithms, APS can better segment the water leakage area: the background information is effectively removed while the water leakage area is retained.
A comparison of the labeling effect of ground truths and AL-Mask images is shown in Table 5. The first column shows the original dataset of water leakage. The second column shows the water leakage dataset (ground truth) manually marked by Labelme. The third column shows the image-level water leakage dataset obtained by APS. The fourth column shows the mask data, and the fifth column shows the final pixel-level water leakage dataset (AL-Mask images). Compared with ground truth, the edge of the AL-Mask images is not smooth, but it can effectively extract the water leakage area.
The subway tunnel’s surface environment is complex, and there are various types of noise, such as the occlusion of objects and the influence of artificial or oil stains, which cause certain difficulties in the segmentation of water leakage. As shown in Table 6, five typical noise images of water leakage of subway tunnels are listed, as well as their corresponding ground truth, WL-CAM, and WRDeepLabV3+ renderings. Under the complex background, WL-CAM has over-segmentation and incomplete segmentation, while WRDeepLabV3+ obtains a better segmentation effect.
As shown in Table 7, the HSL approach proposed in this paper is compared with current advanced semantic segmentation networks. Since WL-CAM adopts two-stage training, a larger learning rate and batch size are used in the freezing stage and smaller ones in the unfreezing stage, which greatly shortens the training time. Compared with SC-CAM, WL-CAM is more time-saving, and all its indicators are better. In fully supervised learning, PSPNet has the highest efficiency, but accuracy is sacrificed. WRDeepLabV3+, based on WRes2Net101, optimizes the learning rate and adds an optimizer so that the first five evaluation indicators lead the other fully supervised networks. It can be seen that the HSL approach proposed in this paper achieves comparable or even better performance than the existing automatic annotation method SC-CAM, and its segmentation accuracy is comparable to that of fully supervised networks. Since the HSL approach includes multiple steps while the other models count only model-training time, the reported times are not directly comparable.
As shown in Table 8, the segmentation effects of the HSL approach and the end-to-end semantic segmentation networks are compared under different water leakage types. For block water leakage, EM, 1Stage, and AA&LR produce good segmentation results and can identify small leakage blocks, while the approach in this paper segments small leakage blocks more completely. For vertical water leakage, the EM segmentation is incomplete, and CRF-RNN overfits. For horizontal water leakage, EM over-segments. For water leakage with stain interference, redundant background information appears in the EM and 1Stage segmentation regions; in the first four approaches, the outline of the segmented area is either incomplete or overfitted, whereas the approach in this paper segments the region with a more refined and complete outline. For occluded water leakage, when the occluded part cannot be known, all five approaches show good results, but EM still suffers from incomplete segmentation.
As can be seen from Table 9, the MIoU of the approach proposed in this paper outperforms the other end-to-end semantic segmentation networks on both the Val dataset and the Test dataset, reaching 81.7% and 82.8%, respectively. HSL is also the most efficient, 25% more efficient than the second-best.

4. Analysis and Discussion

This section mainly discusses in detail the advanced nature of the data label automatic labeling method Water Leakage-CAM and the water leakage semantic segmentation network WRDeepLabV3+ included in the HSL approach.

4.1. Performance Evaluation of Water Leakage-CAM

In WL-CAM, the quality of image-level labels depends on APS, and the quality of the pixel-level label depends on the feature extraction network WRes2Net.
WRes2Net is crucial for pixel-level labeling, which is related to the quality of the dataset and the accuracy of the subsequent training of fully supervised segmentation models. This paper chooses to improve based on Res2Net101, which has a deeper feature extraction convolution layer and stronger multi-scale convolution ability. Through the improvement of this model, WRes2Net101 is obtained. In order to compare the performance impact of the two segmentation models on WL-CAM, they are both trained on the same dataset and judged with the same criteria. The MIoU of WL-CAM based on WRes2Net101 is 6.3% higher than that of WL-CAM based on Res2Net101. The overall segmentation accuracy and performance are better than those of WL-CAM based on Res2Net101, as shown in Table 10 and Figure 7.

4.2. Performance Evaluation of WRDeepLabV3+ Semantic Segmentation Network

WRDeepLabV3+ uses the WRes2Net101 structure as its backbone, trained for 50 epochs with the backbone frozen followed by 50 epochs with it unfrozen. First, the effect of the optimizer on WRDeepLabV3+ (without the attention mechanism) is compared under different learning rate decay methods. As shown in Figure 8, with an initial learning rate of 0.0005, the green line is the MIoU of Adam under CosLR, the blue line is the MIoU of Adam under StepLR, the yellow line is the MIoU of SGD under CosLR, and the orange line is the MIoU of SGD under StepLR. For both optimizers, the MIoU under CosLR is higher than under StepLR. The experiments show that the proposed model converges faster and obtains the best segmentation results when using the Adam optimizer with CosLR decay. Figure 9 shows training with an initial learning rate of 0.0005 under the Adam optimizer and CosLR decay: compared with DeepLabV3+, WRDeepLabV3+ improves MIoU by 3.3%.
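The two decay schedules compared in Figure 8 can be sketched as follows. The initial learning rate and epoch count come from the paper; the step size and decay factor for StepLR are illustrative assumptions, since the paper only states the initial learning rate.

```python
import math

LR0, EPOCHS = 0.0005, 100  # initial learning rate and total epochs from the paper

def cos_lr(epoch, lr0=LR0, t_max=EPOCHS, eta_min=0.0):
    # Cosine annealing: smooth decay from lr0 to eta_min over t_max epochs.
    return eta_min + 0.5 * (lr0 - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

def step_lr(epoch, lr0=LR0, step_size=30, gamma=0.1):
    # Step decay: multiply by gamma every step_size epochs (assumed values).
    return lr0 * gamma ** (epoch // step_size)
```

In PyTorch these correspond to `torch.optim.lr_scheduler.CosineAnnealingLR` and `torch.optim.lr_scheduler.StepLR` attached to the Adam or SGD optimizer.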
Secondly, WRDeepLabV3+ is compared with and without the attention mechanism. As shown in Table 11, on the same dataset, adding the channel attention mechanism and spatial attention mechanism reduces the Loss by 7.5%, improves Recall by 2.4%, improves Precision by 3%, and improves IoU by 2.2%.
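The channel and spatial attention evaluated in Table 11 follow the common squeeze-and-gate pattern. The sketch below is a simplified NumPy illustration of that pattern (the learned MLP and convolution layers of a full CBAM-style module are omitted), not the paper's exact module.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Squeeze the spatial dims, then gate each channel.
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    weights = _sigmoid(avg + mx)          # learned shared MLP omitted
    return feat * weights[:, None, None]

def spatial_attention(feat):
    # Pool across channels, then gate each spatial location.
    avg = feat.mean(axis=0)
    mx = feat.max(axis=0)
    weights = _sigmoid(avg + mx)          # learned convolution omitted
    return feat * weights[None, :, :]

def cbam_like(feat):
    # Channel attention first, then spatial attention, as in CBAM.
    return spatial_attention(channel_attention(feat))
```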
Finally, the effect of transfer learning on the performance of WRDeepLabV3+ is discussed. Transfer learning of the backbone network uses the parameters of a trained (pre-trained) model to initialize the new model and assist its training. As shown in Figure 10, this allows the network to improve faster during training, and the converged model performs better.
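The weight transfer described above can be sketched as copying every pre-trained parameter whose name and shape match the new model, mirroring PyTorch's `load_state_dict(..., strict=False)` idiom; the parameter names used below are hypothetical.

```python
import numpy as np

def transfer_weights(pretrained, model):
    # Copy every pre-trained parameter whose name and shape match the
    # new model; parameters without a match keep their initialization.
    transferred = []
    for name, weight in pretrained.items():
        if name in model and model[name].shape == np.shape(weight):
            model[name] = weight
            transferred.append(name)
    return transferred
```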

5. Conclusions

In this paper, an HSL approach for the automatic segmentation of water leakage images is proposed. The water-leakage labels are automatically generated by WL-CAM based on the weakly supervised method. The water leakage is semantically segmented based on the fully supervised method WRDeepLabV3+.
In WL-CAM, an adaptive APS algorithm is proposed according to the characteristics of water leakage; it accurately and completely segments the water leakage area and generates image-level labels. A weakly supervised network, WRes2Net, is then proposed to generate CAM labels. The CAM labels are fused with the image-level labels by overlapping pixel values to generate pixel-level labels, which further improves label accuracy and saves manual labeling cost.
In WRDeepLabV3+, the WRes2Net developed for WL-CAM is reused as the core network. WRDeepLabV3+, a fully supervised network, is proposed, and its parameters and framework are adjusted to improve performance. The robustness of WRDeepLabV3+ is verified by semantic segmentation of water leakage images containing different kinds of complex noise.
WL-CAM and WRDeepLabV3+ were tested and compared with other state-of-the-art semantic segmentation methods on the proposed dataset. The results show that WL-CAM generates labels automatically with better performance than the other methods, outperforming them by up to 12.1% on the evaluation indicators. All performance indicators of WRDeepLabV3+ are likewise ahead of the other advanced fully supervised methods.
The HSL automatic segmentation approach performs on par with WRDeepLabV3+ trained on the manually labeled dataset, validating the feasibility and accuracy of WL-CAM. Owing to the performance of WL-CAM and WRDeepLabV3+, the hybrid-supervision approach achieves an MIoU of 82.8% on the dataset, 6.4% higher than the second-best end-to-end approach, while also being 25% more efficient. It can accurately segment the water leakage area in complex subway tunnel environments.
In the future, more images of water leakage in subway tunnels will be collected as the training dataset to improve the segmentation accuracy and performance of the HSL approach. At the same time, the weakly supervised learning method will be optimized to reduce the complexity of the method and save the training time of the dataset.

Author Contributions

D.Q.: the conception and design of the work, data analysis, problem modeling. H.L.: data acquisition, problem modeling. Z.W.: data analysis, problem modeling. Y.T.: data acquisition, problem modeling. S.W.: the conception and design of the work, methodology, problem modeling, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

The National Natural Science Foundation of China (No. 61902016), the postgraduate education and teaching quality improvement project of BUCEA, China (No. J2022005), the BUCEA Post Graduate Innovation Project (No. PG2022118).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fang, Q.; Wang, G.; Du, J.M.; Liu, Y.; Zhou, M.Z. Prediction of tunnelling induced ground movement in clay using principle of minimum total potential energy. Tunn. Undergr. Sp. Technol. 2023, 131, 104854. [Google Scholar] [CrossRef]
  2. Zheng, H.B.; Li, P.F.; Ma, G.W.; Zhang, Q.B. Experimental investigation of mechanical characteristics for linings of twins tunnels with asymmetric cross-section. Tunn. Undergr. Sp. Technol. 2022, 119, 104209. [Google Scholar] [CrossRef]
  3. Vermeij, D. Flood Risk Reduction Interventions for the New York City Subway System: A Research on the Impact of Storm Surge and Sea Level Rise on the Safety Against Flooding in Urban Delta’s. Master’s Thesis, TU Delft, Delft, The Netherlands, 2016. [Google Scholar]
  4. Liu, Y.J. Research on structural safety and driving dynamic characteristics of Beijing subway shield tunnel under disease. J. Beijing Jiaotong Univ. 2019, 1–122. [Google Scholar]
  5. Yao, Y.; Tung, E.; Glisic, B. Crack detection and characterization techniques—An overview. Struct. Control Health. Monit. 2014, 21, 1387–1413. [Google Scholar] [CrossRef]
  6. Huang, H.; Sun, Y.; Xue, Y. Research progress of machine vision-based disease detecting techniques for the tunnel lining surface. Mod. Tunn. Technol. 2014, 51, 19–31. [Google Scholar]
  7. Xue, Y.; Li, Y. A method of disease recognition for shield tunnel lining based on deep learning. J. Hunan Univ. 2018, 45, 100–109. [Google Scholar]
  8. Qiu, D.W.; Li, S.F.; Wang, T.; Ye, Q.; Li, R.J.; Ding, K.L.; Xu, H. A high-precision calibration approach for Camera-IMU pose parameters with adaptive constraints of multiple error equations. Measurement 2020, 153, 107402. [Google Scholar] [CrossRef]
  9. Hayward, S.J.; Lopik, K.; Hinde, C.; West, A.A. A Survey of Indoor Location Technologies, Techniques and Applications in Industry. Internet Things 2022, 20, 100608. [Google Scholar] [CrossRef]
  10. Wang, X.; Wu, Y.; Cui, J.; Zhu, C.Q.; Wang, X.Z. Shape characteristics of coral sand from South China Sea. J. Mar. Sci. Eng. 2020, 8, 803. [Google Scholar] [CrossRef]
  11. Shen, J.H.; Wang, X.; Liu, W.B.; Zhang, P.Y.; Zhu, C.Q.; Wang, X.Z. Experimental study on mesoscopic shear behavior of calcareous sand material with digital imaging approach. Adv. Civ. Eng. 2020, 2020, 8881264. [Google Scholar] [CrossRef]
  12. Ren, Y.P.; Huang, J.S.; Hong, Z.Y.; Lu, W.; Yin, J.; Zou, L.J.; Shen, X.H. Image-based concrete crack detection in tunnels using deep fully convolutional networks. Constr. Build. Mater. 2020, 234, 117367–117379. [Google Scholar] [CrossRef]
  13. Song, Q.; Wu, Y.Q.; Xin, X.S.; Yang, L.; Yang, M.; Chen, H.M.; Liu, C.; Hu, M.J.; Chai, X.S.; Li, J.C. Real-time tunnel crack analysis system via deep learning. IEEE Access 2019, 7, 64186–64197. [Google Scholar] [CrossRef]
  14. Xue, Y.D.; Li, Y.C. A fast detection method via region-based fully convolutional neural networks for shield tunnel lining defects. Comput-Aided. Civ. Inf. 2018, 33, 638–654. [Google Scholar] [CrossRef]
  15. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2–10 December 2016; pp. 379–387. [Google Scholar]
  16. Huang, H.W.; Li, Q.T.; Zhang, D.M. Deep learning-based image recognition for crack and leakage defects of metro shield tunnel. Tunn. Undergr. Sp. Technol. 2018, 77, 166–176. [Google Scholar] [CrossRef]
  17. Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Garcia-Rodriguez, J. A review on deep learning techniques applied to semantic segmentation. arXiv 2017, arXiv:1704.06857. [Google Scholar]
  18. Han, L.; Chen, J.F.; Li, H.B.; Liu, G.S.; Leng, B.; Ahmed, A. Multispectral water leakage detection based on a one-stage anchor-free modality fusion network for metro tunnels. Automat. Constr. 2022, 140, 104345. [Google Scholar] [CrossRef]
  19. Xiong, L.; Zhang, D.; Zhang, Y. Water leakage image recognition of shield tunnel via learning deep feature representation. J. Vis. Commun. Image Represent. 2020, 71, 102708. [Google Scholar] [CrossRef]
  20. Dong, Z.M.; Wang, J.J.; Cui, B.; Wang, D.; Wang, X.L. Patch-based weakly supervised semantic segmentation network for crack detection. Constr. Build. Mater. 2022, 258, 120291–120305. [Google Scholar] [CrossRef]
  21. Zhu, J.S.; Song, J.B. Weakly supervised network based intelligent identification of cracks in asphalt concrete bridge deck. Alex. Eng. J. 2020, 59, 1307–1317. [Google Scholar] [CrossRef]
  22. Chang, Y.T.; Wang, Q.; Hung, W.C.; Piramuthu, R.; Tsai, Y.H.; Yang, M.H. Weakly-supervised semantic segmentation via sub-category exploration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8991–9000. [Google Scholar]
  23. Zhao, S.; Zhang, D.M.; Huang, H.W. Deep learning–based image instance segmentation for moisture marks of shield tunnel lining. Tunn. Undergr. Sp. Technol. 2020, 95, 103156. [Google Scholar] [CrossRef]
  24. Wang, H.; Li, Y.; Dang, L.M.; Lee, S.; Moon, H. Pixel-level tunnel crack segmentation using a weakly supervised annotation approach. Comput. Ind. 2021, 133, 103545. [Google Scholar] [CrossRef]
  25. Zhang, A.; Wang, K.C.P.; Fei, Y.; Liu, Y.; Chen, C.; Yang, G.; Li, J.Q.; Yang, E.; Qiu, S. Automated pixel-level pavement crack detection on 3D asphalt surfaces with a recurrent neural network. Comput.-Aided. Civ. Inf. 2019, 34, 213–229. [Google Scholar] [CrossRef]
  26. Li, S.; Zhao, X.; Zhou, G. Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput.-Aided. Civ. Inf. 2019, 34, 616–634. [Google Scholar] [CrossRef]
  27. Dung, C.V.; Anh, L.D. Autonomous concrete crack detection using deep fully convolutional neural network. Automat. Constr. 2019, 99, 52–58. [Google Scholar] [CrossRef]
  28. Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic pixel-level crack detection and measurement using fully convolutional network. Comput-Aided. Civ. Inf. 2018, 33, 1090–1109. [Google Scholar] [CrossRef]
  29. Bang, S.; Park, S.; Kim, H. Encoder-decoder network for pixel-level road crack detection in black-box images. Comput.-Aided. Civ. Inf. 2019, 34, 713–727. [Google Scholar] [CrossRef]
  30. Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. Automat. Constr. 2019, 104, 129–139. [Google Scholar] [CrossRef]
  31. Božič, J.; Tabernik, D.; Skočaj, D. Mixed supervision for surface- defect detection: From weakly to fully supervised learning. Comput. Ind. 2021, 129, 103459–103470. [Google Scholar] [CrossRef]
  32. Chen, G.Y.; Chen, P.F.; Shi, Y.J.; Hsieh, C.Y.; Liao, B.B.; Zhang, S.Y. Rethinking the usage of batch normalization and dropout in the training of deep neural networks. arXiv 2019, arXiv:1905.05928v1. [Google Scholar]
  33. Dang, L.M.; Kyeong, S.; Li, Y.F.; Wang, H.X.; Nguyen, N.T.; Moon, H. Deep learning-based sewer defect classification for highly imbalanced dataset. Comput. Ind. Eng. 2021, 161, 107630–107646. [Google Scholar] [CrossRef]
  34. Misra, D. Mish: A Self Regularized Non-Monotonic Activation Function. arXiv 2019, arXiv:1908.08681. [Google Scholar]
  35. Zhao, T.; Wu, X.Q. Pyramid feature attention network for saliency detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3080–3089. [Google Scholar]
  36. Peng, C.; Zhang, X.Y.; Yu, G.; Luo, G.M.; Sun, J. Large kernel matters: Improve semantic segmentation by global convolutional network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1743–1751. [Google Scholar]
  37. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 42, 2999–3007. [Google Scholar]
  38. Xue, Y.D.; Cai, X.Y.; Shadabfar, M.; Shao, H.; Zhang, S. Deep learning-based automatic recognition of water leakage area in shield tunnel lining. Tunn. Undergr. Sp. Technol. 2020, 104, 103524. [Google Scholar] [CrossRef]
  39. Zheng, J.F.; Gao, Y.C.; Zhang, H.; Lei, Y.; Zhang, J. OTSU Multi-Threshold Image Segmentation Based on Improved Particle Swarm Algorithm. Appl. Sci. 2022, 12, 11514. [Google Scholar] [CrossRef]
  40. Wu, Y.Y.; Li, Q. The Algorithm of Watershed Color Image Segmentation Based on Morphological Gradient. Sensors 2022, 22, 8202. [Google Scholar] [CrossRef] [PubMed]
  41. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  42. Zhao, H.S.; Shi, J.P.; Qi, X.J.; Wang, X.G.; Jia, J.Y. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
  43. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. arXiv 2019, arXiv:1902.09212v1. [Google Scholar]
  44. Chen, L.C.; Zhu, Y.K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 833–851. [Google Scholar]
  45. Dawood, T.; Zhu, Z.H.; Zayed, T. Computer vision-based model for moisture marks detection and recognition in subway networks. J. Comput. Civ. Eng. 2018, 32, 04017079. [Google Scholar] [CrossRef]
  46. Papandreou, G.; Chen, L.C.; Murphy, K.P.; Yuille, A.L. Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1742–1750. [Google Scholar]
  47. Roy, A.; Todorovic, S. Combining bottom-up, top-down, and smoothness cues for weakly supervised image segmentation. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3529–3538. [Google Scholar]
  48. Araslanov, N.; Roth, S. Single-stage semantic segmentation from image labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4253–4262. [Google Scholar]
  49. Zhang, X.R.; Peng, Z.L.; Zhu, P.; Zhang, T.Y.; Li, C.; Zhou, H.Y.; Jiao, L.C. Adaptive affinity loss and erroneous pseudo-label refinement for weakly supervised semantic segmentation. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021; pp. 5463–5472. [Google Scholar]
  50. Menon, R.R.; Luo, J.; Chen, X.; Zhou, H.; Liu, Z.; Zhou, G.; Zhang, N.; Jin, C. Screening of Fungi for Potential Application of Self-Healing Concrete. Sci. Rep. 2019, 9, 2075. [Google Scholar] [CrossRef]
Figure 1. The mobile vision measurement system. (a) The real-time detection; (b) the schematic diagram of the system.
Figure 2. Basic idea of HSL.
Figure 3. The overall framework of the HSL.
Figure 4. Classification of independent pixels within the cluster. (a) Obtaining the color distance for independent pixels. (b) The classification of independent pixels.
Figure 5. (a) The Res2Net module; (b) the WRes2Net module; (c) the WRes2Net shortcut. Rate is the dilation rate, and Avgpool is the average pooling layer.
Figure 6. WRDeepLabV3+ overall model.
Figure 7. Loss of WL-CAM with different backbone networks.
Figure 8. Effects of different learning rate decay methods on the optimizer.
Figure 9. Comparison of DeepLabV3+ and WRDeepLabV3+ under CosLR and Adam.
Figure 10. The performance effect of transfer learning on WRDeepLabV3+.
Table 1. Main diseases of subway tunnels.
[Image examples of four disease types: Damaged, Missing Corner, Crack, Water Leakage.]
Table 2. The details of the experimental dataset.

| Datasets | Collection Equipment | Number of Images | Resolutions |
|---|---|---|---|
| a | MTI-200a | 3555 | 1600 × 1200 and 1944 × 2592 |
| b | Mobile vision measurement system | 6739 | 2560 × 1440 |
Table 3. Classification of the experimental dataset.

| Groups | Train | Val | Test | Sum |
|---|---|---|---|---|
| A | 4000 | 1000 | 1000 | 6000 |
| B | 4000 | 1000 | 1000 | 6000 |
Table 4. Comparison of image segmentation algorithms.
[Image comparison of K-Means, Otsu, threshold segmentation, and APS: overall effects, water leakage details, and background information details.]
Table 5. Comparison of the manually labeled dataset and the automatically labeled dataset.
[Image columns: original images, ground truths, APS images, mask data, AL-Mask images.]
Table 6. Water leakage segmentation effects of WL-CAM and WRDeepLabV3+ on different noises.
[Image rows (original images, ground truths, WL-CAM, WRDeepLabV3+) for five noise types: iron tubes, cables, iron wires, electricity meters, stains.]
Table 7. Comparison of evaluation results of different semantic segmentation under the proposed dataset.

| Method Categories | Methods | Loss | MIoU (%) | Recall (%) | F1 (%) | Precision (%) | Epoch | Training Time | Testing Time |
|---|---|---|---|---|---|---|---|---|---|
| Weakly supervised learning | SC-CAM | 0.154 | 69.1 | 63.5 | 70.2 | 78.7 | 50 | 18 h 58 m 07 s | 0.087 s/image |
| | WL-CAM | 0.132 | 73.6 | 75.6 | 79.5 | 83.9 | 50 | 15 h 21 m 16 s | 0.061 s/image |
| Fully supervised learning | PSPNet | 0.096 | 80.4 | 82.8 | 80.1 | 77.6 | 100 | 11 h 04 m 46 s | 0.028 s/image |
| | U-Net | 0.077 | 82.0 | 83.8 | 81.9 | 80.1 | 100 | 17 h 55 m 59 s | 0.074 s/image |
| | HRNet | 0.089 | 81.0 | 83.3 | 80.8 | 78.6 | 100 | 19 h 57 m 58 s | 0.095 s/image |
| | DeepLabV3+ | 0.079 | 81.8 | 85.2 | 81.8 | 78.7 | 100 | 11 h 53 m 34 s | 0.033 s/image |
| | WRDeepLabV3+ | 0.053 | 86.2 | 88.6 | 86.7 | 84.9 | 100 | 13 h 22 m 33 s | 0.035 s/image |
| | HSL (proposed) | 0.084 | 82.8 | 85.2 | 82.9 | 80.6 | 100 | | |
Table 8. Comparison of segmentation effects of HSL and the end-to-end semantic segmentation network under different water leakage types.
[Image rows (original images, EM, CRF-RNN, 1Stage, AA&LR, HSL) for five leakage types: blocks, vertical strips, horizontal strips, stains, occlusions.]
Table 9. MIoU and the efficiency comparison between the HSL and the end-to-end semantic segmentation network on the proposed dataset.

| Approaches | Backbone | Val (%) | Test (%) | Testing Time |
|---|---|---|---|---|
| EM | VGG16 | 58.4 | 59.8 | 0.092 s/image |
| CRF-RNN | VGG16 | 61.6 | 62.9 | 0.068 s/image |
| 1Stage | WideResNet38 | 73.4 | 74.1 | 0.044 s/image |
| AA&LR | WideResNet38 | 75.6 | 76.4 | 0.052 s/image |
| HSL | WRes2Net101 | 81.7 | 82.8 | 0.035 s/image |
Table 10. Performance evaluation of WL-CAM with different backbone networks based on the proposed dataset.

| Methods | Precision (%) | Recall (%) | MIoU (%) | F1 (%) |
|---|---|---|---|---|
| WL-CAM (Res2Net101) | 71.4 | 74.3 | 67.3 | 72.8 |
| WL-CAM (WRes2Net101) | 83.9 | 75.6 | 73.6 | 79.5 |
Table 11. Comparison of different WRDeepLabV3+ under the proposed dataset.

| WRDeepLabV3+ | Loss | Recall (%) | Precision (%) | IoU (%) |
|---|---|---|---|---|
| No attention mechanism | 0.057 | 86.2 | 81.9 | 75.4 |
| Add attention mechanism | 0.053 | 88.6 | 84.9 | 77.6 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
