Article

A Lightweight Traffic Lights Detection and Recognition Method for Mobile Platform

1 College of Electromechanical Engineering, Qingdao University of Science & Technology, Qingdao 266100, China
2 Collaborative Innovation Center for Intelligent Green Manufacturing Technology and Equipment of Shandong Province, Qingdao 266100, China
* Author to whom correspondence should be addressed.
Drones 2023, 7(5), 293; https://doi.org/10.3390/drones7050293
Submission received: 16 March 2023 / Revised: 23 April 2023 / Accepted: 25 April 2023 / Published: 27 April 2023
(This article belongs to the Section Innovative Urban Mobility)

Abstract

Traffic lights detection and recognition (TLDR) is a necessary capability of many types of intelligent mobile platforms, such as drones. Although previous TLDR methods achieve robust recognition results, their deployment is limited by large model sizes and high computing power requirements. In this paper, a novel lightweight TLDR method is proposed to improve its feasibility of deployment on mobile platforms. The proposed method is a two-stage approach. In the detection stage, a novel lightweight YOLOv5s model is constructed to locate and extract the region of interest (ROI). In the recognition stage, the HSV color space is employed along with an extended twin support vector machine (TWSVM) model to recognize multiple types of traffic lights, including arrow-shaped ones. A dataset collected in naturalistic driving experiments with an instrumented vehicle is used to train, verify, and evaluate the proposed method. The results suggest that, compared with previous YOLOv5s-based TLDR methods, the model size of the proposed lightweight TLDR method is reduced by 73.3% and its computing power consumption is reduced by 79.21%, while satisfactory inference speed and recognition robustness are maintained. The feasibility of deploying the proposed method on mobile platforms is verified with the Nvidia Jetson Nano platform.

1. Introduction

A traffic light is a facility that transmits right-of-access authorization instructions to traffic participants. Traffic lights are ubiquitous in cities, and the safe and efficient travel of any mobile platform in a city depends on them. Therefore, TLDR is not only a core function of intelligent vehicles and advanced driver assistance systems (ADASs) but also a necessary capability of visually impaired assistance systems and unmanned service platforms (such as low-altitude quadrotor drones) that may be widely used in cities in the future. For these mobile applications, the limitations of cost and power supply capacity cannot be ignored, and under these restrictions the deployment of high-computing-power chips is usually not feasible. Therefore, it is necessary to construct a lightweight TLDR method.
Previous TLDR methods can be divided into two kinds: methods based on the physical features of traffic lights [1,2,3,4,5,6] and methods based on machine learning approaches [7,8,9,10,11,12,13,14,15,16]. The former design detectors around the relatively fixed shapes and distinctive colors of traffic lights to realize detection and recognition. Jeong et al. proposed a color-based TLDR method [1] containing four modules: a detecting module, a boundary candidate determination module, a boundary detection module, and a recognition module; in this method, TLDR was realized through color thresholds and saturation intensity in the HSI color space. de Charette and Nashashibi also proposed a color-based TLDR method [2]; different from the method of Jeong et al., it is based on grayscale images with spotlight detection. Omachi et al. constructed a method that transforms the image from the RGB color space to a normalized RGB color space and detects traffic lights by further combining it with the Hough transform [3]. Ying et al. designed a TLDR method in the HSI and RGB color spaces by exploiting the geometric shapes of rectangles and circles and the color difference between traffic lights and the background [4]. Later, Ying et al. further introduced an intensity analysis and used the HSI color space alone to establish an improved TLDR method [5]. Based on the differences between the backplane and light-emitting area of traffic lights and the background, Chen et al. proposed a TLDR method based on the nearest-neighbor interpolation algorithm within the HSV color space [6]. In recent years, owing to the excellent performance of machine learning, especially deep learning, in target detection, such methods have been increasingly used in TLDR. John et al. constructed a convolutional neural network (CNN)-based TLDR method [7]; it is worth mentioning that GPS data were introduced to help locate the traffic lights, and a saliency map was utilized to detect them. Behrendt et al. proposed a TLDR method based on a multi-artificial neural network (ANN) algorithm and achieved recognition of traffic lights at 10 frames per second [8]. In addition to the methodological contributions, a public dataset containing 5000 pictures and 8334 frames of video sequence that can be used for TLDR research was provided by Behrendt et al. A deep neural network (DNN) algorithm along with a color-space-based method was utilized by Lee et al. to construct a TLDR method [9]. Further, Bach et al. proposed a TLDR method based on Faster R-CNN to achieve a higher recognition rate [10]. Based on the Bosch Small Traffic Lights Dataset (BSTLD) provided by [8], Kim et al. compared the performance of different combinations of six color spaces and three types of networks for TLDR [11]; the results show that the combination of the RGB space and Faster R-CNN performs best. Müller and Dietmayer employed the Single-Shot MultiBox Detector (SSD) algorithm to propose a TLDR method [12]; the results suggest that it can achieve a recognition rate of 10 frames per second when deployed on a platform with an Nvidia Titan XP Graphics Processing Unit (GPU). Gupta and Choudhary introduced graph-embedding Grassmann discriminant analysis into Faster R-CNN to detect traffic lights faster [13].
The results based on 4 datasets show that the improved method can achieve a recognition rate of 32 frames per second when deployed on a platform with an Nvidia GTX 1060 GPU. Vitas et al. introduced the method in [17] to locate the ROI in a CNN-based TLDR method to improve its performance [14]. Different from other researchers, Yeh et al. utilized a binocular camera to obtain the distance between the traffic lights and the vehicle to help locate the traffic lights and proposed an extended CNN-based TLDR method [15]. Kilic and Aydin deployed a Faster R-CNN Inception v2 model on a platform with an Nvidia GTX 1080 Ti GPU and used a dataset obtained in Turkey to improve the performance of the model under cloudy or nighttime conditions [16]. The information in [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] is summarized in Table 1.
TLDR is a typical small-target detection task. YOLO-series algorithms have been verified to achieve outstanding performance in multiple target detection tasks [18,19,20,21,22] and may therefore also achieve satisfactory performance in TLDR. In the last few years, TLDR methods based on YOLO-series algorithms have begun to receive more attention. In 2022, Chen utilized resizing and normalization approaches to propose an improved YOLOv5s-based traffic light detection method [23]. The verification results suggest that, in the task of traffic light detection, the method of Chen achieves better performance than the basic YOLOv5s model. Marques et al. constructed two TLDR methods based on the YOLOv3 and YOLOv3-tiny algorithms and compared their performance on the dataset of the RoboCup Portuguese Open Autonomous Driving Competition [24]; specifically, the two algorithms achieve 2 and 17 frames per second, respectively, on a platform with an Nvidia GTX 1050 GPU. Wang et al. utilized a shallow feature enhancement mechanism and a bounding box uncertainty prediction mechanism to propose an improved YOLOv4-based TLDR method with higher precision [25]. The results show that the mean average precision of this method is improved by 2.86% compared with the basic YOLOv4, and the method achieves a recognition rate of 29 frames per second when deployed on a platform with an Nvidia RTX 3090 GPU. Zhao et al. noticed the large model size of the YOLOv4 algorithm and employed ShuffleNetv2 to replace CSPDarkNet53 to propose a lightweight YOLOv4-based TLDR method [26].
Considering that traffic lights often occupy a very small proportion of the background image, researchers increasingly use a two-stage approach to improve the efficiency of TLDR. The two-stage approach refers to the use of two different methods for detection and recognition: in the first stage, one method locates and extracts the traffic light from the image; in the second stage, another method identifies the right-of-access information transmitted by the traffic light based on the extracted small area. The two-stage approach offers a good balance between efficiency and accuracy and has received more and more attention. The methods used in the two stages of previous two-stage TLDR methods are shown in Table 2.
With the application of intelligent transportation systems (ITSs) and intelligent and connected technologies, mobile platforms such as low-altitude quadrotor drones are expected to play an important role in the future urban transportation system. Drone-based systems, such as air-vehicle-road collaboration systems, visually impaired assistance systems, and air surveillance systems, will greatly benefit our community. For these systems, the incorporated drones should be able to detect and recognize traffic lights, which are ubiquitous and vital in cities. However, drones face strict restrictions on volume, weight, and power supply capacity, which make it almost impossible to place a high-computing-power chip on them. Therefore, it is vital to enhance the feasibility of deploying TLDR methods on these platforms, which mainly depends on the model size and required computation quantity of the method. Aiming at this, a novel lightweight TLDR method that can be deployed on drones and other mobile platforms is proposed in this paper, and the contributions of our method are as follows:
  • A lightweight YOLOv5s is constructed to greatly reduce the model size and the requirements for computational power; meanwhile, several improvements are made to the Neck network to ensure the robustness of the method;
  • An extended TWSVM model is used in the recognition stage to achieve fast and accurate recognition of multiple types of traffic lights, including arrow-shaped ones;
  • The feasibility of the proposed TLDR method to be deployed in the mobile platform with low computing power and low cost is verified.
The remainder of this paper is organized as follows: In Section 2, the methods and materials used in the modeling process are described, and the TLDR method is proposed. In Section 3, the results of the verification and evaluation of the proposed TLDR method are given, and the results are analyzed and discussed in Section 4. In Section 5, this paper is concluded.

2. Methods and Materials

To ensure the feasibility of deploying the proposed method on mobile platforms with low computing power, the size of the model should be reduced to achieve a lightweight model. Moreover, the TLDR method is closely related to safety, so the robustness of the model must be guaranteed while its size is reduced. To achieve this, a two-stage approach, which makes the proposed method insensitive to the position of traffic lights in the images, is utilized in this paper. In the first stage, YOLOv5s is employed as the basic model, and its Backbone and Neck are improved to build a lightweight YOLOv5s designed to locate and extract the ROI. In the second stage, based on the extracted ROI, the HSV color space and an extended TWSVM algorithm are utilized to recognize multiple types of traffic lights, including arrow-shaped ones.
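As a minimal illustration of this two-stage flow, the sketch below chains a detector and a recognizer on one video frame; the `detector` and `recognizer` objects and their `detect`, `classify_color`, and `classify_shape` methods are hypothetical placeholders for the components constructed in Section 2.1 and Section 2.2, not part of a released implementation.

```python
def tldr_two_stage(frame, detector, recognizer):
    """Run the two-stage TLDR pipeline on one video frame (a NumPy image array)."""
    results = []
    # Stage 1: the lightweight YOLOv5s detector locates traffic lights in the full frame.
    for (x1, y1, x2, y2), score in detector.detect(frame):
        roi = frame[y1:y2, x1:x2]                 # extract the small ROI
        # Stage 2: recognition works only on the extracted ROI, so the position of
        # the light within the original frame (upper or lower part) is irrelevant.
        color = recognizer.classify_color(roi)    # HSV thresholding (Section 2.2.1)
        shape = recognizer.classify_shape(roi)    # HOG + cascaded TWSVM (Section 2.2.2)
        results.append({"box": (x1, y1, x2, y2), "score": score,
                        "color": color, "shape": shape})
    return results
```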

2.1. Detection Method Based on a Lightweight YOLOv5s Model

The YOLO-series algorithms play an important role in visual sensing, especially in target detection tasks. The original intention of the YOLO-series algorithms was to construct a network with higher speed than traditional CNN-based methods. The first version of the YOLO-series algorithms achieved high speed [18], but there were still areas for improvement. Therefore, Joseph Redmon, the creator of the YOLO algorithm, successively proposed two updated versions, namely, the YOLOv2 [37] and YOLOv3 [38] algorithms. After that, Bochkovskiy et al. proposed the YOLOv4 algorithm with multiple improvements, such as an enhanced Neck network to expand the receptive field and the introduction of the Mish activation function [39]. Later, based on further optimization of YOLOv4, the YOLOv5 algorithm was proposed, which achieves a higher speed on the typical COCO dataset than YOLOv4 [40].
In previous TLDR methods, the YOLOv4 algorithm has been used and analyzed. The results suggest that YOLOv4-based TLDR methods still require high computing power. Compared with the YOLOv4 algorithm, the YOLOv5 algorithms, especially YOLOv5s, have smaller sizes and fewer parameters, which gives them great potential for building a lightweight TLDR method. The basic structure of the YOLOv5 algorithms includes four parts: the input, the Backbone network, the Neck network, and the prediction end (for detailed information, refer to [40]). Although the parameter quantity of the YOLOv5s algorithm is relatively low among the YOLO-series algorithms, it has been proven that it is not low enough to make the original YOLOv5s algorithm deployable on platforms with low computing power. Nevertheless, the improvement approaches employed by Bochkovskiy et al. reveal that, for a specific task, it is possible to achieve a significant reduction in the parameter quantity of a YOLO-based method while ensuring its robustness. On the one hand, there is little research discussing the performance characteristics of the basic YOLOv5s model when used for a TLDR task. On the other hand, there is no study on targeted improvements of YOLOv5s for a TLDR task that enable the resulting method to be deployed on mobile platforms, which are limited in power supply, volume, heat dissipation, cost, and other aspects.
Motivated by this, YOLOv5s, which has a smaller model size, is employed as the basic model in this paper to propose a lightweight TLDR method suitable for mobile platforms. To achieve this goal, the Backbone and Neck networks of YOLOv5s are improved in the following sections, and the objectives of the improvements are as follows:
  • Further reduce the model size to improve the feasibility of deployment;
  • Maintain or improve the robustness of results at the same time.
It is well known that a large reduction in the number of network parameters usually leads to a deterioration in network performance. Considering this, we first introduce the Shuffle Attention (SA) mechanism into MobileNetV3-Large to construct an improved MobileNetV3-Large network, marked as MobileNetV3-Large-SA, and replace the original Backbone of YOLOv5s with it to realize a significant reduction in network parameters. Secondly, we improve the Neck of YOLOv5s to improve the robustness of the results. Specifically, a new 160 × 160 feature branch is added to the Neck, and the original FPN + PAN structure is updated to a new Mul-FPN + PAN structure; several improved down-sampling modules are added to the new feature branch and the Mul-FPN + PAN structure. With these changes, an extra feature fusion can be achieved at the 80 × 80 node on the Mul-FPN side, and thus the feature extraction ability of the proposed model for traffic lights is improved. Finally, we also replace the conventional CONV module with an improved one, marked as the CBH module, to improve the inference speed of the network.
In the following Section 2.1.1 and Section 2.1.2, each improvement is introduced separately and in detail.

2.1.1. Improvements in the Backbone

It is very hard to deploy the original YOLOv5s model on a mobile platform with low computing power, and reducing the computing power demand depends on making the model lightweight. Therefore, the Backbone of the original YOLOv5s model is replaced with an improved MobileNetV3-Large. The MobileNetV3-series models are novel and effective lightweight neural networks. In MobileNetV3-Large, the Squeeze-and-Excitation Network (SENet) is employed; as a result, the original MobileNetV3-Large pays more attention to the feature dependency between different channels of the feature layer than to the feature dependency in space, which is not helpful for improving the performance of the TLDR method. To solve this, we introduce the SA mechanism, whose structure is shown in Figure 1, to replace the SENet. With this improvement, the amount of computation is further reduced by grouping channels in the feature layer. The network structure of the improved MobileNetV3-Large, the MobileNetV3-Large-SA, is shown in Table 3.
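To make the role of the SA block concrete, the following PyTorch sketch shows a simplified Shuffle Attention module, assuming the channel count is divisible by twice the group number; it follows the general SA design (grouped channel and spatial attention followed by a channel shuffle) rather than reproducing the exact implementation used in MobileNetV3-Large-SA.

```python
import torch
import torch.nn as nn

class ShuffleAttention(nn.Module):
    """Simplified Shuffle Attention: channels are split into groups, each group is
    divided into a channel-attention half and a spatial-attention half, and the
    result is channel-shuffled so information flows across groups."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.groups = groups
        c = channels // (2 * groups)          # channels per branch within one group
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.gn = nn.GroupNorm(c, c)
        self.cw = nn.Parameter(torch.zeros(1, c, 1, 1))   # channel-branch scale
        self.cb = nn.Parameter(torch.ones(1, c, 1, 1))    # channel-branch shift
        self.sw = nn.Parameter(torch.zeros(1, c, 1, 1))   # spatial-branch scale
        self.sb = nn.Parameter(torch.ones(1, c, 1, 1))    # spatial-branch shift
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.view(b * self.groups, -1, h, w)
        xc, xs = x.chunk(2, dim=1)            # channel / spatial attention branches
        xc = xc * self.sigmoid(self.cw * self.avg_pool(xc) + self.cb)
        xs = xs * self.sigmoid(self.sw * self.gn(xs) + self.sb)
        out = torch.cat([xc, xs], dim=1).view(b, c, h, w)
        # channel shuffle across groups
        out = out.view(b, self.groups, c // self.groups, h, w)
        return out.transpose(1, 2).reshape(b, c, h, w)

# usage: sa = ShuffleAttention(96); y = sa(torch.randn(1, 96, 40, 40))
```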

2.1.2. Improvements in the Neck

(1)
Feature fusion network
The structure of the original FPN + PAN is shown in Figure 2b. In this structure, the feature layer is obtained by fusing deep features with strong semantic information and shallow features with high resolution; at this point, small targets such as traffic lights are difficult to capture. To improve the performance of the proposed method in capturing traffic lights, we constructed a new Mul-FPN network, whose structure is shown in Figure 2a.
From Figure 2, one can see that, compared with the original FPN + PAN network, the new Mul-FPN network has two advantages. On the one hand, the deep 20 × 20 feature and the shallow 160 × 160 feature are introduced on the FPN side. With two subpixel convolutions applied to the deep feature and two-fold down-sampling of the shallow feature, the channels are combined in the 80 × 80 feature layer. Based on this, the deep 20 × 20 feature is cascaded to the 80 × 80 shallow feature layer with an Add module. The shallow texture details and deep semantic content are thereby enriched, and thus the learning ability of large-scale feature layers for small targets is improved. On the other hand, a cross-stage connection module (i.e., the Add module) is introduced in the feature fusion network, whose structure is shown in Figure 3. In the Add module, subpixel convolution is conducted to achieve 4× up-sampling, which realizes the cascade. With this module, the semantic information of shallow features, and thus the network's ability to capture small targets, is improved.
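The following PyTorch sketch illustrates the cascade idea of the Add module: a deep, low-resolution feature map is up-sampled by subpixel convolution (PixelShuffle) and added element-wise to a shallow, high-resolution feature map. The channel counts and the 4× factor are illustrative assumptions, not the exact configuration of the proposed Neck.

```python
import torch
import torch.nn as nn

class SubpixelAdd(nn.Module):
    """Cross-stage cascade: subpixel-convolution up-sampling followed by element-wise addition."""
    def __init__(self, deep_channels, shallow_channels, scale=4):
        super().__init__()
        # expand channels so that PixelShuffle(scale) yields `shallow_channels`
        self.expand = nn.Conv2d(deep_channels, shallow_channels * scale * scale, 1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, deep_feat, shallow_feat):
        up = self.shuffle(self.expand(deep_feat))   # e.g. 20x20 -> 80x80
        return up + shallow_feat                    # cascade by element-wise addition

# e.g. fuse a 20x20x512 deep feature into an 80x80x256 shallow feature
fuse = SubpixelAdd(deep_channels=512, shallow_channels=256, scale=4)
out = fuse(torch.randn(1, 512, 20, 20), torch.randn(1, 256, 80, 80))
```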
(2)
Mul-BottleneckCSP module
For the TLDR method, context information can obviously be used to improve the ability of the model to detect traffic lights and locate the ROI. Considering this, we add a new feature extraction branch to the original BottleneckCSP module of YOLOv5s. In this branch, two 3 × 3 convolutions are utilized, so the receptive field of the branch reaches 5 × 5. To fuse features of diverse scales, the SA mechanism is also employed. With these efforts, an improved BottleneckCSP module, marked as Mul-BottleneckCSP, is proposed, whose structure is shown in Figure 4.
Based on the Mul-BottleneckCSP module, context information at diverse scales and the differences between salient and non-salient areas at diverse scales can be utilized to improve the performance of the network in locating and extracting the ROI.
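A minimal sketch of the extra context branch is given below; the two stacked 3 × 3 convolutions provide the 5 × 5 effective receptive field described above, while the simple additive fusion is a stand-in for the SA-based fusion used in the actual Mul-BottleneckCSP module.

```python
import torch
import torch.nn as nn

class MulBottleneckBranch(nn.Module):
    """Extra context branch added alongside the original BottleneckCSP path."""
    def __init__(self, channels):
        super().__init__()
        self.context = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.Hardswish(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),   # 2nd 3x3 -> 5x5 field
            nn.BatchNorm2d(channels),
            nn.Hardswish(),
        )

    def forward(self, csp_out):
        # csp_out: output of the original BottleneckCSP path; fuse with the context branch
        return csp_out + self.context(csp_out)
```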
(3)
Improved down-sampling module
In the original down-sampling module, a single 3 × 3 convolution is used. This approach makes features of small targets, such as traffic lights, prone to being lost, which deteriorates the model's performance. Therefore, a combination of maximum-pooling and convolution features is utilized to enhance the feature expression ability of the down-sampling module, and the SA mechanism is employed to balance the contribution rates of the two types of features. The structure of the improved down-sampling module is shown in Figure 5. With these efforts, the feature expression ability of the proposed model is improved.
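The following sketch illustrates the idea of the improved down-sampling module: a stride-2 convolution branch and a max-pooling branch run in parallel, and a lightweight channel attention (standing in for the SA mechanism) balances their contributions before fusion. Channel sizes and the fusion layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImprovedDownsample(nn.Module):
    """Parallel conv/max-pool down-sampling with attention-weighted fusion."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.Hardswish())
        self.pool_branch = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.Hardswish())
        self.attn = nn.Sequential(                      # balances the two branches
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * out_ch, 2 * out_ch, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * out_ch, out_ch, 1)

    def forward(self, x):
        y = torch.cat([self.conv_branch(x), self.pool_branch(x)], dim=1)
        return self.fuse(y * self.attn(y))
```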
(4)
Improved CBH block
The original convolution block in the feature fusion network consists of a CONV layer, Batch Normalization, and the Leaky ReLU activation function. ReLU is one of the most widely used activation functions, but there is room for improvement. By means of experimental evaluation, Ramachandran et al. analyzed the performance of multiple activation functions on various typical datasets compared with ReLU [41]. The results reveal that the Swish activation function can achieve 0.9% higher classification accuracy than ReLU. Nevertheless, the Swish activation function requires a relatively large amount of computation. To solve this, Howard et al. further proposed the Hardswish activation function [42]. This activation function is composed of common operators and achieves a significant reduction in computational complexity with little negative impact on performance, which can increase the inference speed (more information on these activation functions can be found in [42,43]).
To speed up the inference of the proposed method, the Leaky ReLU activation function is replaced by the Hardswish activation function. The resulting improved CBH convolution block is shown in Figure 6.
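A minimal sketch of the CBH block is given below; the kernel size and stride defaults are illustrative.

```python
import torch.nn as nn

class CBH(nn.Module):
    """CBH block: Conv2d + BatchNorm + Hardswish, replacing the Leaky ReLU of the
    original Conv block to reduce the cost of the activation."""
    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Hardswish(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```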
With the improvements described in Section 2.1.1 and Section 2.1.2, a novel lightweight YOLOv5s model for the location and extraction of ROI in the TLDR method was built, and the structure of the built model is shown in Figure 7. It is worth noting that there was no significant improvement made to the Head network or the loss function in this paper. For specific information on the Head network, the classification mechanism, and the loss function of the proposed lightweight YOLOv5s model, please refer to [40].
Based on the proposed model shown in Figure 7, the ROI, where the traffic light is located, is extracted and used as the input of the second stage, i.e., the recognition method.

2.2. Recognition Method Based on HSV Color Space and Extended TWSVM Algorithm

The recognition of traffic lights means understanding the right-of-access authorization information transmitted by them. The traffic light recognition task can be divided into two aspects: recognition of the traffic light's color and recognition of the traffic light's shape. In this paper, the HSV color space is employed to recognize the traffic light's color. At the same time, during color recognition, a binarization mask is calculated, which is used as the input of the extended TWSVM algorithm to recognize the traffic light's shape.

2.2.1. Recognition Method of Traffic Light’s Color

The ROI extracted from the background image is in the RGB color space. Compared with the RGB color space, the HSV color space has three independent channels: Hue (H), Saturation (S), and Value (V). Based on these channels, it is easier to conduct accurate image segmentation. Furthermore, the independent Value channel makes HSV-based methods insensitive to luminance, which is more suitable for TLDR. It is worth noting that there is another color space, the HSI space, with an independent Intensity channel that represents brightness, similar to the role of Value in the HSV space; this makes the HSI color space very similar to the HSV color space, and both can meet the needs of this research. For the continuity of the study, the HSV color space is employed in this paper.
The ROI can be transformed from the RGB color space to the HSV color space according to the following equations:

$$A_H = \begin{cases} 0^{\circ}, & \text{if } X = N \\ 60^{\circ} \times \dfrac{G - B}{X - N} + 0^{\circ}, & \text{if } X = R \text{ and } G \ge B \\ 60^{\circ} \times \dfrac{G - B}{X - N} + 360^{\circ}, & \text{if } X = R \text{ and } G < B \\ 60^{\circ} \times \dfrac{B - R}{X - N} + 120^{\circ}, & \text{if } X = G \\ 60^{\circ} \times \dfrac{R - G}{X - N} + 240^{\circ}, & \text{if } X = B \end{cases}$$

$$A_S = \begin{cases} 0, & \text{if } X = 0 \\ 1 - \dfrac{N}{X}, & \text{otherwise} \end{cases}$$

$$A_V = X$$

where $X = \max(R, G, B)$, $N = \min(R, G, B)$, and $R$, $G$, $B$ are the normalized channel values with $R, G, B \in [0, 1]$.
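A direct NumPy implementation of this conversion, assuming an RGB ROI with 8-bit channel values, is sketched below. In practice, OpenCV's `cv2.cvtColor(roi, cv2.COLOR_RGB2HSV)` produces the same decomposition with H halved to the range [0, 180] and S and V scaled to [0, 255], which is the scale used by the thresholds in the next step.

```python
import numpy as np

def rgb_to_hsv(roi_rgb):
    """Apply the conversion equations above; returns H in degrees, S and V in [0, 1]."""
    rgb = roi_rgb.astype(np.float32) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    x = rgb.max(axis=-1)                 # X = max(R, G, B)
    n = rgb.min(axis=-1)                 # N = min(R, G, B)
    d = np.where(x == n, 1.0, x - n)     # avoid division by zero when X = N

    h = np.zeros_like(x)
    h = np.where((x == r) & (g >= b), 60 * (g - b) / d, h)
    h = np.where((x == r) & (g < b), 60 * (g - b) / d + 360, h)
    h = np.where(x == g, 60 * (b - r) / d + 120, h)
    h = np.where(x == b, 60 * (r - g) / d + 240, h)
    h = np.where(x == n, 0.0, h)         # A_H = 0 when max = min

    s = np.where(x == 0, 0.0, 1 - n / np.where(x == 0, 1.0, x))
    v = x
    return h, s, v
```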
Based on the three components of the ROI in the HSV space, $A_H$, $A_S$, and $A_V$, one can obtain:

$$A_{red}: \; A_H \in [th_1, th_2] \cup [th_3, th_4] \cup [th_5, th_6], \quad A_S \in [th_{11}, 255], \quad A_V \in [th_{12}, 255]$$

$$A_{green}: \; A_H \in [th_9, th_{10}], \quad A_S \in [th_{11}, 255], \quad A_V \in [th_{12}, 255]$$

$$A_{yellow}: \; A_H \in [th_5, th_6] \cup [th_7, th_8], \quad A_S \in [th_{11}, 255], \quad A_V \in [th_{12}, 255]$$
According to the definition of HSV color space, the H, S, and V thresholds of basic colors are shown in Table 4.
Based on the data shown in Table 4, the values of $th_1 \sim th_{10}$ can be calculated as 0, 25, 125, 180, 11, 25, 26, 34, 35, and 124, respectively. Additionally, the values of $th_{11}$ and $th_{12}$ can be obtained from

$$th_{11} = \begin{cases} 60, & S_{mean} \ge 100 \\ 0.8 \times S_{mean}, & S_{mean} < 100 \end{cases}$$

$$th_{12} = \begin{cases} V_{mean} + 30, & S_{mean} \ge 48 \\ V_{mean}, & 35 \le S_{mean} < 48 \\ V_{mean} + 10, & 23 \le S_{mean} < 35 \\ V_{mean} + 20, & S_{mean} < 23 \end{cases}$$

where $S_{mean}$ and $V_{mean}$ are the mean values of $A_S$ and $A_V$, respectively.
With the above efforts, the non-lit area of the traffic light's back plate, which has low color saturation and light intensity, can be filtered out. From the obtained lit area of the traffic light, the binarization masks $M_{red}$, $M_{yellow}$, and $M_{green}$ can be calculated based on $A_{red}$, $A_{yellow}$, and $A_{green}$, and the color of the traffic light can be recognized from them. Meanwhile, the Histogram of Oriented Gradients (HOG) of the binarization mask is calculated and used as the input of the recognition method for the traffic light's shape.
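The following sketch shows how the binarization masks and the HOG input can be computed from an ROI, using the threshold values listed above and the adaptive rules for $th_{11}$ and $th_{12}$; the HOG parameters and the 32 × 32 resizing are illustrative assumptions rather than the exact settings of the proposed method.

```python
import cv2
import numpy as np
from skimage.feature import hog

def color_masks(roi_bgr, th=(0, 25, 125, 180, 11, 25, 26, 34, 35, 124)):
    """Compute one binary mask per color from an ROI, in OpenCV HSV scale
    (H in [0, 180], S and V in [0, 255]), with adaptive th11 and th12."""
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    s_mean, v_mean = float(s.mean()), float(v.mean())
    th11 = 60 if s_mean >= 100 else 0.8 * s_mean
    if s_mean >= 48:
        th12 = v_mean + 30
    elif s_mean >= 35:
        th12 = v_mean
    elif s_mean >= 23:
        th12 = v_mean + 10
    else:
        th12 = v_mean + 20
    lit = (s >= th11) & (v >= th12)       # filter out the non-lit back plate
    in_range = lambda lo, hi: (h >= lo) & (h <= hi)
    masks = {
        "red":    (in_range(th[0], th[1]) | in_range(th[2], th[3]) | in_range(th[4], th[5])) & lit,
        "yellow": (in_range(th[4], th[5]) | in_range(th[6], th[7])) & lit,
        "green":  in_range(th[8], th[9]) & lit,
    }
    return {k: m.astype(np.uint8) * 255 for k, m in masks.items()}

def shape_features(mask, size=(32, 32)):
    """HOG descriptor of a binarization mask, used as the TWSVM input."""
    resized = cv2.resize(mask, size, interpolation=cv2.INTER_NEAREST)
    return hog(resized, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
```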

2.2.2. Recognition Method of Traffic Light’s Shape

The SVM algorithm has been widely used in previous TLDR methods [27,28,30,31], and the results suggest that SVM can be employed to recognize traffic lights. In this paper, we aim to propose a novel lightweight TLDR method with high feasibility of deployment on mobile platforms with low computing power. Thus, we further utilize the TWSVM algorithm [43] and, based on it, construct an extended TWSVM algorithm to recognize traffic lights. Compared with the basic SVM, the number of constraints in each quadratic programming problem of TWSVM is half of that in SVM, and therefore the inference speed of TWSVM can, in theory, be increased by a factor of four [44].
Unlike [7,8,9,12,17,25,29,30,32,33,34,35], traffic lights with arrow shapes are also considered in this paper. The recognition of multiple types of traffic lights, including arrow-shaped ones, is a multi-classification problem, whereas TWSVM is a binary classification algorithm. Therefore, we cascade multiple TWSVMs to achieve the recognition of multiple types of traffic lights.
In the model, each basic TWSVM can be described as follows.
Unlike the single hyperplane of the basic SVM, there are two hyperplanes in a TWSVM:
$$x^T w^{(1)} + b^{(1)} = 0, \qquad x^T w^{(2)} + b^{(2)} = 0$$
Accordingly, the two optimization problems of the TWSVM-based recognition method can be described as:

$$\text{TWSVM1:} \quad \min_{w^{(1)}, b^{(1)}, \xi^{(2)}} \; \frac{1}{2}\left\| A w^{(1)} + e_1 b^{(1)} \right\|^2 + c_1 e_2^T \xi^{(2)} \quad \text{s.t.} \; -\left( B w^{(1)} + e_2 b^{(1)} \right) + \xi^{(2)} \ge e_2, \; \xi^{(2)} \ge 0$$

$$\text{TWSVM2:} \quad \min_{w^{(2)}, b^{(2)}, \xi^{(1)}} \; \frac{1}{2}\left\| B w^{(2)} + e_2 b^{(2)} \right\|^2 + c_2 e_1^T \xi^{(1)} \quad \text{s.t.} \; \left( A w^{(2)} + e_1 b^{(2)} \right) + \xi^{(1)} \ge e_1, \; \xi^{(1)} \ge 0$$

where $A$ and $B$ are the sample matrices of the two classes, $e_1$ and $e_2$ are vectors of ones, $\xi^{(1)}$ and $\xi^{(2)}$ are slack variables, and $c_1, c_2 > 0$ are penalty parameters.
According to the Karush-Kuhn-Tucker (KKT) conditions, by introducing the Lagrange multipliers $\alpha$ and $\gamma$, one can obtain the dual problems of the two optimization problems above:

$$\text{DTWSVM1:} \quad \max_{\alpha} \; e_2^T \alpha - \frac{1}{2} \alpha^T G \left( H^T H \right)^{-1} G^T \alpha \quad \text{s.t.} \; 0 \le \alpha \le c_1$$

$$\text{DTWSVM2:} \quad \max_{\gamma} \; e_1^T \gamma - \frac{1}{2} \gamma^T P \left( Q^T Q \right)^{-1} P^T \gamma \quad \text{s.t.} \; 0 \le \gamma \le c_2$$

where $H = [A \;\; e_1]$, $G = [B \;\; e_2]$, $P = [A \;\; e_1]$, and $Q = [B \;\; e_2]$.
Once $u = \left[ w^{(1)} \;\; b^{(1)} \right]^T$ and $v = \left[ w^{(2)} \;\; b^{(2)} \right]^T$ are determined from the solutions of the dual problems, the two hyperplanes of the TWSVM are settled, and a sample $x$ is assigned to the class of the hyperplane it lies closer to:

$$\mathrm{Label}(x) = \underset{i = 1, 2}{\arg\min} \; d_i, \qquad d_i = \left| x^T w^{(i)} + b^{(i)} \right|$$

With this decision rule, the recognition problem can be solved.
By inputting the HOG features into the constructed cascaded TWSVM model, the shape of the traffic light can be recognized. Combining the recognition of the color and shape of the traffic light, the right-of-access authorization instruction transmitted by it, such as "stop", can be obtained.
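The decision side of the extended TWSVM can be sketched as follows. The training step, i.e., solving the two dual quadratic programs for $(w^{(1)}, b^{(1)})$ and $(w^{(2)}, b^{(2)})$, is omitted here, and the one-vs-rest cascade layout is an illustrative assumption of how multiple binary TWSVMs can be chained for the four shape classes.

```python
import numpy as np

class TwinSVM:
    """Decision rule of a trained (linear) TWSVM: a sample is assigned to the class
    whose hyperplane it lies closest to."""
    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def decision(self, x):
        # distance to each hyperplane (optionally divided by ||w|| for the
        # perpendicular distance)
        d1 = abs(float(x @ self.w1 + self.b1))
        d2 = abs(float(x @ self.w2 + self.b2))
        return 1 if d1 <= d2 else 2       # class 1 if closer to hyperplane 1

def cascade_predict(hog_feature, cascade):
    """`cascade` is an ordered list of (shape_label, TwinSVM) pairs, each separating
    one shape (class 1) from the remaining shapes (class 2)."""
    for label, twsvm in cascade:
        if twsvm.decision(np.asarray(hog_feature)) == 1:
            return label
    return cascade[-1][0]                 # fall through to the last shape
```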
In summary, the overall workflow of the TLDR method proposed in this paper is shown in Figure 8.

2.3. Platforms and Data

2.3.1. Data

For YOLO-based methods, the quality of the dataset is crucial to the performance of the method. In this paper, multiple types of traffic lights, including arrow-shaped ones, are considered; however, most datasets used in previous TLDR studies either do not contain arrow-shaped traffic lights or contain too small a proportion of them. Therefore, a self-built dataset is used in this paper instead of public datasets.
To collect the data, naturalistic driving experiments with an instrumented vehicle were conducted. Specifically, the dataset used in this paper was collected on public roads in an urban area of China under good weather conditions (without rain or snow) during the day, using a forward-facing RGB camera carried on the instrumented experimental vehicle shown in Figure 9.
There are 7648 RGB images with a resolution of 1280 × 720 in the dataset. The dataset is randomly split into a training set (80%), a test set (10%), and a validation set (10%). Several example images are shown in Figure 10.
It is worth noting that, in this study, we used a manned instrumented vehicle instead of drones to collect the dataset. First of all, it is almost impossible to collect the dataset by directly using drones. The main reasons are that (1) in the region where the authors live (and in many other countries and regions around the world), using drones in public areas requires complex approvals; (2) the use of drones is greatly affected by the weather; and (3) the battery life of current drones is too short. These factors make it almost impossible to carry drones along with their batteries and other accessories when moving between intersections to collect a qualified dataset.
Moreover, using a manned instrumented vehicle to collect the dataset for this research is completely acceptable, as explained below.
On the one hand, there is indeed a difference between vehicles and drones. Vehicles drive on the ground, so traffic lights are in the upper part of their field of view, whereas drones operate in the air, so traffic lights are in the lower part of their field of view. However, this difference has no impact on our method because of the two-stage approach taken in this study. In the first (detection) stage, traffic lights are located and extracted from the background images; in the second (recognition) stage, our method works only within the extracted area rather than on the entire image. Therefore, regardless of whether the traffic light is in the upper or lower part of the image, there is no difference after the detection stage; in other words, the impact of the aforementioned difference is completely eliminated.
On the other hand, for manned or unmanned vehicles, the position of traffic lights within their perspective is the same. Therefore, whether a manned or an unmanned vehicle is used has no impact on the collected dataset or on the methods built upon it.
For these reasons, a manned instrumented vehicle was employed to collect the dataset in this paper.

2.3.2. Platform

In this paper, a novel lightweight TLDR method is proposed for mobile platforms. Therefore, to verify the feasibility of deploying the proposed method on low-cost mobile platforms with low computing power, two kinds of platforms are utilized. The first is a widely used desktop platform with a high-performance Nvidia GPU, which is used to train, verify, and evaluate the proposed method by following general standards. Detailed information on this platform is shown in Table 5.
In addition to the platform shown in Table 5, another platform was employed in this paper to verify the feasibility of deploying the established TLDR method on drones and other mobile platforms. As mentioned above, drones face strict limitations on size, weight, power supply capacity, and cost, which make it impossible to deploy a high-computing-power chip (usually accompanied by high power consumption and a large system volume and weight) on them. The intention of this study is to propose a lightweight TLDR method suitable for mobile platforms represented by drones. To verify the deployment feasibility of the proposed TLDR method, an Nvidia Jetson Nano was employed as the second platform. As a typical edge computing platform with low computing power, small volume, and low electricity demand, the Nvidia Jetson Nano can be mounted on platforms such as drones, represented by the DJI M30. Detailed information on the Nvidia Jetson Nano and the DJI M30 can be obtained from their manufacturers' websites and is not repeated here.

3. Results

3.1. Evaluation Index

To test and evaluate the performance of the proposed method, the indices precision (P), recall (R), mean average precision (mAP), F1-score, model size, and quantity of computation are employed in this paper. The values of P, R, AP, and F1-score are obtained from:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$AP = \int_0^1 P \,\mathrm{d}R$$

$$F1\text{-}score = \frac{2 \times P \times R}{P + R}$$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively. Additionally, mAP is the mean value of the AP over all classes.
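A minimal sketch of how these indices can be computed is given below; the AP integral is approximated by numerical integration over sampled (recall, precision) points, which is an illustrative simplification.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """P, R, and F1-score from the counts of true/false positives and false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(precisions, recalls):
    """AP as the area under the precision-recall curve, integrated numerically."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))

# mAP is then the mean of the per-class AP values.
```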

3.2. Evaluation of Detection Method of Traffic Lights

The proposed TLDR method is a two-stage approach. In the first stage, a novel lightweight YOLOv5s model is established to detect traffic lights. To evaluate the performance of the constructed detection method, widely recognized ablation experiments are utilized in this paper. Among them, the No. 1 experiment uses the original YOLOv5s model to detect traffic lights. The No. 2 experiment uses an improved YOLOv5s model constructed by introducing MobileNetV3-Large into the original YOLOv5s model. The No. 3 experiment uses an improved YOLOv5s model constructed by introducing MobileNetV3-Large-SA into the original YOLOv5s model. The No. 4 experiment uses an improved YOLOv5s model constructed by introducing the new Mul-FPN feature fusion network built in Section 2.1.2 into the model used in the No. 3 experiment. The No. 5 experiment uses an improved YOLOv5s model constructed by introducing the improved down-sampling module built in Section 2.1.2 into the model used in the No. 4 experiment. The No. 6 experiment uses an improved YOLOv5s model constructed by introducing the Mul-BottleneckCSP module built in Section 2.1.2 into the model used in the No. 4 experiment. The No. 7 experiment uses an improved YOLOv5s model constructed by introducing the improved down-sampling module built in Section 2.1.2 into the model used in the No. 6 experiment; in other words, the model used in the No. 7 experiment is the novel lightweight YOLOv5s model proposed in this paper to achieve the detection of traffic lights.
The results of the ablation experiments are shown in Table 6.
From the results in Table 6, one can see that the lightweight detection method proposed in this paper achieves satisfactory performance and that the research objectives were accomplished. Specifically, the mAP of our method reaches 96.4%, the precision reaches 91.07%, and the recall reaches 95.27%. Compared with the original YOLOv5s, the maximum performance improvement of our method under the above indices reaches 4.77%. More importantly, the model size of our method is merely 1.92 MB, and the required quantity of computation is 3.43 GFLOPs; compared with the original YOLOv5s model, these represent reductions of 73.3% and 79.2%, respectively.
It is worth noting that the objective of the improvements made in Section 2.1.1 is to reduce the size of the model and the required computational power to improve the deployment feasibility of our model. However, as mentioned above, this effort inevitably leads to a deterioration in model performance. Therefore, in Section 2.1.2, we made several improvements to the Neck part to improve the performance of the model and ensure the robustness of the results. The ablation experiments, which consist of seven groups, are employed to test the influence of the various improvements proposed in Section 2.1. Among these groups, the No. 5 experiment uses a network built by introducing the down-sampling module shown in Figure 5 into the network used in the No. 4 experiment; this group tests the influence of the down-sampling module on network performance. Additionally, the No. 6 experiment uses a network built by introducing the Mul-BottleneckCSP module shown in Figure 4 into the network used in the No. 4 experiment; compared with the No. 5 experiment, the No. 6 experiment tests the influence of the Mul-BottleneckCSP module on network performance. Following the principles of ablation experiments, there are no other changes except for the aforementioned two separate improvements.
The results of the ablation experiments show that these reduction purposes were accomplished. The improvement measures in Section 2.1.1 greatly reduced the model size and the quantity of computation, by 78.06% and 85.09%, respectively; at this point, however, the performance of the model decreases under all three indices. To ensure the robustness of the results, we introduced a variety of improvement measures in the Neck part. The changes in the three performance indices, the model size, and the quantity of computation caused by the various improvement measures can be obtained from Table 6. Among them, comparing the results of the No. 5 and No. 6 experiments, one can see that with the further introduction of the Mul-BottleneckCSP module, the model achieves a 2% increase under the P indicator, a 0.2% decrease under the R indicator, a 3.4% increase in model size, and a 7.18% decrease in computation quantity. Combining these improvement measures, we finally achieved satisfactory performance, and compared with the No. 2 experiment, the cost in model size and required calculation is small. The increased model size and quantity of computation still meet the requirements of deployment feasibility, which is verified in Section 3.4.

3.3. Evaluation of Recognition Method of Traffic Lights

With the detection method verified in Section 3.2, 12,337 ROIs were extracted from the dataset, among which there are 5100 red lights, 6790 green lights, and 447 yellow lights. Based on these data, the recognition method of the traffic light's color is evaluated, and the results are shown in Table 7.
Based on the extracted ROIs, in order to balance the various types of traffic lights and thus facilitate the calibration of the parameters of the TWSVM, the number of samples of each of the four traffic light shapes, namely, round, left arrow, up arrow, and right arrow, was processed to 2250 through manual screening, rotation, supplementation, deletion, and other measures. A total of 9000 samples were thus constructed; 70% of them were randomly selected as the training set of the TWSVM, and the other 30% were used as the test set. With these two sets, the proposed TWSVM-based recognition method of the traffic light's shape was trained and verified, and the results are shown in Table 8.

3.4. Comprehensive Evaluation

The original YOLOv5s model can be directly employed to achieve TLDR without other models. However, to give full play to each kind of model, the two-stage approach was utilized in this paper, and the proposed lightweight YOLOv5s model is used only in the first stage to detect traffic lights. To evaluate the proposed method, we detected and recognized traffic lights using the original YOLOv5s model and using our method, which is a combination of the new lightweight YOLOv5s model and the extended TWSVM-based model. The comparison results are shown in Table 9.
From the results in Table 9, one can see that when the original YOLOv5s model is directly employed to detect and recognize traffic lights, there are several defects, namely, a low detection rate, poor positioning accuracy, and inaccurate recognition results caused by the small size of the target, which is reflected in all three indices. In contrast, our method achieves 92.35% mAP, an increase of 45.4%. Additionally, the precision and recall reach 87.72% and 91.45%, respectively. With these indices, the robustness of our method is further verified.
To further illustrate the performance of our method, typical TLDR methods based on the two-stage approach were compared with the method proposed in this paper; the comparison results are shown in Table 10.
On the other hand, to verify the feasibility of deploying our method on mobile platforms with low computing power, low cost, and low power demand, we deployed our method on a desktop platform with an Nvidia GPU, which was widely used in previous studies, and on an Nvidia Jetson Nano, a typical low-computing-power platform. The performance of our method on these two platforms is compared with that of previous achievements, and the results are shown in Table 11.
From the results in Table 11, one can see that our method exhibits high deployment feasibility on both a conventional desktop GPU platform and mobile platforms with low computing power.

4. Discussion

TLDR is a necessary function of many mobile platforms. Although TLDR based on video frames has been realized in previous achievements [23,24,25,26,32,36], these methods still have some shortcomings. In theory, due to the relatively small scale of traffic lights in the video frame, previous methods are prone to poor detection accuracy and high miss rates. In application, previous methods have a large demand for computing power, which does not match the characteristics of mobile platforms, such as low power supply capacity, low computing power, and small volume. Considering these, a novel lightweight TLDR method based on the two-stage approach was proposed in this paper to achieve a substantial reduction in model size and required computation quantity on the premise of ensuring the robustness of the results. In the detection stage of our method, a new lightweight YOLOv5s model was constructed to locate and extract the ROI. In the recognition stage, a combination model consisting of the HSV color space and an extended TWSVM algorithm was established to identify the right-of-access information transmitted by traffic lights. The two-stage approach allows different algorithms to play to their respective advantages in detection and recognition. Compared with the previous two-stage TLDR methods [26,27,28,29,30,31,32,33,34,35], our contribution mainly lies in:
  • Detection. The model size was considerably reduced by replacing the original Backbone network of the YOLOv5s model with a newly constructed MobileNetV3-Large-SA network, and the performance was certainly improved by introducing a newly established feature fusion network based on Mul-FPN + PAN and an extended Add module, an improved down-sampling module, and an improved CBH convolution block and Mul-BottleneckCSP module into the Neck network.
  • Recognition. A newly built extended TWSVM algorithm, instead of the widely used SVM, was combined with the HSV color space to achieve fast and accurate recognition of traffic lights, and more types of traffic lights, including arrow-shaped ones, were considered in the recognition method.
From the evaluation results in Section 3, one can see that the method proposed in this paper has performance advantages over previous achievements. Specifically, in terms of the robustness of the recognition results, the proposed method achieved high ROI detection performance, with mAP, P, and R reaching 96.4%, 91.07%, and 95.27%, respectively, and also achieved high robustness in recognizing right-of-access information, with the accuracy of color recognition reaching 98.6% and the P of shape recognition reaching 98.2%. Based on these results, we believe our method is effective and robust. On the other hand, our method also achieved a substantial reduction in model size and required computation quantity: compared with the original YOLOv5s-based method, the model size was reduced by 73.3% and the required computation quantity was reduced by 79.21%. More importantly, our method was successfully deployed on a typical mobile platform with low computing power, the Nvidia Jetson Nano, with a running speed of 13 frames per second; in other words, the data refresh rate reached 60 ms per frame, which can meet the needs of mobile applications. In one of the latest TLDR achievements [25], the YOLOv4-based approach realized a reduction of 81.79% in model size and an improvement of 1.79% in robustness; unfortunately, the feasibility of deployment on mobile platforms was not verified. Compared with the YOLOv4 model, the basic model selected in this paper, YOLOv5s, has a smaller initial model size, which makes the prospect of deployment on mobile platforms more promising. Based on the improvements described in Section 2, the research objectives were accomplished, with a good result: the model size decreased by 73.3%, while the mAP increased by 1.1%. More importantly, the feasibility of deploying our method on mobile platforms was verified. Therefore, our method can be deployed on drones and other mobile platforms with low computing power. In addition, TLDR is one of the necessary functions of ground mobile platforms such as unmanned vehicles; the significantly reduced computing power requirement of our method can help reduce the cost of autonomous driving vehicles, and our method can also reduce the power consumption of TLDR modules on such vehicles, thereby improving their endurance and ecological performance.
In summary, the method proposed in this paper achieves the research objectives of (1) reducing the computational power demand and (2) improving the method performance. However, the research work in this paper can still be further improved in the following aspects:
  • Utilization of richer datasets. Limited by objective conditions, the dataset used in this paper was collected under good weather conditions. In future research, we plan to organize larger-scale naturalistic driving experiments to collect data at night and under rainy, snowy, dense fog, and other weather conditions, and, on this basis, to study image enhancement algorithms and their lightweight implementation under severe weather conditions.
  • Further acceleration of the algorithm. Although the proposed method achieved a good running speed on the low-computing-power platform and the refresh rate can meet the application requirements of on-board units (OBUs), the resource consumption of the method may be further reduced. We plan to achieve a higher running frame rate on the embedded platform in future research.
Nevertheless, it is well known that the datasets used are one of the key determinants of the performance of the proposed deep learning model. Therefore, it is necessary for future research to conduct a unified comparative analysis of the performance of different TLDR methods on various datasets, especially when detecting and identifying diverse types of traffic lights in multiple kinds of environments.

5. Conclusions

TLDR is one of the core functions of urban low-altitude quadrotor drones and other mobile platforms, such as intelligent vehicles and assistance systems for the visually impaired, and improving the penetration of these mobile platforms can certainly benefit the community. For mobile platforms represented by drones, strict limitations on size, weight, and power supply capacity make it impossible to deploy high-computing-power chips. Therefore, although the robustness of previous TLDR methods is good, their feasibility of deployment on mobile platforms is limited by their model size and required computation quantity. To solve this, a novel lightweight TLDR method was proposed in this paper. In our method, the two-stage approach was employed. In the detection stage, a new lightweight YOLOv5s model was established and utilized to locate and extract the ROI. In the recognition stage, a combination model consisting of the HSV color space and an extended TWSVM algorithm was constructed and used to identify the right-of-access information transmitted by traffic lights. The verification and evaluation results suggest that the model size and required computation quantity of our method were substantially reduced while the robustness was improved. Moreover, the feasibility of deploying our method on mobile platforms was verified based on the Nvidia Jetson Nano, a typical platform with low computing power and low power demand. The results of this paper can provide theoretical and application references for research on TLDR methods and for the development of multiple types of mobile platforms, such as drones and intelligent vehicles.

Author Contributions

Conceptualization, X.W. and J.H.; methodology, J.H. and H.X.; software, H.X.; validation, B.W.; formal analysis, G.W.; investigation, J.H. and H.S.; resources, X.W. and L.C.; data curation, B.W., H.S., L.C. and Q.W.; writing—original draft preparation, J.H.; writing—review and editing, X.W. and J.H.; visualization, B.W. and G.W.; supervision, X.W.; project administration, X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by project ZR2020MF082, supported by the Shandong Provincial Natural Science Foundation; project IGSD-2020-012, supported by the Collaborative Innovation Center for Intelligent Green Manufacturing Technology and Equipment of Shandong Province; project 19-3-2-11-zhc, supported by the Qingdao Top Talent Program of Entrepreneurship and Innovation; and project 2018YFB1601500, supported by the National Key Research and Development Program.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jun-Ik, J.; Do-Whan, R. Real Time Detection and Recognition of Traffic Lights Using Component Subtraction and Detection Masks. J. Inst. Electron. Eng. Korea 2006, 43, 65–72. [Google Scholar]
  2. de Charette, R.; Nashashibi, F. Traffic Light Recognition Using Image Processing Compared to Learning Processes. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St Louis, MO, USA, 11–15 October 2009; pp. 333–338. [Google Scholar]
  3. Omachi, M.; Omachi, S. Traffic Light Detection with Color and Edge Information. In Proceedings of the 2009 2nd IEEE International Conference on Computer Science and Information Technology, Beijing, China, 8–11 August 2009; pp. 284–287. [Google Scholar]
  4. Jie, Y.; Xiaomin, C.; Pengfei, G.; Zhonglong, X. A New Traffic Light Detection and Recognition Algorithm for Electronic Travel Aid. In Proceedings of the 2013 Fourth International Conference on Intelligent Control and Information Processing (ICICIP), Beijing, China, 9–11 June 2013; pp. 644–648. [Google Scholar]
  5. Ying, J.; Tian, J.; Lei, L. Traffic Light Detection Based on Similar Shapes Searching for Visually Impaired Person. In Proceedings of the 2015 Sixth International Conference on Intelligent Control and Information Processing (ICICIP), Wuhan, China, 26–28 November 2015; pp. 376–380. [Google Scholar]
  6. Chen, X.; Chen, Y.; Zhang, G. A Computer Vision Algorithm for Locating and Recognizing Traffic Signal Control Light Status and Countdown Time. J. Intell. Transp. Syst. 2021, 25, 533–546. [Google Scholar] [CrossRef]
  7. John, V.; Yoneda, K.; Qi, B.; Liu, Z.; Mita, S. Traffic Light Recognition in Varying Illumination Using Deep Learning and Saliency Map. In Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014; pp. 2286–2291. [Google Scholar]
  8. Behrendt, K.; Novak, L.; Botros, R. A Deep Learning Approach to Traffic Lights: Detection, Tracking, and Classification. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1370–1377. [Google Scholar]
  9. Lee, G.-G.; Park, B.K. Traffic Light Recognition Using Deep Neural Networks. In Proceedings of the 2017 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 8–10 January 2017; pp. 277–278. [Google Scholar]
  10. Bach, M.; Stumper, D.; Dietmayer, K. Deep Convolutional Traffic Light Recognition for Automated Driving. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 851–858. [Google Scholar]
  11. Kim, H.-K.; Park, J.H.; Jung, H.-Y. An Efficient Color Space for Deep-Learning Based Traffic Light Recognition. J. Adv. Transp. 2018, 2018, e2365414. [Google Scholar] [CrossRef]
  12. Müller, J.; Dietmayer, K. Detecting Traffic Lights by Single Shot Detection. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 266–273. [Google Scholar]
  13. Gupta, A.; Choudhary, A. A Framework for Traffic Light Detection and Recognition Using Deep Learning and Grassmann Manifolds. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 600–605. [Google Scholar]
  14. Vitas, D.; Tomic, M.; Burul, M. Traffic Light Detection in Autonomous Driving Systems. IEEE Consum. Electron. Mag. 2020, 9, 90–96. [Google Scholar] [CrossRef]
  15. Yeh, T.-W.; Lin, H.-Y.; Chang, C.-C. Traffic Light and Arrow Signal Recognition Based on a Unified Network. Appl. Sci. 2021, 11, 8066. [Google Scholar] [CrossRef]
  16. Kilic, I.; Aydin, G. Traffic Lights Detection and Recognition with New Benchmark Datasets Using Deep Learning and TensorFlow Object Detection API. Trait. Signal 2022, 39, 1673–1683. [Google Scholar] [CrossRef]
  17. Philipsen, M.P.; Jensen, M.B.; Møgelmose, A.; Moeslund, T.B.; Trivedi, M.M. Traffic Light Detection: A Learning Algorithm and Evaluations on Challenging Dataset. In Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Gran Canaria, Spain, 15–18 September 2015; pp. 2341–2345. [Google Scholar]
  18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  19. Liu, H.; Yu, Y.; Liu, S.; Wang, W. A Military Object Detection Model of UAV Reconnaissance Image and Feature Visualization. Appl. Sci. 2022, 12, 12236. [Google Scholar] [CrossRef]
  20. Lu, E.H.; Gozdzikiewicz, M.; Chang, K.-H.; Ciou, J.-M. A Hierarchical Approach for Traffic Sign Recognition Based on Shape Detection and Image Classification. Sensors 2022, 22, 4768. [Google Scholar] [CrossRef]
  21. Lv, H.; Yan, H.; Liu, K.; Zhou, Z.; Jing, J. YOLOv5-AC: Attention Mechanism-Based Lightweight YOLOv5 for Track Pedestrian Detection. Sensors 2022, 22, 5903. [Google Scholar] [CrossRef]
  22. Song, W.; Suandi, S.A. TSR-YOLO: A Chinese Traffic Sign Recognition Algorithm for Intelligent Vehicles in Complex Scenes. Sensors 2023, 23, 749. [Google Scholar] [CrossRef]
  23. Chen, X. Traffic Lights Detection Method Based on the Improved YOLOv5 Network. In Proceedings of the 2022 IEEE 4th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), Dali, China, 12–14 October 2022; pp. 1111–1114. [Google Scholar]
  24. Marques, R.; Ribeiro, T.; Lopes, G.; Ribeiro, A. YOLOv3: Traffic Signs & Lights Detection and Recognition for Autonomous Driving. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence, Virtual, 3–5 February 2022; pp. 818–826. [Google Scholar]
  25. Wang, Q.; Zhang, Q.; Liang, X.; Wang, Y.; Zhou, C.; Mikulovich, V.I. Traffic Lights Detection and Recognition Method Based on the Improved YOLOv4 Algorithm. Sensors 2022, 22, 200. [Google Scholar] [CrossRef] [PubMed]
  26. Zhao, Y.; Feng, Y.; Wang, Y.; Zhang, Z.; Zhang, Z. Study on Detection and Recognition of Traffic Lights Based on Improved YOLOv4. Sensors 2022, 22, 7787. [Google Scholar] [CrossRef]
  27. Ji, Y.; Yang, M.; Lu, Z.; Wang, C. Integrating Visual Selective Attention Model with HOG Features for Traffic Light Detection and Recognition. In Proceedings of the 2015 IEEE Intelligent Vehicles Symposium (IV), Seoul, Republic of Korea, 28 June–1 July 2015; pp. 280–285. [Google Scholar]
  28. Shi, X.; Zhao, N.; Xia, Y. Detection and Classification of Traffic Lights for Automated Setup of Road Surveillance Systems. Multimed. Tools Appl. 2016, 75, 12547–12562. [Google Scholar] [CrossRef]
  29. Saini, S.; Nikhil, S.; Konda, K.R.; Bharadwaj, H.S.; Ganeshan, N. An Efficient Vision-Based Traffic Light Detection and State Recognition for Autonomous Vehicles. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 606–611. [Google Scholar]
  30. Shen, X.; Andersen, H.; Ang, M.H.; Rus, D. A Hybrid Approach of Candidate Region Extraction for Robust Traffic Light Recognition. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 1–8. [Google Scholar]
  31. Wang, W.; Sun, S.; Jiang, M.; Yan, Y.; Chen, X. Traffic Lights Detection and Recognition Based on Multi-Feature Fusion. Multimed. Tools Appl. 2017, 76, 14829–14846. [Google Scholar] [CrossRef]
  32. Wang, X.; Jiang, T.; Xie, Y. A Method of Traffic Light Status Recognition Based on Deep Learning. In Proceedings of the 2018 International Conference on Robotics, Control and Automation Engineering, Beijing, China, 26–28 December 2018; pp. 166–170. [Google Scholar] [CrossRef]
  33. Kim, H.-K.; Yoo, K.-Y.; Park, J.H.; Jung, H.-Y. Traffic Light Recognition Based on Binary Semantic Segmentation Network. Sensors 2019, 19, 1700. [Google Scholar] [CrossRef]
  34. Gao, F.; Wang, C. Hybrid Strategy for Traffic Light Detection by Combining Classical and Self-Learning Detectors. IET Intell. Transp. Syst. 2020, 14, 735–741. [Google Scholar] [CrossRef]
  35. Masaki, S.; Hirakawa, T.; Yamashita, T.; Fujiyoshi, H. Distant Traffic Light Recognition Using Semantic Segmentation. Transp. Res. Rec. 2021, 2675, 97–103. [Google Scholar] [CrossRef]
  36. Niu, C.; Li, K. Traffic Light Detection and Recognition Method Based on YOLOv5s and AlexNet. Appl. Sci. 2022, 12, 10808. [Google Scholar] [CrossRef]
  37. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv 2016, arXiv:1612.08242. [Google Scholar] [CrossRef]
  38. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  39. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  40. Jocher, G. YOLOv5 by Ultralytics 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 18 May 2020).
  41. Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. arXiv 2019, arXiv:1905.02244. [Google Scholar] [CrossRef]
  42. Jayadeva; Khemchandani, R.; Chandra, S. Twin Support Vector Machines for Pattern Classification. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 905–910. [Google Scholar] [CrossRef]
  43. Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. arXiv 2017, arXiv:1710.05941. [Google Scholar] [CrossRef]
  44. Tomar, D.; Agarwal, S. A Comparison on Multi-Class Classification Methods Based on Least Squares Twin Support Vector Machine. Knowl.-Based Syst. 2015, 81, 131–147. [Google Scholar] [CrossRef]
Figure 1. Network structure of SA mechanism.
Figure 2. Comparison of the proposed Mul-FPN + PAN network and the original FPN + PAN network.
Figure 3. Add module.
Figure 4. MultiBottleneckCSP module.
Figure 5. Down-sampling module.
Figure 6. Structure of CBH convolution block.
Figure 7. Structure of proposed lightweight YOLOv5s-based traffic light detection method.
Figure 8. Overall workflow of the proposed TLDR method.
Figure 9. Integrated experimental vehicle and its main equipment (the equipment used in this work is marked in red).
Figure 10. Example images of the used dataset.
Table 1. Approaches employed in the physical features-based and machine learning approaches-based TLDR methods.
References | Kinds | Specific Approaches
[1] | Physical-features-based | Color-based, RGB color space
[2] | Physical-features-based | Color-based, RGB color space
[3] | Physical-features-based | Color-based, normalized RGB color space
[4] | Physical-features-based | Shape and color-based, HSI and RGB color space
[5] | Physical-features-based | Shape and color-based, HSI color space
[6] | Physical-features-based | Color-based, HSV color space
[7] | Machine-learning-based approaches | CNN-based, GPS data involved
[8] | Machine-learning-based approaches | Multi-ANN-based
[9] | Machine-learning-based approaches | DNN-based
[10] | Machine-learning-based approaches | Faster-R-CNN-based
[11] | Machine-learning-based approaches | Multi-networks-based
[12] | Machine-learning-based approaches | SSD-algorithm-based
[13] | Machine-learning-based approaches | Faster-R-CNN-based
[14] | Machine-learning-based approaches | CNN-based
[15] | Machine-learning-based approaches | CNN-based, binocular vision involved
[16] | Machine-learning-based approaches | Faster-R-CNN-based
Table 2. Methods used in the two stages of TLDR.
References | First Stage: Detection | Second Stage: Recognition
[27] | Visual selective attention model | Support vector machines (SVMs)
[28] | HSV color-space-based | SVM
[29] | HSV color space with maximally stable extremal region | CNN
[30] | RGB color-space-based | SVM
[31] | HSV color space with Otsu algorithm | SVM
[32] | YOLOv3 | Lightweight CNN
[33] | Binary semantic segmentation network | Fully convolutional network (FCN)
[34] | HSV color-space-based | CNN
[35] | Semantic segmentation method | CNN
[36] | YOLOv5s | AlexNet
Table 3. Structure of MobileNetV3-Large Network integrating SA Mechanism.
Input | Operator | Exp | Out | SA | NL | s
640² × 3 | Conv, 3 × 3 | 16 | 16 | - | ReLU | 2
320² × 16 | Bneck, 3 × 3 | 16 | 16 | - | ReLU | 1
320² × 16 | Bneck, 3 × 3 | 64 | 24 | - | ReLU | 2
160² × 24 | Bneck, 3 × 3 | 72 | 24 | - | ReLU | 1
160² × 16 | Bneck, 5 × 5 | 72 | 40 | √ | ReLU | 2
80² × 40 | Bneck, 5 × 5 | 120 | 40 | √ | ReLU | 1
80² × 40 | Bneck, 5 × 5 | 120 | 40 | √ | ReLU | 1
80² × 40 | Bneck, 3 × 3 | 240 | 80 | - | H-swish | 2
40² × 80 | Bneck, 3 × 3 | 200 | 80 | - | H-swish | 1
40² × 80 | Bneck, 3 × 3 | 184 | 80 | - | H-swish | 1
40² × 80 | Bneck, 3 × 3 | 184 | 80 | - | H-swish | 1
40² × 80 | Bneck, 3 × 3 | 480 | 112 | √ | H-swish | 1
40² × 112 | Bneck, 3 × 3 | 672 | 112 | √ | H-swish | 1
40² × 112 | Bneck, 5 × 5 | 672 | 160 | √ | H-swish | 1
40² × 160 | Bneck, 5 × 5 | 672 | 160 | √ | H-swish | 2
20² × 160 | Bneck, 5 × 5 | 960 | 160 | √ | H-swish | 1
Table 4. H, S, and V thresholds of basic colors.
Threshold | Black | Grey | White | Red1 | Red2 | Orange | Yellow | Green | Cyan | Blue | Purple
H min | 0 | 0 | 0 | 0 | 156 | 11 | 26 | 35 | 78 | 100 | 125
H max | 180 | 180 | 180 | 10 | 180 | 25 | 34 | 77 | 99 | 124 | 155
S min | 0 | 0 | 0 | 43 | 43 | 43 | 43 | 43 | 43 | 43 | 43
S max | 255 | 43 | 30 | 255 | 255 | 255 | 255 | 255 | 255 | 255 | 255
V min | 0 | 46 | 221 | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46
V max | 46 | 220 | 255 | 255 | 255 | 255 | 255 | 255 | 255 | 255 | 255
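To illustrate how the thresholds in Table 4 can be applied to an extracted ROI, the following is a minimal OpenCV sketch that labels a region as red, green, or yellow by counting pixels that fall inside the corresponding HSV ranges. It assumes OpenCV's HSV convention (H in [0, 180], S and V in [0, 255]), with which the table values are consistent; the function name classify_light_color and the pixel-counting decision rule are illustrative assumptions, not the authors' released implementation (the paper's recognition stage further uses TWSVMs for shape).

```python
import cv2
import numpy as np

# Thresholds taken from Table 4 (OpenCV-style HSV). Red needs both hue intervals (Red1 and Red2).
RED1_LOW, RED1_HIGH = np.array([0, 43, 46]), np.array([10, 255, 255])
RED2_LOW, RED2_HIGH = np.array([156, 43, 46]), np.array([180, 255, 255])
GREEN_LOW, GREEN_HIGH = np.array([35, 43, 46]), np.array([77, 255, 255])
YELLOW_LOW, YELLOW_HIGH = np.array([26, 43, 46]), np.array([34, 255, 255])

def classify_light_color(roi_bgr):
    """Illustrative sketch: label an ROI as red/green/yellow by counting
    pixels inside the HSV ranges of Table 4 (not the paper's exact pipeline)."""
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    red_mask = cv2.inRange(hsv, RED1_LOW, RED1_HIGH) | cv2.inRange(hsv, RED2_LOW, RED2_HIGH)
    counts = {
        "red": int(np.count_nonzero(red_mask)),
        "green": int(np.count_nonzero(cv2.inRange(hsv, GREEN_LOW, GREEN_HIGH))),
        "yellow": int(np.count_nonzero(cv2.inRange(hsv, YELLOW_LOW, YELLOW_HIGH))),
    }
    return max(counts, key=counts.get)
```

In practice the ROI passed to such a function would be the traffic light region cropped out by the detection stage, so most of its pixels belong to the housing or the lit lamp, which is what makes a simple pixel-count vote workable.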
Table 5. Detailed information on the first kind of platform.
Part | Details
Hardware | Intel Core i7-10700 CPU with 32 GB system memory and Nvidia RTX3050 GPU with 8 GB video memory
Software | Ubuntu 18.04 OS, Python 3.8, PyTorch 1.8, CUDA 11.1, and cuDNN 8.0.5.39
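As a quick sanity check of the software stack listed in Table 5, one might confirm that PyTorch was built against the expected CUDA version and can see the GPU before training; this is a generic snippet, not part of the authors' tooling.

```python
import torch

print(torch.__version__)          # expect a 1.8.x build on the platform in Table 5
print(torch.version.cuda)         # CUDA version PyTorch was compiled against (11.1 here)
print(torch.cuda.is_available())  # True if the GPU and driver are visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```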
Table 6. Results of ablation experiments.
Experiment Number | mAP (%) | P (%) | R (%) | Model Size (MB) | Computation Quantity (GFLOPs)
1 | 95.30 | 91.12 | 92.72 | 7.20 | 16.50
2 | 93.12 | 83.4 | 90.50 | 1.58 | 2.46
3 | 94.90 | 84.41 | 93.90 | 1.53 | 2.39
4 | 95.85 | 86.16 | 94.82 | 1.67 | 2.75
5 | 95.90 | 86.40 | 94.92 | 1.76 | 3.20
6 | 95.90 | 88.40 | 94.72 | 1.82 | 2.97
7 | 96.40 | 91.07 | 95.27 | 1.92 | 3.43
Table 7. Evaluation results of the traffic light color recognition method.
Type of Color | Number of Samples | Number of Correct Recognitions | Accuracy (%)
Red Light | 5100 | 5023 | 98.5
Green Light | 6790 | 6692 | 98.6
Yellow Light | 447 | 434 | 97.1
Total | 12,337 | 12,162 | 98.6
Table 8. Evaluation results of the traffic light shape recognition method.
Type of Shape | P | R | F1-Score | Number of Samples
Round | 0.98 | 0.97 | 0.98 | 571
Left arrow | 0.99 | 0.98 | 0.98 | 547
Up arrow | 0.98 | 0.99 | 0.99 | 579
Right arrow | 0.98 | 0.99 | 0.99 | 553
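For completeness, the precision P, recall R, and F1-score reported in Table 8 are assumed here to follow their standard per-class definitions:

P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}

where TP, FP, and FN denote the true positives, false positives, and false negatives for each shape class.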
Table 9. Comparison results between the original YOLOv5s model and our method.
Method | mAP (%) | P (%) | R (%)
Original YOLOv5s model | 63.51 | 61.41 | 57.72
Our method | 92.35 | 87.72 | 91.45
Table 10. Comparison results between typical two-stage TLDR methods and our method.
Method | Detection (%) | Recognition (%)
[31] | 97.45 | 97.05
[32] | 97 | 95
[34] | 87.5 | 79.6
[36] | 99.39 | 87.8
Our method | 92.35 | 91.45
Table 11. Comparison results of the feasibility of deployment.
Method | Platform | Maximum Computing Power (TFLOPs) | Power Consumption (W) | Running Speed (Frames/s)
[24] (YOLOv3) | GTX1050 | 1.73 | 75 | 2
[24] (YOLOv3-tiny) | GTX1050 | 1.73 | 75 | 17
[25] | GTX1080Ti | 10.6 | 250 | 29
[26] | RTX3090 | 35.7 | 350 | 31.55
Our method | RTX3050 | 4.33 | 95 | 32
Our method | Jetson Nano | 0.5 | 5 | 13
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
