Article

The Verification of the Correct Visibility of Horizontal Road Signs Using Deep Learning and Computer Vision

1 Faculty of Mechanical Engineering and Computer Science, Department of Computer Sciences, Czestochowa University of Technology, Dabrowskiego 73, 42-201 Czestochowa, Poland
2 Faculty of Mechanical Engineering and Computer Science, Department of Mechanics and Fundamentals of Machine Design, Czestochowa University of Technology, Dabrowskiego 73, 42-201 Czestochowa, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(20), 11489; https://doi.org/10.3390/app132011489
Submission received: 21 August 2023 / Revised: 15 October 2023 / Accepted: 18 October 2023 / Published: 20 October 2023

Abstract

This research aimed to develop a system for classifying horizontal road signs as correct or with poor visibility. In Poland, road markings are applied by using a specialized white, reflective paint and require periodic repainting. Our developed system is designed to assist in the decision-making process regarding the need for repainting. It operates by analyzing images captured by a standard car camera or driving recorder. The image data undergo initial segmentation and classification processes, facilitated by the utilization of the YOLOv4-Tiny neural network model. The input data to the network consist of frames extracted from the video stream. To train the model, we established our proprietary database, which comprises 6250 annotated images and video frames captured during driving. The annotations provide detailed information about object types, their locations within the image, and their sizes. The trained neural network model effectively identifies and classifies objects within our dataset. Subsequently, based on the classification results, the identified image fragments are subjected to further analysis. The analysis relies on assessing pixel-level contrasts within the images. Notably, the road surface is intentionally designed to be dark, while road signs exhibit relatively lighter colors. In conclusion, the developed system serves the purpose of determining the correctness or visibility quality of horizontal road signs. It achieves this by leveraging computer vision techniques, deep learning with YOLOv4-Tiny, and a meticulously curated database. Ultimately, the system provides valuable information regarding the condition of specific horizontal road signs, aiding in the decision-making process regarding potential repainting needs.

1. Introduction

Road infrastructure plays an essential role in any nation’s socioeconomic development. In Poland, as in many countries, the importance of road signs—both vertical and horizontal—cannot be overstated. These signs not only guide the myriad of road users but, more importantly, ensure their safety. However, while much emphasis is often placed on vertical signs, the relevance of horizontal road signs is sometimes understated, even though they bear equal significance.
Horizontal road signs in Poland, characterized by their light-colored designs, have been strategically chosen to contrast the naturally dark road surfaces, enhancing their visibility. This contrast ensures visibility for road users, especially for signs primarily in white hues. Moreover, the paint applied to these signs contains reflective components, enhancing visibility, particularly in adverse weather conditions and during nighttime.
The effectiveness of horizontal road signs hinges on their unobstructed visibility. Obstacles to their visibility can be categorized into two types. The first type arises from obstructions caused by other road users, objects, or adverse weather conditions such as rain or snow. The second reason for reduced visibility is the natural wear and tear of the signs, leading to paint deterioration. Repainting decisions are made by relevant road authorities. In the realm of damage-recovery planning, diverse solutions are employed to optimize each stage of the process [1]. Collaboration among multiple maintenance teams proves invaluable in establishing a sustainable approach to selective maintenance sequence planning [2].
Experienced drivers can intuitively recognize partially obscured signs. However, this demands sustained attention and complicates interpretation, posing an unnecessary safety risk. Severely worn signs become unrecognizable, posing a serious threat. Hence, the concept of developing an integrated system for detecting horizontal road signs and assisting in the decision to repaint them arose.
In recent years, technological advancements, particularly in the realm of computer vision and artificial intelligence, have opened new avenues to address challenges that were traditionally tackled manually. Given that road signs are specifically designed for human visual perception, it is logical to consider computer vision as a solution to detect and assess the state of these signs, especially when manual inspections can be resource intensive and less efficient.
In this paper, we explore the potential of leveraging state-of-the-art computer vision techniques, backed by robust convolutional neural networks, to detect, analyze, and subsequently decide on the maintenance needs of horizontal road signs in Poland. We aim to bridge the gap between advanced technological solutions and traditional infrastructure challenges. Our research not only contributes a potentially scalable solution to Poland’s road safety but also offers insights that could be beneficial for similar challenges in other parts of the world.
Artificial intelligence is successfully used in the latest computer vision research. Convolutional models are selected depending on the purpose and the available dataset. Regression models are used for issues related to value estimation [3,4]. There are many models for detecting objects [5,6,7]. Research on the classification of objects is popular [8,9,10]. Image segmentation is also widely researched [11,12,13,14]. Some solutions combine image segmentation with the simultaneous classification of found objects. Good examples are the popular YOLO models, which already exist in many variants: YOLOv1 [15], YOLOv2 [16], YOLOv3 [17], and YOLOv4 [18]. Each subsequent version improves on the previous one, whether in terms of accuracy or speed. The YOLOv4-Tiny model [19,20,21] seemed the most appropriate for the studies presented in this article. It is a modification of YOLOv4 that loses a little in detection quality but gains significantly in speed. Our experiments and testing of the results confirmed that this was the right model for the task.
The decision-making system relies on data extracted from a digital image, employing algorithms tailored for image processing and analysis. Various techniques and their diverse configurations are employed, as referenced in prior research [22,23]. Among these, thresholding with binarization [24,25] and histogram analysis [26,27] represent the most classical and enduring methods, known for their simplicity and longevity in practical application.
The research detailed in this article was structured into distinct stages. The initial stage involved the creation of an appropriate dataset. Subsequently, the second stage encompassed the development and training of a suitable convolutional neural network (CNN) model. The third stage focused on the construction of the decision-making system. Finally, the last stage entailed rigorous testing and a comprehensive analysis of the results obtained.
The diagram in Figure 1 illustrates our proposed system architecture for verifying the correct visibility of horizontal road signs. It is a hybrid combination of the YOLOv4-Tiny model and our decision-making system.

2. Dataset

The data source for the research comprises video streams. These streams were captured by using a LAMAX X9.1 camera while the vehicle was in motion. The camera was affixed to the rearview mirror at the front of the vehicle. Video streams were recorded in the .mp4 file format at a frame rate of 30 frames per second (fps). Images were acquired on both sunny and cloudy days, so two types of weather conditions were taken into account, and both groups of images are represented in similar numbers in the dataset.
The initial step involved the division of the streams into frames, each with a resolution of 1920 × 1080 pixels. Subsequently, these frames underwent object annotation, focusing on horizontal road signs. Nine fundamental classes were established, corresponding to the predominant shapes of common horizontal road signs in Poland. These classes encompassed pedestrian crossings, right arrows, left arrows, straight arrows, right–left–straight arrow combinations, right–straight arrow combinations, left–straight arrow combinations, left–diagonal arrow combinations, and right–diagonal arrow combinations. A visual representation of such objects can be found in Figure 2.
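For illustration, the sketch below shows one common way to split such an .mp4 stream into individual frames with OpenCV in Python; the file names, output directory, and sampling step are hypothetical examples, as the text does not prescribe a specific extraction tool.

# Illustrative sketch: splitting a dash-cam .mp4 recording into individual frames
# with OpenCV. The file paths below are hypothetical examples.
import os
import cv2

def extract_frames(video_path: str, out_dir: str, step: int = 1) -> int:
    """Save every `step`-th frame of the video as a PNG file and return the number saved."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()  # each frame is a 1920 x 1080 BGR image for this camera
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.png"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Example: keep one frame per second from a 30 fps recording.
# extract_frames("drive_recording.mp4", "frames/", step=30)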
The annotation process entailed defining Regions of Interest (ROI) within the images and assigning appropriate labels to these regions. It was possible for a single image to contain multiple objects, and instances of the same object could also appear within a single image. The “LabelImg” software (1.8.0) was employed for this annotation task. Figure 3 provides an example of an annotated image. In total, 6250 images (frames) were meticulously annotated. Additionally, a text file listing the names of all defined object classes was generated. Each image had a corresponding text file containing structured information detailing the presence, type, location, and dimensions of all the identified objects.
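The exact annotation file layout is not reproduced here; assuming the YOLO text format commonly produced by LabelImg (one object per line: class id followed by normalized centre coordinates, width, and height), a minimal parser could look as follows.

# Minimal parser for per-image annotation files, assuming the YOLO text format
# written by LabelImg: "class_id x_center y_center width height", all normalized.
from typing import List, Tuple

def load_annotations(txt_path: str,
                     img_w: int = 1920,
                     img_h: int = 1080) -> List[Tuple[int, int, int, int, int]]:
    """Return (class_id, x_min, y_min, width, height) in pixels for each annotated object."""
    boxes = []
    with open(txt_path, "r", encoding="utf-8") as f:
        for line in f:
            class_id, xc, yc, w, h = line.split()
            w_px, h_px = float(w) * img_w, float(h) * img_h
            x_min = float(xc) * img_w - w_px / 2
            y_min = float(yc) * img_h - h_px / 2
            boxes.append((int(class_id), int(x_min), int(y_min), int(w_px), int(h_px)))
    return boxes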
The constructed dataset was randomly partitioned into two subsets, each comprising labeled images: a training set comprising 5000 samples and a test set comprising 1250 samples. Table 1 provides a breakdown of the sample distribution across each class within both the training and test sets.
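A minimal sketch of such a random partition is shown below; the directory layout and the fixed random seed are illustrative assumptions only.

# Illustrative random 5000/1250 split of the annotated frames.
import random
from pathlib import Path

def split_dataset(image_dir: str, train_size: int = 5000, seed: int = 42):
    """Shuffle all annotated frames and split them into training and test lists."""
    images = sorted(Path(image_dir).glob("*.png"))
    random.Random(seed).shuffle(images)
    return images[:train_size], images[train_size:]

# train_files, test_files = split_dataset("frames/")  # 5000 training and 1250 test frames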

3. Convolutional Neural Networks

In the initial phase of the research, the CNN classifier was employed. Its role was to segment the image while simultaneously categorizing the identified objects and determining their precise positions on the image surface. We opted to use the well-known YOLOv4-Tiny model for this purpose. The original version of this model is designed to detect and classify 80 different types of objects, none of which match our defined nine classes. As a result, we tailored the original model to suit our specific set of classes. This customization involved resizing the classification output so that, for each detection, the network computes the probability of belonging to each of the nine defined classes. Subsequently, we conducted further training by using our prepared dataset (training set). The input data consisted of color images with dimensions of 416 × 416 pixels. We employed supervised learning, and the model was trained over the course of 50 epochs.
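The exact input pipeline is not detailed in the text; as an assumption based on common practice for YOLO-family models, each frame could be prepared for the 416 × 416 network input roughly as follows.

# Assumed preprocessing of a video frame into the 416 x 416 RGB input expected
# by the adapted YOLOv4-Tiny model (illustration only, not the authors' exact code).
import cv2
import numpy as np

def preprocess(frame_bgr: np.ndarray, input_size: int = 416) -> np.ndarray:
    """Resize a BGR frame, convert it to RGB, and scale pixel values to [0, 1]."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (input_size, input_size))
    return resized.astype(np.float32) / 255.0  # shape: (416, 416, 3)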
When configuring a machine learning model, precise parameter settings are essential to achieve the desired results. In the context of YOLOv4-Tiny or similar models for object detection, the following parameters and their values significantly influence the training process:
  • Anchor sizes, defined as (10,14), (23,27), (37,58), (81,82), (135,169), and (344,319), serve as reference points for object detection across various scales. These anchor sizes are specified as width–height pairs and are vital for accurately localizing objects of different dimensions.
  • The strides parameter, defined as (16, 32), specifies the spatial intervals, or strides, between anchor points at the two detection scales. These strides are instrumental in ensuring that the model can effectively detect objects at various scales within an image.
  • XY scaling factors, set to (1.2, 1.1, 1.05), fine-tune the predictions and help adapt the model to variations in object sizes across the image.
  • The anchor count per scale is defined as three. This parameter plays a role in how the model approaches object detection at different spatial resolutions.
  • The learning rate is a crucial hyperparameter for training machine learning models. The initial learning rate was set to 0.001 (1 × 10⁻³) and gradually decreased to 0.000001 (1 × 10⁻⁶) over the course of training.
  • Data augmentation introduces variations into the training data, enhancing the model’s ability to handle diverse scenarios.
  • A warm-up phase of two epochs was specified. During this phase, the learning rate is adjusted gradually, facilitating smoother model convergence.
  • Training occurs in two stages: the first stage spans 20 epochs, while the second stage comprises 30 epochs. These stages guide the model through distinct training phases, potentially refining its performance.
  • The Intersection over Union (IoU) loss threshold was set to 0.5. The IoU measures the overlap between predicted and actual object locations; predictions with an IoU below this threshold result in increased training losses.
In summary, these parameters and their values are pivotal in shaping the training process of the YOLOv4-Tiny model. They influence anchor selection, scale adaptation, learning rate control, data augmentation, and the training strategy, all of which are critical for robust and accurate object detection in machine learning applications.
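For reference, the hyperparameters listed above can be collected into a single configuration structure; the key names below are illustrative and do not reflect the authors’ actual configuration files.

# The training hyperparameters described above, gathered into one dictionary.
config = {
    "anchors": [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)],
    "strides": [16, 32],            # anchor grid strides for the two detection scales
    "xyscale": [1.2, 1.1, 1.05],    # fine-tuning factors for the box-centre predictions
    "anchors_per_scale": 3,
    "input_size": 416,
    "num_classes": 9,
    "learning_rate_init": 1e-3,     # decayed gradually to learning_rate_end
    "learning_rate_end": 1e-6,
    "warmup_epochs": 2,
    "first_stage_epochs": 20,
    "second_stage_epochs": 30,      # 50 training epochs in total
    "iou_loss_threshold": 0.5,
}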
The output data obtained from the predictions include the object’s location on the image surface, its size, and its assignment to one of the nine defined classes. For each identified object in the analyzed image, the CNN furnishes this information. The number of objects can vary in different images and depends on how many objects the model detects within a given image.
The implementation was developed in Python version 3.9 with the TensorFlow library. The Adam optimizer, widely used in classification tasks, was employed. The tests were performed on a workstation equipped with an AMD Ryzen 9 3950X 16-core processor @3.5 GHz, 32 GB of RAM, and the Windows 10 Pro operating system.
The trained CNN model was tested on real-time video streams. The model performed very well, with a relatively fast prediction speed (15 fps on average) and high accuracy in finding and classifying objects in the image. However, this way of verifying the correct operation of the network is highly subjective. Therefore, to obtain measurable results, the tests were also carried out on the test set of images (frames of video streams). This way, it was possible to compare the predicted data with the actual data.
The object search accuracy was 96.79% for the prepared test set. It should be emphasized that these images did not participate in the learning process of the model. Figure 4 shows an exemplary result of the trained model and demonstrates how results are presented. Detected objects are outlined with colored bounding boxes, each accompanied by a label specifying the object’s class. Objects belonging to the same class are marked with identical box colors, resulting in a maximum of nine distinct colors that can appear in a single image.
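As an illustration of this presentation style, detections could be rendered as follows; the colour palette and class ordering are assumptions, since only the number of classes and the one-colour-per-class convention are stated above.

# Illustrative rendering of detections: one colour per class, a bounding box
# around each object, and a text label with the class name.
import cv2
import numpy as np

CLASS_NAMES = ["pedestrian crossing", "right arrow", "left arrow", "straight arrow",
               "right-left-straight arrow", "right-straight arrow",
               "left-straight arrow", "left-diagonal arrow", "right-diagonal arrow"]
# Nine distinct BGR colours; the exact palette used in Figure 4 is not specified.
COLORS = [(0, 0, 255), (0, 255, 0), (255, 0, 0), (0, 255, 255), (255, 0, 255),
          (255, 255, 0), (128, 0, 255), (0, 128, 255), (255, 128, 0)]

def draw_detections(image: np.ndarray, detections) -> np.ndarray:
    """Draw each (class_id, x, y, w, h) detection on a copy of the image."""
    out = image.copy()
    for class_id, x, y, w, h in detections:
        color = COLORS[class_id]
        cv2.rectangle(out, (x, y), (x + w, y + h), color, 2)
        cv2.putText(out, CLASS_NAMES[class_id], (x, max(y - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return out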
Next, the metrics suitable for classification were calculated. For this purpose, a confusion matrix was created (Table 2), which allowed us to categorize objects into the following sets: TP—true positives (objects correctly detected and correctly classified), FP—false positives (objects detected but incorrectly classified), TN—true negatives, and FN—false negatives (objects that were present but not detected or not assigned to their class).
Based on the prediction results categorized according to the adopted confusion matrix, metrics [28,29] were calculated separately for each class. All the metrics listed here assume values within the range [0, 1]. Equation (1) was employed to compute the accuracy (A), providing information about the degree of correct responses across the entire set of results:
A = (TP + TN) / (TP + FP + TN + FN).    (1)
The second Equation (2) enables the calculation of recall (R). The closer its value is to 1, the more sensitive the model is:
R = TP / (TP + FN).    (2)
Another metric, specificity (S), was computed according to Equation (3). The closer the value is to one, the better the model is at preventing false positives:
S = TN / (FP + TN).    (3)
The next metric, calculated by using Equation (4), is precision (P), which quantifies the value of an optimistic forecast; the higher its value, the greater the precision of the network model:
P = TP / (TP + FP).    (4)
The final computed metric is the model quality indicator, the F1 score, which is calculated based on Equation (5):
F1 = 2TP / (2TP + FP + FN).    (5)
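The five metrics in Equations (1)–(5) translate directly into code; the short helper below is a plain transcription of those formulas for a single class, not the evaluation script used in the study.

# Per-class metrics computed from the confusion-matrix counts (Equations (1)-(5)).
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, recall, specificity, precision, and F1 for one class."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),   # Equation (1)
        "recall":      tp / (tp + fn),                    # Equation (2)
        "specificity": tn / (fp + tn),                    # Equation (3)
        "precision":   tp / (tp + fp),                    # Equation (4)
        "f1":          2 * tp / (2 * tp + fp + fn),       # Equation (5)
    }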
The results are presented in Table 3, confirming the achievement of excellent test results. The utilization of artificial intelligence in the form of the YOLOv4-Tiny model allows for relatively rapid image segmentation and the classification of separated objects with a high degree of accuracy.
A brief analysis of the results from the table is as follows:
  • Accuracy (A), for most classes, is exceptionally high, exceeding 99%. This indicates the model’s proficiency in object classification.
  • Recall (R) reflects the model’s ability to detect positive instances. Most classes exhibit a high recall, signifying the model’s effectiveness in detecting objects of these classes.
  • Specificity (S) measures the model’s capability to avoid false positives. Specificity values are also generally high for most classes.
  • Precision (P) assesses the accuracy of positive predictions. Precision values are typically high, albeit slightly lower for certain classes.
  • The F1 score is a metric that takes into account both precision and recall. F1 score values are generally high, indicating an overall good classification quality.
Overall, the model demonstrates an excellent performance in classifying all classes, with high values for accuracy, recall, specificity, precision, and the F1 score. It is worth noting that Class 5 achieves perfect scores (1.00000) in all metrics.

4. Development of a Decision-Making System

Following the accurate detection and classification of objects within the digital image, the subsequent phase of this research entailed the development of a decision-making system. The primary objective of this system is to facilitate the categorization of these objects, specifically horizontal road signs, into two distinct groups: signs necessitating repainting and signs not requiring any repainting (due to sufficient visibility). Segments of the images corresponding to one of the nine predefined horizontal road signs were extracted based on the data acquired from the detection and classification processes. These segments were treated as individual elements and underwent further transformations.
The initial step involved establishing criteria for determining when a given sign is adequately visible and when its visibility falls below the required threshold. Figure 2 provides examples of signs that are adequately visible and those that are only partially visible. Both categories of objects underwent various transformations within the realm of digital image analysis. It is worth noting that not all techniques yielded significant insights; therefore, only the most intriguing observations will be presented.
The most intriguing results were obtained from a series of transformations as follows: Firstly, the image was converted to grayscale. Subsequently, the contrast of this image was increased. Next, its histogram was computed.
Before the transformations, the histograms for both well-visible objects and those requiring repainting exhibited very similar trends. However, after the aforementioned transformations, a specific difference in the histogram function was observed. The histogram in the range of maximum saturation values (close to 255) for well-visible objects displayed high values, while objects requiring repainting showed a declining trend in this range.
Figure 5 depicts a sample set of graphs for a properly visible horizontal road sign. Figure 5a represents the image with the object, Figure 5b shows the histogram plot of that image, and Figure 5c presents the histogram plot for the image after a series of transformations were applied.
Figure 6 presents a similar set of graphs for a poorly visible horizontal road sign. Figure 6a represents the image with the object, Figure 6b shows the histogram plot of that image, and Figure 6c presents the histogram plot for the image after a series of transformations were applied.
It is evident that increasing the contrast enhances the saturation of bright colors in the image for properly visible objects. However, performing the same operation on images with incomplete objects increases the saturation only to a limited extent. This is an effect resulting from the deficiency of white paint (where increasing the contrast brightens gray, albeit to a lesser extent than light gray/almost white). However, when certain parts of an object, such as its edges, are not visible, the object occupies fewer pixels. Consequently, the ratio of object pixels to background pixels is lower than in images featuring fully visible objects. To address this, a series of three operations were conducted on all the analyzed image fragments: conversion to grayscale, contrast enhancement, and histogram calculation.
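A minimal sketch of these three operations is given below; the specific contrast-enhancement method is not stated in the text, so a simple linear contrast stretch is assumed here purely for illustration.

# Sketch of the three operations applied to every extracted sign fragment:
# grayscale conversion, contrast enhancement, and histogram calculation.
import cv2
import numpy as np

def transform_fragment(fragment_bgr: np.ndarray):
    """Return the contrast-enhanced grayscale fragment and its 256-bin histogram."""
    gray = cv2.cvtColor(fragment_bgr, cv2.COLOR_BGR2GRAY)
    enhanced = cv2.convertScaleAbs(gray, alpha=1.5, beta=0)  # assumed linear contrast gain
    hist = cv2.calcHist([enhanced], [0], None, [256], [0, 256]).ravel()
    return enhanced, hist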
Establishing a threshold value for the decision-making system was imperative. This task involved the scrutiny of 1000 independent images containing objects. The resultant analysis led to the determination of a threshold value of 225 for subsequent research endeavors. It was also imperative to account for variations in image dimensions and brightness levels. Consequently, the decision-making system relies upon the relationship elucidated by Equation (6):
BS = hist[225:255] / hist[0:255],    (6)
where BS is the share of brightness; hist[225:255] is the sum of the histogram values from 225 to 255; and hist[0:255] is the sum of all the values of this histogram.
Subsequently, the threshold value for the brightness share (BS) for each road sign class had to be ascertained. A dataset of 1000 independent images containing objects was repurposed for this purpose. Upon comprehensive analysis, it was discerned that for “Class 1”, such a threshold was estimated to be approximately BS = 10. In contrast, the disparities for the remaining eight classes were relatively minimal, hovering around BS = 4. This variance stems from the distinctive characteristics of the signs; the pedestrian crossing sign is distinguished by numerous bright/white lines, while the remaining signs within the Region of Interest (ROI) encompass a substantial proportion of background/road pixels.
In the final stage, we developed an implementation of a decision-making system that relies on pre-established thresholds. After obtaining prediction results from the YOLOv4-Tiny model for a given image, we proceed to the subsequent stages of our research.
Firstly, the entire image is transformed into grayscale and then subjected to a contrast-enhancement operation. Subsequently, based on the information about the position and size of the Region of Interest (ROI) obtained from the CNN model’s predictions, all the detected and classified objects within the image are extracted.
The next step involves repeating an identical process for each extracted image fragment individually. This process entails computing the histogram function, which represents the distribution of pixel brightness within the respective image fragment. Subsequently, based on Equation (6), we calculate the value of the BS (brightness share), which serves as a measure of the brightness contribution within the image fragment.
In the last step, we conduct a conditional process that takes into consideration both the BS value and the class to which each object belongs. This step is crucial as it enables us to make the ultimate assessment of whether a given horizontal road sign requires repainting or remains adequately visible, thus not necessitating intervention. Thanks to this advanced analytical scheme, we gain valuable insights into the condition of each road sign in the image, which is of utmost importance in the management and maintenance of road signs.
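A minimal sketch of this decision step is given below. It assumes that the class thresholds quoted earlier (BS of approximately 10 for the pedestrian crossing class and approximately 4 for the remaining classes) are expressed as percentages of the total histogram mass, and that class index 0 corresponds to “Class 1”; both are assumptions made for illustration only.

# Illustrative decision step: Brightness Share (Equation (6)) compared against a
# class-dependent threshold. Thresholds are assumed to be percentages.
import numpy as np

BS_THRESHOLDS = {class_id: 4.0 for class_id in range(9)}
BS_THRESHOLDS[0] = 10.0  # pedestrian crossings ("Class 1") use a higher threshold

def brightness_share(hist: np.ndarray) -> float:
    """Share of histogram mass in the brightest range [225, 255], as a percentage."""
    return 100.0 * hist[225:256].sum() / hist.sum()

def needs_repainting(hist: np.ndarray, class_id: int) -> bool:
    """Flag a sign for repainting when its brightness share falls below the class threshold."""
    return brightness_share(hist) < BS_THRESHOLDS[class_id]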
As the ultimate output of our system, we generate an image where horizontal road signs are delineated with a bounding box, accompanied by information about the object’s classification (i.e., to which class the object has been classified) and the determination of whether it necessitates repainting (as determined by the decision-making system).
The developed system was also tested on real-time video streams captured by the camera. The outcomes were highly promising. However, for a comprehensive analysis of our system’s performance, real-world data are indispensable. These data allow for comparisons with the output generated by the CNN model, enabling the calculation of relevant metrics. Unfortunately, we cannot obtain them in real time. Therefore, in this context, detailed discussion regarding the video stream is omitted, and our analyses are presented by using established examples.

5. Testing and Analysis of the Obtained Results

In the final stage of our research, it was necessary to test the developed decision-making system. To achieve this, we utilized a dataset containing 2634 various objects. This dataset served as a collection of objects that had been identified and classified based on the prediction results of the CNN model, which was evaluated on a previously prepared dataset designed for testing purposes.
The results generated by our decision-making system categorized 1645 objects as requiring urgent repainting. Examples of such road signs can be observed in Figure 7. The remaining 989 elements were designated as objects that remain in a satisfactory visibility condition and do not require immediate intervention. Figure 8 showcases examples of road signs that still maintain an adequately visible state and do not necessitate repainting.
Upon meticulous analysis of both groups of objects, we can confirm that the individual horizontal road signs were correctly assigned to their respective categories. This represents a crucial step in the process of managing and maintaining road signs.
To ensure the reliability of our research, it is essential to note that during the third stage of our work (the decision-making process), only objects correctly classified in the earlier stages were considered. Objects not identified during the image-segmentation process were naturally excluded from the assessment of correct visibility. It is crucial to emphasize that this exclusion constituted only a negligible percentage of the total number of objects, directly reflecting the accuracy of the CNN model.
The performance of the YOLOv4-Tiny classifier is outstanding. This model proficiently detects objects on the image surface and accurately classifies them, even when dealing with road signs of various sizes. Some of these signs are exceptionally small; nonetheless, the CNN model exhibits remarkable precision in their classification. Although we acknowledge the availability of alternative CNN models for segmentation and classification, including newer versions of the YOLO model, we firmly maintain that the YOLOv4-Tiny model stands as the optimal choice from our perspective.
The developed decision-making system represents a wholly original concept, devised based on the intricacies of digital image analysis and processing. While an alternative approach could involve the creation of another network model, the extensive work required for its development and the necessity of preparing a suitably representative dataset led us to our chosen solution. We firmly believe that our system yields excellent results and stands as an effective solution to the identified problem.

6. Conclusions

Our research yielded excellent results. The applied adaptation and training of the YOLOv4-Tiny model allowed for the detection of horizontal road signs with an accuracy of 96.79%. The classification of these objects into one of the nine possible classes produced correct results, with a high degree of accuracy exceeding 98% in each case. The results are also characterized by high convergence and repeatability. Calculated metrics confirm the high accuracy of the obtained results. The developed decision-support system for assessing the need to repaint poorly visible horizontal road signs delivered excellent results. This research was conducted by taking into account various image sizes and different weather conditions, including both sunny and cloudy days. In each case, outstanding results were achieved.
In the context of conclusions, our research demonstrates that the developed decision system is an effective tool for the management and maintenance of horizontal road signs. This is significant as horizontal road signs play a crucial role in road safety. Our future work will focus on expanding the developed system to include other road signs and addressing more complex road conditions. Our work aims to further enhance systems for the management and maintenance of road signs, ultimately contributing to increased road safety.

Author Contributions

Conceptualization, J.K., M.K. and S.G.; methodology, J.K.; software, J.K. and M.K.; validation, S.G.; formal analysis, M.K.; resources, J.K.; data curation, J.K.; writing—original draft preparation, J.K.; writing—review and editing, J.K.; visualization, J.K.; supervision, M.K.; funding acquisition, J.K. and M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This project was financed under the program of the Polish Minister of Science and Higher Education titled “Regional Initiative of Excellence” in the years 2019–2023, project number 020/RID/2018/19, with a funding amount of PLN 12,000,000.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are not publicly available for reasons of personal data protection.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, W.; Tian, G.; Zhang, H.; Li, Z.; Zhang, L. DDTree: A hybrid genetic algorithm with multiple decoding methods for energy-aware remanufacturing system scheduling problem. Robot.-Comput.-Integr. Manuf. 2023, 81, 102509. [Google Scholar] [CrossRef]
  2. Tian, G.; Zhang, L.; Fathollahi-Fard, A.M.; Kang, Q.; Li, Z.; Wong, K.Y. Addressing a collaborative maintenance planning using multiple operators by a multi-objective Metaheuristic algorithm. IEEE Trans. Autom. Sci. Eng. 2023, 1–13. [Google Scholar] [CrossRef]
  3. Yang, F.; Qiao, Y.; Wei, W.; Wang, X.; Wan, D.; Damaševičius, R.; Woźniak, M. DDTree: A Hybrid Deep Learning Model for Real-Time Waterway Depth Prediction and Smart Navigation. Appl. Sci. 2020, 10, 2770. [Google Scholar] [CrossRef]
  4. Kulawik, J. Estimating the distance to an object from grayscale stereo images using deep learning. J. Appl. Math. Comput. Mech. 2022, 21, 60–72. [Google Scholar] [CrossRef]
  5. Xiao, Y.; Zhou, K.; Cui, G.; Jia, L.; Fang, Z.; Yang, X.; Xia, Q. Deep learning for occluded and multi-scale pedestrian detection: A review. IET Image Process. 2021, 15, 286–301. [Google Scholar] [CrossRef]
  6. Diwan, H. Development of an Obstacle Detection and Navigation System for Autonomous Powered Wheelchairs; University of Ontario Institute of Technology: Oshawa, ON, Canada, 2019. [Google Scholar]
  7. Tian, D.; Han, Y.; Wang, B.; Guan, T.; Wei, W. A review of intelligent driving pedestrian detection based on deep learning. Comput. Intell. Neurosci. 2021, 2021, 5410049. [Google Scholar] [CrossRef] [PubMed]
  8. Ashwini, K.; PM, D.R.V.; Srinivasan, K.; Chang, C.Y. Deep convolutional neural network based feature extraction with optimized machine learning classifier in infant cry classification. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; IEEE: New York, NY, USA; pp. 27–32. [Google Scholar] [CrossRef]
  9. Szmurło, R.; Osowski, S. Ensemble of classifiers based on CNN for increasing generalization ability in face image recognition. Bull. Pol. Acad. Sci. Tech. Sci. 2022, 70, e141004. [Google Scholar] [CrossRef]
  10. Kulawik, J.; Kubanek, M. Detection of False Synchronization of Stereo Image Transmission Using a Convolutional Neural Network. Symmetry 2021, 13, 78. [Google Scholar] [CrossRef]
  11. Zhou, C.; Wu, M.; Lam, S.K. SSA-CNN: Semantic self-attention CNN for pedestrian detection. arXiv 2019, arXiv:1902.09080. [Google Scholar]
  12. Liu, C.; Chen, L.C.; Schroff, F.; Adam, H.; Hua, W.; Yuille, A.L.; Fei-Fei, L. Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 82–92. [Google Scholar]
  13. Kirillov, A.; Wu, Y.; He, K.; Girshick, R. Pointrend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9799–9808. [Google Scholar]
  14. Yang, M.D.; Tseng, H.H.; Hsu, Y.C.; Tsai, H.P. Semantic segmentation using deep learning with vegetation indices for rice lodging identification in multi-date UAV visible images. Remote Sens. 2020, 12, 633. [Google Scholar] [CrossRef]
  15. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  16. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  17. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  18. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  19. Feng, D.; Xie, J.; Liu, T.; Xu, L.; Guo, J.; Hassan, S.G.; Liu, S. Fry Counting Models Based on Attention Mechanism and YOLOv4-Tiny. IEEE Access 2022, 10, 132363–132375. [Google Scholar] [CrossRef]
  20. Tang, Y.; Zhou, H.; Wang, H.; Zhang, Y. Fruit detection and positioning technology for a Camellia oleifera C. Abel orchard based on improved YOLOv4-tiny model and binocular stereo vision. Expert Syst. Appl. 2023, 211, 118573. [Google Scholar] [CrossRef]
  21. Howell, L.; Anagnostidis, V.; Gielen, F. Multi-Object detector yolov4-tiny enables high-throughput combinatorial and spatially-resolved sorting of cells in microdroplets. Adv. Mater. Technol. 2022, 7, 2101053. [Google Scholar] [CrossRef]
  22. Li, X.; Li, C.; Rahaman, M.M.; Sun, H.; Li, X.; Wu, J.; Yao, Y.; Grzegorzek, M. A comprehensive review of computer-aided whole-slide image analysis: From datasets to feature extraction, segmentation, classification and detection approaches. Artif. Intell. Rev. 2022, 55, 4809–4878. [Google Scholar] [CrossRef]
  23. Kheradmandi, N.; Mehranfar, V. A critical review and comparative study on image segmentation-based techniques for pavement crack detection. Constr. Build. Mater. 2022, 321, 126162. [Google Scholar] [CrossRef]
  24. Pandey, S.; Bharti, J. Review of Different Binarization Techniques Used in Different Areas of Image Analysis. In Evolution in Signal Processing and Telecommunication Networks, Proceedings of Sixth International Conference on Microelectronics, Electromagnetics and Telecommunications (ICMEET 2021), Bhubaneswar, India, 27–28 August 2021; Springer: Singapore, 2022; Volume 2, pp. 249–268. [Google Scholar]
  25. Cheremkhin, P.A.; Kurbatova, E.A.; Evtikhiev, N.N.; Krasnov, V.V.; Rodin, V.G.; Starikov, R.S. Adaptive Digital Hologram Binarization Method Based on Local Thresholding, Block Division and Error Diffusion. J. Imaging 2022, 8, 15. [Google Scholar] [CrossRef] [PubMed]
  26. Hassan, M.; Suhail Shaikh, M.; Jatoi, M.A. Image quality measurement-based comparative analysis of illumination compensation methods for face image normalization. Multimed. Syst. 2022, 28, 511–520. [Google Scholar] [CrossRef]
  27. Dutta, M.K.; Sarkar, R.K. Application of Retinex and histogram equalisation techniques for the restoration of faded and distorted artworks: A comparative analysis. Optik 2023, 272, 170201. [Google Scholar] [CrossRef]
  28. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  29. Patterson, J.; Gibson, A. Deep Learning: A Practitioner’s Approach; O’Reilly Media, Inc.: Newton, MA, USA, 2017. [Google Scholar]
Figure 1. The idea of the proposed system for verifying correct visibility of horizontal road signs.
Figure 2. Examples of images containing horizontal road signs from the prepared dataset.
Figure 3. An example image with labeled objects.
Figure 4. An exemplary image outcome of the trained model.
Figure 5. An exemplary set comprising an image and plots for a correctly visible horizontal road sign. (a) Fragment of the original image with the object. (b) Histogram plot of the image from (a) transformed into grayscale. (c) The histogram plot of the image from (a) after a series of transformations.
Figure 6. An exemplary set comprising an image and plots for a poorly visible horizontal road sign. (a) Fragment of the original image with the object. (b) Histogram plot of the image from (a) transformed into grayscale. (c) The histogram plot of the image from (a) after a series of transformations.
Figure 7. Examples of images containing poorly visible objects.
Figure 8. Examples of images containing correctly visible objects.
Table 1. Comparison of the training and test set sizes broken down by individual classes.
Class Number | Name | Train Set | Test Set
Class 1 | pedestrian crossing | 2675 | 671
Class 2 | right arrow | 1567 | 390
Class 3 | left arrow | 1658 | 424
Class 4 | straight arrow | 2054 | 516
Class 5 | right–left–straight arrow | 198 | 51
Class 6 | right–straight arrow | 910 | 223
Class 7 | left–straight arrow | 879 | 218
Class 8 | left–diagonal arrow | 520 | 131
Class 9 | right–diagonal arrow | 223 | 52
Table 2. An error matrix.
TP | TN
FP | FN
Table 3. A summary of the obtained metric values for individual classes.
Class | Accuracy (A) | Recall (R) | Specificity (S) | Precision (P) | F1
Class 1 | 0.99962 | 1.00000 | 0.99948 | 0.99856 | 0.99928
Class 2 | 0.99165 | 0.96134 | 0.99688 | 0.98158 | 0.97135
Class 3 | 0.99544 | 0.97761 | 0.99866 | 0.99242 | 0.98496
Class 4 | 0.98785 | 0.99799 | 0.98549 | 0.94118 | 0.96875
Class 5 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000
Class 6 | 0.99620 | 0.94975 | 1.00000 | 1.00000 | 0.97423
Class 7 | 0.99696 | 0.96903 | 0.99958 | 0.99545 | 0.98206
Class 8 | 0.99886 | 0.98462 | 0.99960 | 0.99225 | 0.98842
Class 9 | 1.00000 | 1.00000 | 1.00000 | 1.00000 | 1.00000