Article

Object Recognition Scheme for Digital Transformation in Marine Science and Engineering

1 Department of Computer Engineering, College of IT Convergence, Gachon University, Seongnam-si 13120, Republic of Korea
2 Department of Computer Engineering, Changwon National University, Changwon-si 51140, Republic of Korea
* Authors to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2023, 11(10), 1914; https://doi.org/10.3390/jmse11101914
Submission received: 18 September 2023 / Revised: 28 September 2023 / Accepted: 29 September 2023 / Published: 3 October 2023
(This article belongs to the Section Ocean Engineering)

Abstract

With the advancement of deep learning (DL), researchers and engineers in the marine industry are exploring how DL technologies can be applied to their specific problems. In general, the accuracy of inference with DL technologies depends significantly on the size of the training dataset. Unfortunately, people in marine science and engineering environments are often reluctant to share their documents (i.e., P&IDs) with third-party manufacturers or public clouds in order to protect their proprietary information. Despite this, the demand for object detection with DL technologies in image-formatted files (i.e., jpg, png, or pdf format) is steadily growing. In this paper, we propose a new mechanism, called the no-training object picker (NoOP), which efficiently recognizes all objects (e.g., lines, tags, and symbols) in image-formatted P&ID documents. Notably, it recognizes objects without any training dataset, thus reducing the time and effort required for training and for collecting unpublished datasets. To clearly present the effectiveness of NoOP, we evaluated NoOP using a real P&ID document. As a result, we confirmed that all objects in the image-formatted P&ID file are successfully detected within a short time (only 7.11 s on average).

1. Introduction

Deep learning (DL) has been widely embraced across various domains; it has become a popular technology thanks to the advancement of powerful frameworks, such as TensorFlow and Keras [1,2,3,4,5,6,7,8,9,10]. This advancement has opened up new opportunities in marine science and engineering environments, in that DL technologies help to find optimized solutions for existing problems. For example, autonomous ships are beginning to use DL technologies to determine how to operate in the ocean without human interaction (i.e., self-driving) [11,12]. Unfortunately, advanced DL technologies in marine science and engineering systems are still at an early stage because of the complexities inherent in marine programming models.
Meanwhile, image classification and recognition using common DL technologies are projected to serve various marine applications [13,14,15]. For example, some researchers utilized YOLO [7,9] to intelligently track maritime autonomous surface ships [11,12]. There have also been several efforts to apply DL technologies in the manufacturing process to analyze the assembly status of ships, such as tankers and container ships, and to transform their piping and instrumentation diagram (P&ID) documents for digitization [13,14,15]. However, some previous studies have shown poor performance or accuracy owing to small datasets and poor resolution; accuracy depends significantly on the quantity and quality of the training dataset. Unfortunately, in marine science and engineering systems, most documents needed to build ships (i.e., P&IDs) are stored as low-resolution ".pdf" images, which makes digital transformation challenging. In addition, to protect proprietary information, these documents are seldom shared with third-party manufacturers or public clouds. This scenario necessitates human interaction to create or extend datasets by labeling training data to improve performance and accuracy, which demands significant effort and time. Moreover, advanced smart shipyards will add more complexity to P&ID documents, requiring even more human effort and time. These challenges in marine science and engineering environments motivate us to develop an alternative method that automates the digital transformation mechanism, aiming to substantially reduce human effort and time.
To overcome the above challenges (i.e., small training datasets and human interaction), we designed a new mechanism, denoted the no-training object picker (NoOP), which efficiently recognizes objects (e.g., lines, tags, and symbols) in P&ID documents using a set of technologies: Tesseract OCR (a DL technology) [16,17], the Hough transform [18], and OpenCV [19]. In particular, NoOP recognizes all objects in P&ID documents without training overhead; thus, we save the time needed both to label data with human interaction and to learn from the dataset. The data recognized by NoOP are presented as a table for digital transformation and as a graphical model that helps users understand the P&ID image with ease. To confirm the benefits of NoOP, we evaluated it with a real P&ID document and confirmed that all objects are successfully detected within a short time (only 7.11 s on average). This paper makes the following contributions:
  • We briefly analyze the drawbacks of applying DL technologies in marine science and engineering environments.
  • We propose a novel scheme, called NoOP, which enables DL technologies to recognize objects (i.e., lines, tags, and symbols) in P&ID documents without training models on a large dataset.
  • We implement NoOP and evaluate its performance and accuracy with real-world P&ID documents. In addition, to clearly present the effectiveness of NoOP, we show how lines, tags, and symbols are recognized by dividing the description into several steps.
The rest of this paper is organized as follows. Section 2 briefly introduces P&ID and the DL technologies related to this work, and Section 3 reviews relevant studies. Section 4 describes the design of NoOP in detail, and Section 5 presents the implementation details and our evaluation results. Finally, Section 6 concludes this work.

2. Background

In this section, we first describe the importance of P&ID documents and their unique characteristics, including the human effort they require in marine science and engineering environments. Then, we introduce the DL technologies needed to understand our work.

2.1. What Are P&ID Documents?

In marine science and engineering environments, the piping and instrumentation diagram (P&ID) is one of the most important documents; it includes the essential objects (e.g., lines, tags, and symbols) of a ship design [20]. Unfortunately, since many manufacturers in the shipbuilding industry prefer to protect their proprietary information, most P&ID documents are deployed in ".pdf" form. Transforming the data described in P&ID documents into usable information therefore requires enormous human effort; engineers at third-party manufacturers have to manually copy the necessary objects from ".pdf" files into their own documents or databases. In addition, such manual effort significantly increases the possibility of human error.
Figure 1 shows an example of a real-world P&ID document. In this figure, a symbol is represented by a shape composed of a single circle and two trapezoids; there are a total of 8 symbols. A line indicates a pipeline connecting symbols, and a line with a short vertical stroke at its end denotes a line disconnected from the opposite line; there are a total of 4 lines in this example. Finally, the text over the lines and symbols is denoted as tags; this example has a total of 12 tags. In summary, after reading this example P&ID document, engineers at third-party manufacturers have to record 8 symbols, 4 lines, and 12 tags in their own forms (i.e., files or databases).

2.2. Recognition Schemes Based on Deep Learning

Nowadays, a new trend is to replace traditional human efforts with an automated process using DL technologies (i.e., digital transformation) [11,12,18,21,22,23,24,25,26]. In this section, we briefly introduce two major DL technologies for replacing human efforts with automated text and line recognition schemes through digital image processing [18,21,22,23,24,25].
First, some researchers have focused on using DL techniques to automatically extract text from image files [21,22,23,24]. For example, CRAFT [23] utilizes not only fully convolutional networks (FCNs) [21] but also the VGG16 model [26]. It ranks and manages scoreboards by computing the probability that a pixel is placed at the center of a single letter (i.e., the region score map) and by analyzing the density and distance of pixels (i.e., the affinity score map) in an image file. Then, it extracts all available words by merging letters based on the scoreboard. Meanwhile, Tesseract OCR [24] adopted the long short-term memory (LSTM) model [16] in its optical character recognition (OCR) engine [17]. To discover each word in an image file, Tesseract OCR scans for regions that include text and then predicts whether each region contains a word using pattern matching, which estimates the likelihood that the text in a region matches a word in the language model. After finishing the discovery process, it provides the recognized words as well as detailed placement information for each word: x-axis, y-axis, height, and width in the image file.
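As a concrete illustration of this output format, the following is a minimal sketch of word-level text extraction with pytesseract (the Python wrapper for Tesseract OCR used later in Section 5); the file name and the confidence filter are our own assumptions, and the field names follow the dictionary returned by image_to_data().

    import cv2
    import pytesseract
    from pytesseract import Output

    image = cv2.imread("pid.png")                         # hypothetical P&ID image file
    data = pytesseract.image_to_data(image, output_type=Output.DICT)

    words = []
    for i, text in enumerate(data["text"]):
        if text.strip() and float(data["conf"][i]) > 0:   # skip empty or rejected entries
            words.append({
                "text": text,
                "x": data["left"][i],     # x-axis of the word bounding box
                "y": data["top"][i],      # y-axis
                "w": data["width"][i],    # width
                "h": data["height"][i],   # height
            })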
Second, several previous studies proposed digital image processing mechanisms in which meaningful lines are intelligently detected via DL or classical image-processing techniques [18,25,27]. As far as we know, the Hough transform algorithm is the most well-known approach among image-processing mechanisms, and it efficiently recognizes straight lines in image-formatted files [18]. Interestingly, it is similar to the above text recognition approaches in that it diagnoses straight lines according to areal density (i.e., the number of pixels). Figure 2 shows how the Hough transform algorithm recognizes lines. As shown in Figure 2, it first traverses all boxes in the image-formatted file, one by one, and records the location of each box where the number of black pixels exceeds a pre-defined threshold (parameter); in this example there are three such dots. The transform algorithm then draws several virtual lines that have a high likelihood of forming a straight line. After drawing the virtual lines, it determines where a real straight line is placed by comparing how many black boxes lie across each virtual line. Finally, the Hough transform algorithm repeats the drawing and determination process until it finds all straight lines in the image-formatted file (see Figure 2e).
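For reference, a minimal sketch of straight-line detection with OpenCV's probabilistic Hough transform is shown below; the threshold, minimum length, and gap values are illustrative assumptions rather than the parameters used in this paper.

    import cv2
    import numpy as np

    gray = cv2.imread("pid.png", cv2.IMREAD_GRAYSCALE)            # hypothetical input
    binary = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # Each returned entry is a segment (x1, y1, x2, y2) whose accumulator count
    # exceeds the threshold, mirroring the voting process described above.
    segments = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180, threshold=100,
                               minLineLength=50, maxLineGap=5)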

3. Related Work

Most shipbuilding companies apply digital data management systems across their overall workflow [28]. In addition, various studies on digital transformation have been conducted to increase work efficiency [29,30,31,32]. These efforts require digitizing unstructured analog data, such as image-format drawings [25,33,34,35,36].
The approaches studied from the 1980s to the 2010s recognized symbols by applying knowledge-based pattern-matching techniques with image processing operations, such as morphological operations [20,37,38,39]. De et al. [39] developed a method to analyze the geometric features of electrical symbols on imaged electrical circuit diagrams. They generated knowledge based on the shape and pixel positions of each electrical symbol. After that, they recognized symbol objects that matched the knowledge when scanning the imaged circuit diagrams. Through this approach, they demonstrated the recognition of 18 symbols in real-world circuit diagram images. Tan et al. [20] proposed a symbol recognition system for P&ID using the pattern-matching method. They employed a morphology operation to extract the outline of each symbol and create a symbol library. When scanning the input P&ID image, candidate groups considered as symbols were selected. Candidates were then compared against the symbol library using the K-nearest neighbor algorithm [40] to extract the symbol with the highest similarity. In their experiments, they constructed a symbol library using 47 P&ID images for 7 classes of symbols, achieving 93% accuracy. However, these techniques struggle when symbol standards are diverse. Moreover, their recognition performance is poor for complex shapes, making them unsuitable for modern drawings.
In recent years, many studies have begun to leverage deep learning (DL) technology for symbol recognition in image-format drawings [25,33,34,35,36]. Generally, they train a DL model to recognize symbols by using a symbol image dataset. This methodology has the advantage of enabling automatic pattern analysis of symbol shapes—a task traditionally performed manually—through DL model training. It also offers high recognition performance without the need for a manual analysis process. Moon et al. [35] proposed a pipeline recognition method using RetinaNet [41], a DL-based object detection model, on P&ID images. RetinaNet was trained to recognize 12 types of pipelines, and the Hough transform was used to recognize pipelines composed of continuous lines. They constructed a training dataset by extracting pipeline sign images from 82 P&ID sheets and augmented the dataset to improve recognition performance. Their evaluation showed an average accuracy of 96.14% on 9 test P&IDs. Yu et al. [25] presented a recognition system that can recognize tags, symbols, and pipelines in P&ID using a DL model and a pixel-processing method. They defined nine classes of symbols and used AlexNet [42], a well-known image classification model, for symbol recognition. Tag recognition is performed using the connectionist text proposal network (CTPN) [43], which detects horizontal text in P&ID images. Pipeline recognition is performed by searching for continuous black pixels. They also constructed a training dataset for each model. Evaluations were performed using two P&IDs and showed average recognition accuracies of 91.6% for symbols, 83.1% for tags, and 90.6% for pipelines. Rahul et al. [33] used a revised VGG19 [26] model for symbol recognition, a CTPN model for tags, and the Hough transform for pipelines. In particular, they proposed a method of extracting connectivity information between pipelines and symbols by analyzing the recognized data. This is done by calculating the minimum Euclidean distance between recognized objects in the input image, which associates each symbol with its closest pipeline. With this method, they achieved an average accuracy of 94.7% for 10-class symbol recognition, 90% for tags, and 65.2% for pipelines on 4 P&IDs. In addition, an inlet association accuracy of 96.8% was reported.
These DL-based studies have significantly increased the performance of converting imaged drawings to digitized drawings. However, the existing methods require a training process for DL models. This process involves manually constructing a training dataset. Dataset construction entails extracting all target symbol images from real-world P&IDs and adding label annotations for classification or recognition. Moreover, even for identical symbols, the training data must be configured separately according to arrangement directions [35]. Therefore, the manual construction of a large dataset is time-consuming and labor-intensive. DL models usually require large training datasets to achieve high accuracy [44]. This trade-off can be a disadvantage in DL-based recognition technologies.

4. Design and Implementation

Now, we introduce a new mechanism, called the no-training object picker (NoOP), which efficiently recognizes all objects (e.g., lines, tags, and symbols) in image-formatted P&ID documents. We designed NoOP with two key design principles: (1) recognizing all objects in P&ID documents without training datasets, and (2) providing transformed data for digital transformation.
Figure 3 shows the overall procedure of NoOP. As shown in Figure 3, NoOP first performs a pre-processing step that removes unnecessary border lines and noise in the image-formatted file, which can lead to wrong recognition (➀). NoOP then discovers all objects through three detection steps (i.e., tag, symbol, and line detection); we describe how each object is detected throughout the rest of this section (➁–➆). After that, NoOP provides the information for digital transformation (➇).
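A high-level sketch of this pipeline is given below; all function names are hypothetical placeholders for the steps in Figure 3 and are detailed in Sections 4.1, 4.2, 4.3 and 4.4.

    def run_noop(image):
        clean = remove_borders_and_noise(image)          # (1) pre-processing
        symbol_tags, line_tags = detect_tags(clean)      # (2) tag detection, Section 4.1
        symbols = detect_symbols(clean, symbol_tags)     # (3) symbol detection, Section 4.2
        lines = detect_lines(clean)                      # (4)-(5) line detection and merging, Section 4.3
        pairs = match_tags_to_lines(line_tags, lines)    # (6) key-value pairs, Section 4.4
        pairs = attach_symbols(pairs, symbols)           # (7) insert symbols into lines
        return generate_semantics(pairs)                 # (8) output for digital transformation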

4.1. Tag Detection

In marine science and engineering environments, the key challenge is: how can we detect objects without a shared dataset for training on P&ID documents? To address this challenge, we delved deeply into the structure of P&ID documents and discerned that most tags are located over lines or symbols and that most symbols are of a similar size. Based on this insight, we designed a new symbol detection mechanism based on the location information of each tag. NoOP first finds all tags in the target file (i.e., the image-formatted file) and logs the location information of each, one by one (➁ in Figure 3). To recognize tags in the form of text, we implemented this step with the Tesseract OCR engine. As mentioned before, the engine returns location information once a tag is recognized; the information is composed of the text, x-axis, y-axis, height, and width. In NoOP, we classify this information into two tables based on the meaning of the recognized text: the symbol tag location table and the line tag location table. The symbol tag location table is conveyed to the symbol detection phase as a hint, enabling the recognition of all symbols in the target file without any training dataset.
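A minimal sketch of this classification step is shown below, reusing the OCR output from the sketch in Section 2.2; the tag naming patterns ("Valve-" and "Line-") are assumptions drawn from the example documents, not a convention enforced by the OCR engine.

    import re

    def classify_tags(words):
        """Split recognized words into the symbol tag and line tag location tables."""
        symbol_tag_table, line_tag_table = [], []
        for w in words:                                  # each w holds text, x, y, w, h
            if re.match(r"(?i)valve-\d+", w["text"]):
                symbol_tag_table.append(w)               # hint for the symbol detection phase
            elif re.match(r"(?i)line-\d+", w["text"]):
                line_tag_table.append(w)
        return symbol_tag_table, line_tag_table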

4.2. Symbol Detection

Next, NoOP takes snapshots of the target file to recognize symbols one by one based on the symbol tag location table. Unfortunately, the text belonging to a symbol can be placed anywhere around that symbol (i.e., up, down, left, or right). To solve this issue, NoOP sequentially snapshots the possible locations of the symbol (i.e., bounding boxes) based on the location of the corresponding text (➂ in Figure 3); the height and width of the bounding box used in the snapshot procedure are predefined by the user. Then, NoOP finally selects the snapshot that includes the highest areal density of black pixels and records its x-axis and y-axis together with the corresponding tag.
Figure 4 shows an example of the snapshot process when a symbol is successfully detected. As shown in Figure 4, based on the location of the corresponding text, the size and starting location of a bounding box are automatically calculated, as indicated by the red dotted line. Of course, it is possible that only part of the whole symbol is recognized because of the size and starting point of the calculated bounding box. To enhance symbol recognition accuracy, NoOP compares snapshots from four different sides (i.e., up, down, left, and right) and finally selects the one that includes the highest areal density of black pixels. We believe that using black pixels is a reasonable choice because this idea has been validated in the Hough transform algorithm. To connect the semantics among recognized objects, NoOP stores the snapshot image along with the location of each symbol in the table of symbol objects.
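The following is a minimal sketch of the four-side snapshot comparison, assuming a binarized image in which drawn (black) pixels are non-zero; the bounding-box size is the user-defined parameter mentioned above.

    import numpy as np

    def pick_symbol_snapshot(binary, tag, box_w, box_h):
        x, y, w, h = tag["x"], tag["y"], tag["w"], tag["h"]
        candidates = {                                   # possible symbol positions around the tag
            "up":    (x, y - box_h),
            "down":  (x, y + h),
            "left":  (x - box_w, y),
            "right": (x + w, y),
        }
        best = (None, None, -1.0)                        # (side, box, areal density)
        for side, (sx, sy) in candidates.items():
            crop = binary[max(sy, 0):max(sy, 0) + box_h,
                          max(sx, 0):max(sx, 0) + box_w]
            density = np.count_nonzero(crop) / max(crop.size, 1)
            if density > best[2]:                        # keep the densest snapshot
                best = (side, (sx, sy, box_w, box_h), density)
        return best[0], best[1]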
In summary, NoOP can perfectly recognize all symbols in the target file with no training datasets; this implies no-learning object detection, thereby saving efforts to collect unpublished datasets.

4.3. Line Detection

Unlike symbol detection, line detection is challenging to design because the height and width of lines cannot be predicted in advance. Fortunately, the aforementioned Hough transform algorithm helps to solve this issue. However, it does not help in recognizing the various line patterns (i.e., connected, disconnected, and composed lines). A P&ID document has two types of crossing lines: connected lines and disconnected lines. In fact, distinguishing these two types without any dataset is even more difficult, which makes this step more complex than symbol detection (➃ in Figure 3). To distinguish between the two, NoOP focuses on the end of a recognized line and the tag information. If a line satisfies all three of the following conditions, NoOP determines that the line is a disconnected line: (1) at the end of the line, there exists a short vertical stroke; (2) on the opposite side, there exists a symmetrical line with the same y-axis; and (3) there are two independent tags around the two lines. In the third condition, an independent tag means that the tag location matches the edge location of the recognized line; we describe this in more detail below. Meanwhile, a line can be composed of one or more long vertical and horizontal lines. In this case, NoOP checks whether adjacent lines can be merged using their location information (i.e., x-axis and y-axis). If the recognized vertical and horizontal lines touch at their ends, NoOP merges them into a single line (➄ in Figure 3). Note that line detection does not consider the location of symbols; therefore, it recognizes one line even if a symbol exists between two lines, bypassing the size of the symbol bounding box. In addition, NoOP temporarily removes each recognized line from the target image by masking it out, to avoid duplicate line recognition, and repeats the line detection process until all lines are removed.
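A minimal sketch of the merge-and-mask part of this step is given below, assuming line segments in (x1, y1, x2, y2) form as produced by the Hough transform; the endpoint tolerance and mask thickness are assumed parameters.

    import cv2

    def endpoints_touch(a, b, eps=5):
        """True if any endpoint of segment a lies within eps pixels of an endpoint of b."""
        ends_a = [(a[0], a[1]), (a[2], a[3])]
        ends_b = [(b[0], b[1]), (b[2], b[3])]
        return any(abs(pa[0] - pb[0]) <= eps and abs(pa[1] - pb[1]) <= eps
                   for pa in ends_a for pb in ends_b)

    def merge_touching(segments):
        """Single-pass grouping of segments whose ends touch (a simplified sketch of line merging)."""
        groups, used = [], set()
        for i, seg in enumerate(segments):
            if i in used:
                continue
            group = [seg]
            for j in range(i + 1, len(segments)):
                if j not in used and any(endpoints_touch(s, segments[j]) for s in group):
                    group.append(segments[j])
                    used.add(j)
            groups.append(group)                         # one composed line per group
        return groups

    def mask_segment(binary, seg, thickness=5):
        """Temporarily erase a recognized segment so it is not detected twice."""
        cv2.line(binary, (seg[0], seg[1]), (seg[2], seg[3]), color=0, thickness=thickness)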
Figure 5 shows the recognized lines for the aforementioned image-formatted P&ID document. In Figure 5, the text colored in red indicates the x-axis and y-axis of each recognized line. As shown in Figure 5, NoOP recognizes Line-002 and Line-003 as independent (disconnected) lines because they satisfy the above three conditions. On the other hand, Line-004 is recognized as a connected line since it does not satisfy the first and third conditions. Finally, Line-001 is merged with the neighboring lines whose ends touch.
In summary, NoOP gracefully recognizes all types of lines (i.e., connected, disconnected, and composed lines) without training datasets; thus, it provides the same benefits as mentioned before.

4.4. Semantic Generation

For digital transformation, all recognized objects that pass through ➁–➃ in Figure 3 must be represented in natural language that can be inserted into a file or database; in this paper, we call this process semantic generation. In other words, each line in the target file has to be linked with the corresponding tag information for digital transformation. To generate semantics, NoOP first builds key–value pairs based on the locations of recognized tags and lines (➅ in Figure 3). If the distance between the location of a line and a tag is lower than a pre-defined threshold, NoOP assigns the line to the tag; the key is a tag, the value is the corresponding line, and one key–value item is assigned to only one line per tag. After matching the key–value pairs, NoOP inserts the recognized symbols into the lines of the key–value pairs (➆ in Figure 3). Inserting symbols over the corresponding lines is simple and straightforward because NoOP already knows the location information of all recognized objects. Finally, NoOP generates complete semantic information by scanning the key–value pairs (➇ in Figure 3). The semantic information describes which lines include which symbols and tags (i.e., valves); for example, Line 4 includes three valves between 5 and 7. These semantics can then be used to build training datasets without human intervention.
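A minimal sketch of the key–value construction is shown below; the distance metric, the threshold, and the helper on_line() (a geometric containment check) are our own assumptions for illustration.

    import math

    def distance_to_line(tag, line):
        """Distance from a tag's location to the nearer endpoint of a recognized line."""
        tx, ty = tag["x"], tag["y"]
        return min(math.hypot(tx - line["x1"], ty - line["y1"]),
                   math.hypot(tx - line["x2"], ty - line["y2"]))

    def build_semantics(line_tags, lines, symbols, threshold):
        pairs = {}
        for tag in line_tags:                            # (6) key-value pairs: tag -> line
            line = min(lines, key=lambda l: distance_to_line(tag, l))
            if distance_to_line(tag, line) < threshold:  # one line per tag
                pairs[tag["text"]] = {"line": line, "valves": []}
        for sym in symbols:                              # (7) insert symbols into lines
            for entry in pairs.values():
                if on_line(sym, entry["line"]):          # hypothetical geometric check
                    entry["valves"].append(sym)
        return pairs                                     # (8) scanned to emit natural-language semantics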

5. Evaluation

In this section, we describe the accuracy and performance of NoOP in detail. Since there are no previous efforts that perform image recognition with DL technologies but without training datasets, we do not compare NoOP directly with other mechanisms. However, where meaningful, we compare NoOP against existing mechanisms (e.g., the original Hough transform) to clearly confirm its effectiveness.

5.1. Experimental Setup and Workload

We conducted all experiments on a machine with an Intel Core i9-12900KF (3.2 GHz, 6-core) CPU, 32 GB of DRAM, and an Nvidia GeForce GTX 1650 GPU. We used Ubuntu 20.04 LTS as the operating system and implemented NoOP in Python 3.8 with several open-source libraries. We employed PyTesseract [45], which includes Tesseract OCR, for text recognition, and OpenCV [19], which includes the Hough transform algorithm, for image processing (i.e., lines and symbols). Since NoOP never requires training datasets or training time to recognize objects in P&ID images, we did not compare NoOP with existing mechanisms where datasets are necessary. For the evaluation, we used two real-world P&ID documents in PDF format as the input of NoOP. Figure 6 and Figure 7 show the default P&ID images; document A was encoded at a resolution of 6622 × 4677 with 6 lines and 2 types of symbols, and document B was encoded at a resolution of 3325 × 2475 with 6 lines and 8 types of symbols. In addition, we extended our evaluation using three disturbance mechanisms that hinder recognition of the documents: Gaussian noise, salt–pepper noise, and digital watermarking [46,47,48]. In the rest of this section, we dive deeply into how NoOP recognizes each type of object (i.e., tags, lines, and symbols).
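For reproducibility, a minimal sketch of the three disturbance mechanisms is given below; the sigma, noise ratio, and watermark text are illustrative values, not the exact settings used in our experiments.

    import cv2
    import numpy as np

    def add_gaussian_noise(img, sigma=25):
        noise = np.random.normal(0, sigma, img.shape)            # per-pixel Gaussian distortion
        return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

    def add_salt_pepper_noise(img, ratio=0.02):
        out = img.copy()
        mask = np.random.rand(*img.shape[:2])
        out[mask < ratio / 2] = 0                                # pepper (black)
        out[mask > 1 - ratio / 2] = 255                          # salt (white)
        return out

    def add_watermark(img, text="CONFIDENTIAL"):
        overlay = img.copy()
        cv2.putText(overlay, text, (img.shape[1] // 4, img.shape[0] // 2),
                    cv2.FONT_HERSHEY_SIMPLEX, 5, (128, 128, 128), 10, cv2.LINE_AA)
        return cv2.addWeighted(overlay, 0.3, img, 0.7, 0)        # semi-transparent watermark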

5.2. The Results of Tag and Symbol Detection

As mentioned before, NoOP first recognizes tags in the image-formatted file and records them in either the symbol tag location table or the line tag location table. Then, NoOP sequentially scans the target image file and captures four-side snapshots (i.e., up, down, left, and right) when the scanner arrives at the location pointed to by each tag. Finally, NoOP retains only the snapshot with the highest areal density of black pixels among the four.
Figure 8 and Figure 9 show the results of tag and symbol detection for the P&ID files. As shown in Figure 8 and Figure 9, NoOP completely recognizes all symbols in the image files. This also means that NoOP correctly detects and saves all tags, because the location of each snapshot depends on the location of its tag. In this experiment, we set the width and height of each bounding box to 1.5 px and 5 px, respectively. As expected, in Figure 8, NoOP successfully recognized the symbols located on the left side of their tags (i.e., Valve-004, Valve-008, and Valve-011 in Figure 8). In addition, NoOP recognized all the different types of symbols in Figure 9. This is a very interesting result because existing mechanisms using DL technologies based on training datasets require two separate datasets for the two orientations (i.e., vertical and horizontal symbol datasets) [35].

5.3. The Results of Line Detection

Now, we confirm the line detection of NoOP. In this experiment, we compare NoOP with the original Hough transform algorithm to confirm the effectiveness of NoOP.
Figure 10 shows the results of line detection with the Hough transform algorithm, and Figure 11 and Figure 12 show the results of line detection using NoOP for documents A and B, respectively. Interestingly, Figure 10 shows that the original Hough transform algorithm, which is widely used in image processing, missed a total of three lines (i.e., Line-001, Line-004, and Line-009). In addition, the Hough transform algorithm wrongly detected Line-001 as a non-contiguous line. On the other hand, NoOP correctly recognized all lines, as shown in both Figure 11 and Figure 12. These are the expected results because NoOP optimizes the line detection process to recognize all lines and distinguish the line patterns (i.e., connected, disconnected, and composed lines): (1) NoOP utilizes tag information to determine whether a structure is a line, so it never divides one line into two; (2) NoOP temporarily removes each recognized line from the target image by masking it out, to avoid duplicate line recognition, and repeats this process until all lines are masked.
Figure 13 and Figure 14 show the state of the documents after NoOP has recognized all lines. As shown in Figure 13 and Figure 14, no lines remain visible because the recognized lines have been masked. In fact, this masking process may consume more time than the original Hough transform algorithm (because of the repetition). However, we believe this time is negligible, given that NoOP never requires the training process that consumes a large amount of time before running inference.

5.4. The Results of the Semantic Generation

As shown in Figure 11, the recognized straight lines carry no meaning unless the neighboring lines and symbols are merged based on their locations. To extract valuable meaning from P&ID documents, the semantic information has to be conveyed to users in an accessible way. In other words, users should be able to tell simply and straightforwardly which symbols are connected to which line. Therefore, NoOP generates semantics based on the location information of all recognized objects.
Figure 15 and Figure 16 show the rich semantic information produced by NoOP. As shown in Figure 15, Line-001, composed of two horizontal lines and one vertical line, now carries semantic information indicating that a total of five valves are connected on the same line. Meanwhile, Line-002 and Line-003 are disconnected lines; thus, they are represented as independent semantic entries. In addition, NoOP successfully recognizes all lines, even though there are complex valves in Figure 16. The graphical results are automatically represented in natural language that can be inserted into a file or database. In summary, NoOP perfectly recognizes all objects (i.e., tags, lines, and symbols) and gracefully provides the semantic information (i.e., a graphical model and natural language) described in the image-formatted P&ID file, without any dataset training for inference.

5.5. The Robustness of NoOP

A good recognition scheme has to exhibit high robustness against various document conditions. To measure the robustness of NoOP, we created additional P&ID documents by synthesizing the originals with three disturbance mechanisms (i.e., Gaussian noise, salt–pepper noise, and digital watermarking). We then performed the same evaluation with the three disturbed documents as the input of NoOP.
First, we synthesized documents by adding Gaussian noise to the original ones. Gaussian noise is widely used in digital imaging. With Gaussian noise, the P&ID image is corrupted with color distortion following a Gaussian distribution; the image can contain different colors and brightness compared with the original P&ID document. This disturbance is expected to affect recognition of the image. Figure 17 and Figure 18 clearly show that NoOP successfully recognizes all lines and symbols even under Gaussian noise. In addition, we added salt–pepper noise to the original P&ID documents to confirm the impact of white and black color distortion.
Figure 19 and Figure 20 show the robustness of NoOP, even on images corrupted with salt–pepper noise. Of course, this evaluation could have a negative impact on NoOP because NoOP employs white and black pixels as important parameters to detect symbols and lines. However, as shown in Figure 19 and Figure 20, salt–pepper noise does not impact the robustness of NoOP because NoOP utilizes various parameters other than color.
Finally, we performed additional experiments for cases where the image documents did not contain noise but did include watermarks. As expected, Figure 21 and Figure 22 show that NoOP successfully recognizes all kinds of objects. We believe this evaluation is important and meaningful because industrial companies commonly apply this kind of watermarking and deploy their documents with the watermark included.
In summary, NoOP shows good results even in environments with two types of noise and with watermarking. This is very meaningful, as NoOP recognizes all objects without any training process or dataset.

5.6. The Elapsed Time of NoOP

The most powerful benefit of NoOP is that it saves time, because it never requires training time, even as the size or resolution of P&ID documents increases.
Figure 23 shows the elapsed time for NoOP to complete all recognition processes; we repeated the same experiment 10 times to confirm reliability. As a result, NoOP consumes only 7.11 s on average to finish all jobs, which is highly efficient. For comparison, we conducted an additional experiment using the MNIST dataset, which is widely used for recognizing handwritten digits ranging from 0 to 9. To run a DL model on the MNIST dataset, we implemented a convolutional neural network (CNN) model and measured the elapsed training time. Figure 24 shows our evaluation results, where the same experiment is repeated as above. As shown in Figure 24, this model takes 153.64 s for the training process alone.
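The baseline measurement can be reproduced with a sketch along the following lines; the network architecture, epoch count, and batch size here are assumptions, so the measured time will not exactly match the 153.64 s reported above.

    import time
    import tensorflow as tf

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train[..., None].astype("float32") / 255.0        # 28x28 grayscale digits

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    start = time.time()
    model.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)
    print(f"training time: {time.time() - start:.2f} s")          # training-only elapsed time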
In summary, NoOP is able to recognize all objects in 7.11 s without spending any time on training a dataset. We believe the principles and methodologies of NoOP are usable and applicable to other DL technologies in marine science and engineering environments.

6. Conclusions

In recent years, there has been a significant increase in efforts to apply DL technologies in both academia and industry within marine science and engineering environments. Unfortunately, to protect their proprietary interests, people are reluctant to share the documents (i.e., datasets) necessary for training DL models with third-party manufacturers. In this paper, we proposed a new mechanism, called NoOP, which does not require any datasets but perfectly recognizes all objects (i.e., lines, tags, and symbols) in image-formatted P&ID files. In addition, our evaluation shows that NoOP achieves high performance compared to mechanisms that require datasets, as it eliminates the time spent on training. Finally, we believe that NoOP facilitates the sharing of P&ID documents in marine science and engineering environments for training purposes because it helps build training datasets with ease. In future work, we will explore the remaining tuning parameters and study P&ID documents that incorporate more complex objects used by industrial companies.

Author Contributions

Conceptualization, J.C. and D.K.; methodology, J.C. and D.K.; software, J.C.; validation, J.C., D.A. and D.K.; data curation, J.C. and D.K.; writing—original draft preparation, D.A. and D.K.; writing—review and editing, J.C., D.A. and D.K.; visualization, J.C. and D.K.; supervision, D.A. and D.K.; funding acquisition, D.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1I1A3047006) and by the Gachon University research fund of 2023 (GCU-202307750001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DL: deep learning
CNN: convolutional neural network
DNN: deep neural network
YOLO: you only look once
CAD: computer-aided design
pdf: portable document format
P&ID: piping and instrumentation diagram
DX: digital transformation
OCR: optical character recognition
FCN: fully convolutional network
ICDAR: International Conference on Document Analysis and Recognition
VGG: visual geometry group
EAST: efficient and accurate scene text detector
CRAFT: character-region awareness for text detection
CTPN: connectionist text proposal network
LSTM: long short-term memory
Snapshot_way: elements of the captured symbol bounding box candidates
T_{x,y}: upper-left coordinate of the text bounding box
S_{x,y}: upper-left coordinate of the symbol bounding box
W_T: width of the text bounding box
W_S: width of the symbol bounding box
H_T: height of the text bounding box
H_S: height of the symbol bounding box

References

  1. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
  2. Wang, T.; Toh, W.Q.; Zhang, H.; Sui, X.; Li, S.; Liu, Y.; Jing, W. RoboCoDraw: Robotic Avatar Drawing with GAN-Based Style Transfer and Time-Efficient Path Optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 10402–10409. [Google Scholar]
  3. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. In Proceedings of the Advances in Neural Information Processing Systems 33, Proceedings of the Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020; Volume 33, pp. 1877–1901.
  4. Cui, L.; Biswal, S.; Glass, L.M.; Lever, G.; Sun, J.; Xiao, C. CONAN: Complementary Pattern Augmentation for Rare Disease Detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–12 February 2020; Volume 34, pp. 614–621. [Google Scholar]
  5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [Google Scholar]
  6. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference On Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  7. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  8. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 11–13 June 2019; Volume 97, pp. 6105–6114. [Google Scholar]
  9. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  10. Ketkar, N. Introduction to Keras. In Deep Learning with Python: A Hands-On Introduction; Apress: Berkely, CA, USA, 2017; pp. 97–111. Available online: https://link.springer.com/book/10.1007/978-1-4842-2766-4 (accessed on 24 April 2023).
  11. Park, H.; Ham, S.H.; Kim, T.; An, D. Object Recognition and Tracking in Moving Videos for Maritime Autonomous Surface Ships. J. Mar. Sci. Eng. 2022, 10, 841. [Google Scholar] [CrossRef]
  12. Li, L.; Jiang, L.; Zhang, J.; Wang, S.; Chen, F. A Complete YOLO-Based Ship Detection Method for Thermal Infrared Remote Sensing Images under Complex Backgrounds. Remote Sens. 2022, 14, 1534. [Google Scholar] [CrossRef]
  13. Kim, M.; Choi, W.; Kim, B.C.; Kim, H.; Seol, J.H.; Woo, J.; Ko, K.H. A Vision-based System for Monitoring Block Assembly in Shipbuilding. Comput.-Aided Des. 2015, 59, 98–108. [Google Scholar] [CrossRef]
  14. Choi, Y.; Park, J.H.; Jang, B. A Risk Estimation Approach based on Deep Learning in Shipbuilding Industry. In Proceedings of the IEEE International Conference on Information and Communication Technology Convergence, Jeju, Republic of Korea, 16–18 October 2019; pp. 1438–1441. [Google Scholar]
  15. Kong, M.C.; Roh, M.I.; Kim, K.S.; Lee, J.; Kim, J.; Lee, G. Object Detection Method for Ship Safety Plans using Deep Learning. Ocean. Eng. 2022, 246, 110587. [Google Scholar] [CrossRef]
  16. Smith, R. Modernization Efforts: Cleaning up the Code and Adding New LSTM Technology. 2016. Available online: https://tesseract-ocr.github.io/docs/das_tutorial2016/6ModernizationEfforts.pdf (accessed on 20 April 2023).
  17. Smith, R. Tesseract-OCR Library. Available online: https://github.com/tesseract-ocr/tesseract (accessed on 20 April 2023).
  18. Illingworth, J.; Kittler, J. A Survey of the Hough Transform. J. Comput. Vision Graph. Image Process. 1988, 44, 87–116. [Google Scholar] [CrossRef]
  19. Alekhin, A. OpenCV Library. Available online: https://opencv.org/ (accessed on 30 March 2023).
  20. Tan, W.C.; Chen, I.M.; Tan, H.K. Automated Identification of Components in Raster Piping and Instrumentation Diagram with Minimal Pre-processing. In Proceedings of the IEEE International Conference on Automation Science and Engineering (CASE), Fort Worth, TX, USA, 21–25 August 2016; pp. 1301–1306. [Google Scholar]
  21. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  22. Zhou, X.; Yao, C.; Wen, H.; Wang, Y.; Zhou, S.; He, W.; Liang, J. EAST: An Efficient and Accurate Scene Text Detector. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 9365–9374. [Google Scholar]
  23. Baek, Y.; Lee, B.; Han, D.; Yun, S.; Lee, H. Character Region Awareness for Text Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 9365–9374. [Google Scholar]
  24. Smith, R. An Overview of the Tesseract OCR Engine. In Proceedings of the International Conference on Document Analysis and Recognition, Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 629–633. [Google Scholar]
  25. Yu, E.S.; Cha, J.M.; Lee, T.; Kim, J.; Mun, D. Features Recognition from Piping and Instrumentation Diagrams in Image Format Using a Deep Learning Network. J. Energies 2019, 12, 4425. [Google Scholar] [CrossRef]
  26. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  27. Fujiyoshi, H.; Hirakawa, T.; Yamashita, T. Deep learning-based Image Recognition for Autonomous Driving. J. Int. Assoc. Traffic Saf. Sci. 2019, 43, 244–252. [Google Scholar] [CrossRef]
  28. Sanchez-Gonzalez, P.L.; Díaz-Gutiérrez, D.; Leo, T.J.; Núñez-Rivas, L.R. Toward Digitalization of Maritime Transport? Sensors 2019, 19, 926. [Google Scholar] [CrossRef] [PubMed]
  29. Park, S.; Huh, J.H. Study on PLM and Big Data Collection for the Digital Transformation of the Shipbuilding Industry. J. Mar. Sci. Eng. 2022, 10, 1488. [Google Scholar] [CrossRef]
  30. Pang, T.Y.; Pelaez Restrepo, J.D.; Cheng, C.T.; Yasin, A.; Lim, H.; Miletic, M. Developing a Digital Twin and Digital Thread Framework for an ‘Industry 4.0’ Shipyard. Appl. Sci. 2021, 11, 1097. [Google Scholar] [CrossRef]
  31. Lee, G.A.; Yang, U.; Son, W.; Kim, Y.; Jo, D.; Kim, K.H.; Choi, J.S. Virtual Reality Content-Based Training for Spray Painting Tasks in the Shipbuilding Industry. ETRI J. 2010, 32, 695–703. [Google Scholar] [CrossRef]
  32. Zheng, Q.; Tian, X.; Yu, Z.; Jiang, N.; Elhanashi, A.; Saponara, S.; Yu, R. Application of Wavelet-Packet Transform Driven Deep Learning Method in PM2.5 Concentration Prediction: A Case Study of Qingdao, China. Sustain. Cities Soc. 2023, 92, 1–13. [Google Scholar] [CrossRef]
  33. Rahul, R.; Paliwal, S.; Sharma, M.; Vig, L. Automatic information extraction from piping and instrumentation diagrams. In Proceedings of the International Conference on Pattern Recognition Applications and Methods, Prague, Czech Republic, 19–21 February 2019; pp. 163–172. [Google Scholar]
  34. Kang, S.O.; Lee, E.B.; Baek, H.K. A Digitization and Conversion Tool for Imaged Drawings to Intelligent Piping and Instrumentation Diagrams (P&Id). Energies 2019, 12, 2593. [Google Scholar]
  35. Moon, Y.; Lee, J.; Mun, D.; Lim, S. Deep Learning-Based Method to Recognize Line Objects and Flow Arrows from Image-Format Piping and Instrumentation Diagrams for Digitization. J. Appl. Sci. 2021, 11, 10054. [Google Scholar] [CrossRef]
  36. Kim, H.; Lee, W.; Kim, M.; Moon, Y.; Lee, T.; Cho, M.; Mun, D. Deep-learning-based Recognition of Symbols and Texts at an Industrially Applicable Level from Images of High-density Piping and Instrumentation Diagrams. J. Expert Syst. Appl. 2021, 183, 115337. [Google Scholar] [CrossRef]
  37. Fahn, C.S.; Wang, J.F.; Lee, J.Y. A Topology-based Component Extractor for Understanding Electronic Circuit Diagrams. Comput. Vision Graph. Image Process. 1988, 44, 119–138. [Google Scholar] [CrossRef]
  38. Kato, H.; Inokuchi, S. The Recognition Method for Roughly Hand-drawn Logical Diagrams Based on Hybrid Utilization of Multi-layered Knowledge. In Proceedings of the 10th International Conference on Pattern Recognition, Atlantic City, NJ, USA, 16–21 June 1990; Volume 1, pp. 578–582. [Google Scholar]
  39. De, P.; Mandal, S.; Bhowmick, P. Recognition of electrical symbols in document images using morphology and geometric analysis. In Proceedings of the 2011 International Conference on Image Information Processing, Shimla, India, 3–5 November 2011; pp. 1–6. [Google Scholar]
  40. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN Model-Based Approach in Classification. In Proceedings of the OTM Confederated International Conferences CoopIS, DOA, and ODBASE 2003 Catania, Sicily, Italy, 3–7 November 2003; pp. 986–996. [Google Scholar]
  41. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. J. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  43. Tian, Z.; Huang, W.; He, T.; He, P.; Qiao, Y. Detecting Text in Natural Image with Connectionist Text Proposal Network. In Proceedings of the 14th European Conference on Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 56–72. [Google Scholar]
  44. Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  45. Hoffstaetter, S. PyTesseract. Available online: https://github.com/madmaze/pytesseract (accessed on 30 March 2023).
  46. Boncelet, C. Image noise models. In The Essential Guide to Image Processing; Elsevier: Amsterdam, The Netherlands, 2009; pp. 143–167. [Google Scholar]
  47. Toh, K.K.V.; Ibrahim, H.; Mahyuddin, M.N. Salt-and-pepper noise detection and reduction using fuzzy switching median filter. IEEE Trans. Consum. Electron. 2008, 54, 1956–1961. [Google Scholar] [CrossRef]
  48. Wikipedia. Available online: http://en.wikipedia.org/w/index.php?title=Watermark&oldid=1161923484 (accessed on 16 September 2023).
Figure 1. The sample of an image-formatted P&ID document.
Figure 2. The Hough transform process for line detection in image. (a) Finds the location of pixels, colored in black. (b) Generates linear equations on each pixel. (c) Converts linear equation parameters into the Hough space (ρ, θ). (d) Accumulates all possible parameter sets of each pixel and count of the same value. (e) Draws the line on the original image.
Figure 3. The process of the proposed scheme.
Figure 4. The example of a snapshot process to recognize a symbol.
Figure 5. The sample of the line detection.
Figure 6. A real-world P&ID document (A).
Figure 7. A real-world P&ID document (B).
Figure 8. The results of tag and symbol detection (A).
Figure 9. The results of tag and symbol detection (B).
Figure 10. The results of line detection over the Hough transform algorithm.
Figure 11. The results of line detection over NoOP (A).
Figure 12. The results of line detection over NoOP (B).
Figure 13. The results of masked lines on NoOP (A).
Figure 14. The results of masked lines on NoOP (B).
Figure 15. The results of the highlighted semantics with different colors (A).
Figure 16. The result of the highlighted semantics with different colors (B).
Figure 17. The results of the highlighted semantics with different colors with Gaussian noise (A).
Figure 18. The results of the highlighted semantics with different colors with Gaussian noise (B).
Figure 19. The results of the highlighted semantics with different colors with salt–pepper noise (A).
Figure 20. The results of the highlighted semantics with different colors with salt–pepper noise (B).
Figure 21. The results of the highlighted semantics with different colors with watermarking (A).
Figure 22. The results of the highlighted semantics with different colors with watermarking (B).
Figure 23. The elapsed time of NoOP.
Figure 24. The elapsed time of training the MNIST dataset.
