Article

Research on Insect Pest Identification in Rice Canopy Based on GA-Mask R-CNN

1 College of Mechanical and Electronic Engineering, Shandong Agricultural University, Tai’an 271018, China
2 Shandong Province Agricultural Equipment Intellectualization Engineering Laboratory, Tai’an 271018, China
3 Shandong Provincial Key Laboratory of Horticultural Machinery and Equipment, Tai’an 271018, China
4 Jinan Academy of Agricultural Sciences, Jinan 250300, China
5 Shandong Xiangchen Technology Group Co., Ltd., Jinan 251400, China
* Authors to whom correspondence should be addressed.
Agronomy 2023, 13(8), 2155; https://doi.org/10.3390/agronomy13082155
Submission received: 12 July 2023 / Revised: 14 August 2023 / Accepted: 15 August 2023 / Published: 17 August 2023
(This article belongs to the Special Issue In-Field Detection and Monitoring Technology in Precision Agriculture)

Abstract
To address the difficulty of image acquisition and the low recognition accuracy for two rice canopy pests, rice stem borer and rice leaf roller, we constructed a GA-Mask R-CNN (Generative Adversarial based Mask Region Convolutional Neural Network) intelligent recognition model and combined it with field monitoring equipment. Firstly, based on the biological habits of rice canopy pests, a variety of collection methods were used to obtain images of rice stem borer and rice leaf roller, and different segmentation algorithms were applied to extract single-pest samples from these images. Secondly, a bug generator based on a generative adversarial network strategy was used to improve the sensitivity of the classification network to insect body information and to generate pest images under real environmental conditions; the sample dataset for deep learning was then obtained through multi-way augmentation. Then, the recognition accuracy of the model was improved by adding the channel attention ECA module to Mask R-CNN and improving the connection of residual blocks in the backbone network ResNet101. Finally, the GA-Mask R-CNN model was tested on a multi-source dataset and achieved an average precision (AP) of 92.71%, a recall (R) of 89.28%, and a balanced F1 score of 90.96%, improvements of 7.07, 7.65, and 8.83 percentage points, respectively, over the original Mask R-CNN. The results show that the GA-Mask R-CNN model outperforms Mask R-CNN, Faster R-CNN, SSD, YOLOv5, and other network models on all performance indexes, and it can provide technical support for remote intelligent monitoring of rice pests.

1. Introduction

Rice is one of the three major food crops in the world. Rice pests are diverse, widely distributed, and harmful, and they are a major constraint on rice yields. Accurate identification of pests is an important part of pest monitoring and control. Currently, rice pest forecasting mostly relies on manually recording adult pest species, numbers, occurrence times, and other information in the field in order to judge the pattern of pest occurrence and the degree of danger. This survey method is labour-intensive and suffers from low efficiency, a small coverage area, and poor objectivity. The rice canopy is attacked mainly by two pests, rice stem borer and rice leaf roller; canopy outbreaks readily cause yellowing, curling, and rupture of rice leaves, impairing photosynthesis and reducing rice yield and quality.
The traditional image-based approach to rice canopy pest identification is to manually select and design color, texture, and shape features through image preprocessing, image segmentation, and feature extraction, and then to classify the resulting feature vectors with support vector machines, random forests, artificial neural networks, and similar classifiers in order to identify the types of pest symptoms [1,2,3]. Traditional image recognition relies on professional knowledge and experience to design suitable feature classifiers and improves recognition accuracy and efficiency by adjusting parameters and optimizing algorithms. However, the shape and color features of crop pests in natural environments are often similar to those of their surroundings, which leads to poor robustness and weak generalization of traditional pattern recognition in real environments.
In recent years, deep learning has shown broad application prospects in the intelligent identification and forecasting of crop pests and diseases, mainly through the two-stage target detection models of the R-CNN series [4,5,6,7] and the single-stage target detection models of the YOLO [8,9,10,11] and SSD [12,13,14] series. Prakruti et al. [15] used the YOLOv3 model to identify and localize diseases on tea leaves, training on a rich dataset of disease images with different resolutions, qualities, brightness, and focus, and achieved a mean average precision of 86%. Lin et al. [16] proposed a YOLO-based method for automatically detecting and counting small yellow thrips on the backs of lotus leaves, enhancing blurred regions with neural networks to improve image quality; the accuracy was increased by 9.76 percentage points with the YOLO target detection algorithm. Khalid et al. [17] acquired a total of 9875 images of thistle caterpillars, red beetles, and citrus aphids under different lighting conditions, constituting a rich pest dataset; trained with the YOLOv8 network model, it yielded a mean average precision of 84.7%. Arshaghi et al. [18] applied convolutional neural networks to recognize and classify potato diseases by training on 5000 potato images in five classes, such as healthy, black spot, common scab, black leg, and pink rot. Lippi et al. [19], driven by the need for precision agriculture in hazelnut orchards, proposed a pest management system that uses multiple data enhancement methods to improve pest image quality and diversity, achieving an average accuracy of 86.7% with the YOLO target detection algorithm. Kasinathan et al. [20] proposed a Mask R-CNN-based fall armyworm detection system for farmland, preprocessing the pest dataset to improve image quality and thereby the model’s average recognition accuracy. Kaur et al. [21] used a Mask R-CNN model to segment tomato leaf diseases, reaching an average precision of 73.0% with ResNet50 in an instance segmentation test. Barmpoutis et al. [22] combined high-resolution RGB aerial imagery with Faster R-CNN and Mask R-CNN network models to locate and monitor pests and invasive species in forests, where the high-resolution imagery helped improve recognition accuracy. Lin et al. [23] chunked single insect images into mixed sets of sub-image blocks containing background and feature information, used the sub-image blocks as dictionary atoms to construct an overcomplete dictionary, and fed this into the SSD algorithm as a training set, realizing the recognition and classification of incomplete images of rice planthopper through this preprocessing of the training set. Gonçalves et al. [24] used an SSD model to identify and classify 8966 grape pests from 168 grape pest images, and the model was deployed on mobile vineyard pest monitoring devices to meet the needs of real production.
Many researchers have conducted preliminary recognition and classification studies of pests and diseases by means of machine vision and deep learning, and their results demonstrate the powerful capability of deep convolutional neural networks. However, when training data are lacking or the training sample images are of low quality, the detection performance of deep convolutional target detection models degrades, and the resulting recognition model performs poorly at pest identification and classification [25]. Therefore, in this study, multiple devices are utilized to acquire rice canopy pest images, multiple methods are used to augment the rice pest image data, and the deep learning recognition algorithm is improved. The aim is to improve the accuracy of rice canopy pest recognition and provide support for the intelligent recognition and monitoring of rice pests.

2. Materials and Methods

2.1. Acquisition of Pest Images

The collection of rice pest samples consisted of three parts. The first part collected rice canopy pests through the pest monitoring equipment developed beforehand by the team at Jining Lvjian Agricultural Science and Technology Co., Ltd. (Jining, China). The second part collected rice pest samples through pest traps at the Shandong Academy of Agricultural Sciences (SAAS) pest monitoring sites. The third part photographed the canopy pests of rice plants in the field at the rice planting base of Jinan Wuzhou Urban Agricultural Science and Technology Co. (Jinan, China). We focused on two pests of the rice canopy, rice stem borer and rice leaf roller, which are collectively referred to as rice plant canopy pests; rice stem borer attacks rice stems, and rice leaf roller attacks rice leaves. The pest sample images collected in the experiment encompassed both coastal and inland monsoon areas, and this breadth of coverage ensured the richness of the samples; meanwhile, to eliminate the influence of shadows on the captured images, uniform illumination setups were used in the field. To facilitate reading of the data by the GA-Mask R-CNN network model, the images of pest samples of each developmental age collected in these multiple ways were uniformly resized to 256 pixels × 256 pixels.

2.1.1. Collection of Pest Monitoring Equipment

In this study, self-developed rice pest monitoring equipment was used to collect images of rice stem borer and rice leaf roller. The equipment included solar panels, a pheromone lure, an image acquisition unit, a collecting box, and other components. The system structure is shown in Figure 1.
Based on the activity habits of rice pests, species-specific sex pheromone lures are used to attract rice canopy pests. Meanwhile, the imaging system photographs the live pests from multiple angles and uploads the images to an established database system for comparison and analysis. After the rice pest species are identified, the pests attracted during each time period are tipped down through the flap plate into the live-insect chamber, where honey water or cotton balls soaked in sugar water are provided for the pests to feed on. Image samples of rice stem borer and rice leaf roller in different postures were obtained by photographing the live insects.
The rice pest monitoring equipment was deployed at the rice planting base of Jining Lvjian Agricultural Science and Technology Co., Ltd., to collect images of rice stem borer and rice leaf roller and to photograph the pests in different postures at a distance of 300 mm. The collected pest images are shown in Figure 2.

2.1.2. Pest Trap Collection

Traps were arranged at the monitoring points of the experimental station of the Academy of Agricultural Sciences in Jinan City, Shandong Province, to obtain samples of rice pests. The collected pests were brought to the experimental station, where a closed studio was set up within the laboratory, and an online image acquisition environment was built on Windows 10 with a Deli Model 15165 high-definition camera fitted with a 25-megapixel fixed-focus lens. The rice pest monitoring sites of the Shandong Academy of Agricultural Sciences are listed in Table 1, and the simulated working conditions captured by the system are shown in Figure 3.

2.1.3. Field Photography of Pests in Paddy Fields

In order to enrich the number of rice pest training samples, rice canopy plant pests were collected at the rice planting base of Jinan Wuzhou Urban Agricultural Science and Technology Co. A Canon digital camera was used to collect images of the rice stem borer and rice leaf roller in the natural environment at the rice planting base. The collected samples are shown in Figure 4.

2.2. Single Pest Sample Acquisition

2.2.1. Splitting of Pest Samples for Trapping Equipment

The Haar feature is a digital image feature widely applied in object detection and recognition; it computes the pixel sums of neighboring rectangles at specified positions in the detection window and uses the differences between them to classify sub-regions of the image [26]. The acquired trap device images and confined-shot images were pooled for single pest sample image acquisition. A certain proportion of positive and negative samples was set for classifier training according to the target pest: the positive samples are images of the target pest to be detected, the negative samples are irrelevant images, and the ratio of the two was set to 1:5. The improved Haar feature algorithm is used to extract features from the samples, which are then input into the AdaBoost classifier for training; the segmentation model obtained after 100 iterations completes the initial segmentation of the pest images. The effect of Haar image segmentation is shown in Figure 5.
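Training such a Haar + AdaBoost cascade is done offline (for example, with OpenCV's opencv_traincascade tool); the minimal sketch below only illustrates applying an already trained cascade to cut single-pest candidates out of a board image. The file names are hypothetical placeholders, not artifacts from this study.

```python
import cv2

# Hypothetical cascade file produced by offline Haar + AdaBoost training
# (e.g., with OpenCV's opencv_traincascade tool); not the authors' model.
cascade = cv2.CascadeClassifier("pest_cascade.xml")

image = cv2.imread("trap_board.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Slide the detector over the image at multiple scales; each hit is a
# candidate single-pest region that can be cropped out as a sample.
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(30, 30))
for i, (x, y, w, h) in enumerate(boxes):
    cv2.imwrite(f"pest_{i}.png", image[y:y + h, x:x + w])
```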

2.2.2. Canopy Plant Pest Sample Segmentation

An interactive image segmentation method is used to obtain rice pest sample images [27,28,29]. Impurities and redundant images are first eliminated from the pest images. The extreme-point features of the pests' multiple antennae and complex contours are then extracted and combined with HSV color-space information to generate the GrabCut initialization parameters, which determine the initial markers of the foreground and background regions. The improved GrabCut algorithm iteratively optimizes the pest image to obtain the segmentation result. The target region is evaluated by a center-of-mass judgment method: the center-of-mass coordinates of the pest region are calculated to determine whether the pest sample is a complete individual. Based on this processing of HSV space and extreme-point features, rice pest sample images with complex backgrounds are segmented to obtain clear and complete images of the insect body. The effect of GrabCut image segmentation is shown in Figure 6.
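The improved GrabCut described above is initialized from extreme-point features and HSV color information; for orientation, a minimal sketch of the baseline, rectangle-initialized GrabCut in OpenCV is shown below, where the image path and initialization rectangle are hypothetical.

```python
import cv2
import numpy as np

image = cv2.imread("pest_sample.jpg")
mask = np.zeros(image.shape[:2], np.uint8)

# Internal Gaussian-mixture models used by GrabCut.
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

# Hypothetical rectangle roughly enclosing the insect body; the paper instead
# derives initial foreground/background markers from extreme points and HSV.
rect = (20, 20, 200, 200)
cv2.grabCut(image, mask, rect, bgd_model, fgd_model, 5,
            cv2.GC_INIT_WITH_RECT)

# Suppress pixels marked as definite or probable background.
fg_mask = np.where((mask == cv2.GC_BGD) | (mask == cv2.GC_PR_BGD), 0, 1)
segmented = image * fg_mask[:, :, np.newaxis].astype(np.uint8)
cv2.imwrite("pest_segmented.png", segmented)
```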

2.3. Sample Image Amplification

In this study, conventional data enhancement techniques and a deep convolutional generative adversarial network are used to augment the single-pest images. When the number of sample images in the rice pest dataset is insufficient, the recognition accuracy of the trained model is greatly affected and overfitting occurs; this manifests as a large gap between the model's performance on the training set and on the test set, with good training-set performance but very poor test-set performance. Such a model cannot make accurate predictions on unknown data in actual deployment.

2.3.1. General Data Enhancement

The rice pest dataset was augmented with conventional data augmentation techniques: shading adjustment, mirror flipping, Gaussian noise, pulse noise, random occlusion, random clipping, and rotation by different angles were applied to the single-pest images. This strengthens the learning of ontological features and, at the same time, expands a large number of high-quality sample images, effectively reducing the structural risk of the training model and alleviating overfitting, thereby improving the generalization ability of the model. The expanded images are shown in Figure 7.
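A minimal NumPy/OpenCV sketch of the listed operations follows; the noise levels, crop windows, occlusion patch, and rotation angle are illustrative values, not the settings used in this study.

```python
import cv2
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Apply the conventional augmentations listed above (illustrative parameters)."""
    out = []
    out.append(cv2.convertScaleAbs(img, alpha=1.0, beta=40))      # shading adjustment
    out.append(cv2.flip(img, 1))                                  # mirror flip
    noisy = img + rng.normal(0, 15, img.shape)                    # gaussian noise
    out.append(np.clip(noisy, 0, 255).astype(np.uint8))
    salt = img.copy()                                             # pulse (impulse) noise
    rows = rng.integers(0, img.shape[0], 500)
    cols = rng.integers(0, img.shape[1], 500)
    salt[rows, cols] = 255
    out.append(salt)
    occluded = img.copy()                                         # random occlusion
    occluded[100:150, 100:150] = 0
    out.append(occluded)
    out.append(img[20:236, 20:236])                               # random clipping (crop)
    h, w = img.shape[:2]                                          # rotation
    m = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 1.0)
    out.append(cv2.warpAffine(img, m, (w, h)))
    return out

samples = augment(cv2.imread("pest.png"))
```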

2.3.2. Deep Convolutional Generation Adversarial Network

A Generative Adversarial Network (GAN) is a generative model within deep learning that consists of a generator and a discriminator [30,31,32,33,34,35]. The goal of the generator is to maximize the discriminator's misclassification rate on the generated samples, producing samples similar to real data from random noise. The goal of the discriminator is to minimize its misclassification rate on real and generated samples, distinguishing whether an input sample is real or generated. The generator and discriminator are optimized against each other until a Nash equilibrium is reached, i.e., the samples produced by the generator fool the discriminator, and the discriminator can no longer accurately judge true from false. A Deep Convolutional Generative Adversarial Network (DCGAN) was adopted for the pest image generation experiments; its advantage lies in the introduction of convolutional layers with strong feature extraction capability, which improves the quality of the generated samples, while batch normalization applied to the convolutional layers lets gradient information propagate effectively to each layer, strengthening the network topology and thereby stabilizing training. The structure of the generative adversarial network is shown in Figure 8.
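This adversarial game is conventionally written as the minimax objective of the original GAN formulation (a standard result, stated here for context rather than taken from this paper):

$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]+\mathbb{E}_{z\sim p_{z}(z)}\left[\log\left(1-D(G(z))\right)\right]$$

Here $G(z)$ is the sample generated from noise $z$, and $D(\cdot)$ is the discriminator's estimated probability that its input is real.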
Both the generative model and the discriminative model are relatively weak at the beginning of training; after oscillating for a short period, the loss curves stabilize as the number of training iterations increases. The discriminative loss and adversarial loss curves then keep fluctuating within a small interval on the y-axis without an obvious upward or downward trend, at which point the DCGAN model as a whole has reached dynamic equilibrium. The changes in the training loss and adversarial loss curves of the DCGAN model constructed in this experiment are shown in Figure 9.
The generator and discriminator of the GAN are two relatively independent networks. Real samples are drawn from the real dataset while random noise vectors are sampled from a given distribution; the noise vectors are input into the generator to produce fake sample data, and finally the real samples and an equal number of fake samples are input into the discriminator for discrimination. The resulting images are shown in Figure 10. The pest samples generated by the DCGAN enriched the rice canopy pest dataset and made up for pest samples that conventional data enhancement techniques could not produce.
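For illustration, the following is a minimal DCGAN-style generator in tf.keras that upsamples a 100-dimensional noise vector to the 256 × 256 sample size used in this study through Conv2DTranspose and batch-normalization blocks; the layer widths and kernel sizes are assumptions, not the authors' exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    """DCGAN-style generator: project noise, then upsample 8 -> 256 in five steps."""
    model = tf.keras.Sequential([
        layers.Dense(8 * 8 * 512, use_bias=False, input_shape=(latent_dim,)),
        layers.BatchNormalization(),
        layers.ReLU(),
        layers.Reshape((8, 8, 512)),
    ])
    for filters in (256, 128, 64, 32):          # 8 -> 16 -> 32 -> 64 -> 128
        model.add(layers.Conv2DTranspose(filters, 4, strides=2,
                                         padding="same", use_bias=False))
        model.add(layers.BatchNormalization())
        model.add(layers.ReLU())
    # 128 -> 256; tanh output matches training images scaled to [-1, 1]
    model.add(layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                     activation="tanh"))
    return model

fake = build_generator()(tf.random.normal((1, 100)))  # shape (1, 256, 256, 3)
```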

2.4. Data Set Construction

In this experiment, the polygonal annotation tool in Labelme is used to fit the contours of the rice pests for labeling and to generate a JSON label file for each single pest, which avoids the influence of the complex background on the pest labels. The sample labeling process is shown in Figure 11.
To enhance the robustness of the rice canopy pest identification model, the rice stem borer and rice leaf roller samples were augmented in different ratios. The rice pest sample dataset was divided into training, testing, and validation sets in a 7:2:1 ratio. The training set is used to train the recognition model, the validation set is used to optimize its hyperparameters, and the test set is used to evaluate its final performance. The division of the rice pest dataset is shown in Table 2.
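A hedged sketch of such a 7:2:1 stratified division with scikit-learn (the sample lists are hypothetical placeholders, not the study's data):

```python
from sklearn.model_selection import train_test_split

# Hypothetical sample lists standing in for the labeled pest images.
image_paths = [f"pest_{i}.png" for i in range(1000)]
labels = ["stem_borer" if i % 2 else "leaf_roller" for i in range(1000)]

# Peel off 30% of the data first, then split that 2:1, giving the
# 7:2:1 train/test/validation ratio described above.
train_x, rest_x, train_y, rest_y = train_test_split(
    image_paths, labels, test_size=0.3, stratify=labels, random_state=42)
test_x, val_x, test_y, val_y = train_test_split(
    rest_x, rest_y, test_size=1 / 3, stratify=rest_y, random_state=42)
```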

3. GA-Mask R-CNN Model Construction

3.1. Identify the Basic Structure of the Model Network

After the GA-Mask R-CNN network model acquires the rice pest sample images augmented by the deep convolutional generative adversarial network, the backbone feature extraction network ResNet101-FPN samples the feature information of the pest image and fuses it across multiple scales [36,37,38,39] to generate effective feature maps. Subsequently, multiple rectangular prediction boxes of different scales are set at each pixel of the feature map until a sufficient number of prediction boxes is obtained; these candidate boxes are fed into the region proposal network for Softmax target classification and boundary regression, which makes a preliminary judgment on the target information in the boxes and filters part of them out [40]. The remaining candidate boxes are fed into the region feature pooling layer so that the RoI candidate boxes selected by the RPN network are matched more accurately to the corresponding regions of the original image. Finally, target classification, boundary regression, and mask segmentation of the candidate region feature maps are accomplished by two branch networks. The basic framework of GA-Mask R-CNN is shown in Figure 12.
The backbone network of the GA-Mask R-CNN model is optimized by improving the residual-block connections and embedding the ECA attention module in the backbone network ResNet101; the optimized backbone structure is shown in Figure 13.

3.2. Multi-Level Residual Connection

Improving the connection of residual blocks in the backbone network ResNet101 enhances the propagation of gradient information between convolutional layers, which reduces the training volume of the model and saves unnecessary computational overhead. Based on this principle, multilevel residual connectivity is chosen to optimize the residual-block connections in the Mask R-CNN backbone network. By establishing additional cross-layer connections between the residual modules, it transforms the learning problem into learning a residual-to-residual mapping, which is simpler and easier to learn than in the original ResNet. In the multilevel residual connection structure, the connections added around the overall set of residual blocks are called level 1 residual paths; the residual network is divided into groups according to the convolutional kernel filter types, the connections added around the residual blocks in each group are called level 2 residual paths, and the original residual connections remain unchanged. The additional residual connections enhance the flow of gradient information and make feature transfer between high and low layers smoother, further suppressing the gradient dispersion and gradient vanishing problems of large-parameter deep models. The multilevel residual structure is shown in Figure 14.
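A minimal tf.keras sketch of this idea is given below, with one extra Add per group (the level 2 path) and one across all groups (the level 1 path); the widths and depths are illustrative, and a real ResNet101 would additionally need 1 × 1 projections wherever stage widths or strides change.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Original residual block: two convolutions plus an identity shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.ReLU()(layers.Add()([y, shortcut]))

def residual_group(x, filters, num_blocks):
    """Group of blocks with an added level 2 residual path around the group."""
    group_input = x
    for _ in range(num_blocks):
        x = residual_block(x, filters)
    return layers.Add()([x, group_input])         # level 2 path

inputs = layers.Input((64, 64, 64))
x = inputs
for _ in range(3):                                # three same-width groups
    x = residual_group(x, 64, num_blocks=3)
outputs = layers.Add()([x, inputs])               # level 1 path across groups
model = tf.keras.Model(inputs, outputs)
```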
Increasing the depth of a convolutional neural network can directly and effectively improve model performance. However, beyond a certain depth, network performance saturates, i.e., it no longer improves as depth increases, and may even degrade as the network is deepened further, making the network harder and harder to optimize. To address this problem, a shortcut connection mechanism is introduced into the network structure so that gradient information can propagate across layers and information can be exchanged between different layers, which alleviates, to a certain extent, the network degradation caused by stacking convolutional layers.

3.3. ECA Attention Module

The attention mechanism is an embedded structure commonly used in deep convolutional networks to filter the key feature information from the massive parameters of a deep model, so as to allocate limited computational resources rationally. Structurally, an attention mechanism is usually a small additional network that assigns different weights to different parts of the input image, enabling the network to focus on locally useful information while suppressing irrelevant information. Image attention modules can be broadly categorized into three types according to the dimension in which they operate: spatial attention, channel attention, and mixed attention. More complex attention modules can effectively improve model performance but inevitably increase model complexity, so the choice of attention module must weigh performance against complexity. This study selected the lightweight channel attention module ECANet to focus on the pest feature information. Its structure is shown in Figure 15.
The ECA attention module applies global average pooling to the input feature map and then directly employs a one-dimensional convolution with an adaptively sized kernel to complete cross-channel information interaction, thereby avoiding the loss of feature information caused by dimensionality reduction. Compared with other attention modules, the ECA module adds only a small number of computational parameters and effectively improves recognition accuracy without placing much burden on the network.
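A sketch of this block in tf.keras, following the published ECA-Net design (the γ = 2, b = 1 kernel-size rule comes from that paper and is an assumption here, not a setting confirmed by the authors):

```python
import math
import tensorflow as tf
from tensorflow.keras import layers

def eca_block(x, gamma=2, b=1):
    """Efficient Channel Attention: global average pooling, a 1-D convolution
    across channels with an adaptively sized kernel, then sigmoid reweighting."""
    channels = int(x.shape[-1])
    t = int(abs((math.log2(channels) + b) / gamma))
    k = t if t % 2 else t + 1                      # kernel size must be odd
    y = layers.GlobalAveragePooling2D()(x)          # (batch, C)
    y = layers.Reshape((channels, 1))(y)            # channels as a 1-D sequence
    y = layers.Conv1D(1, k, padding="same", use_bias=False)(y)
    y = layers.Activation("sigmoid")(y)
    y = layers.Reshape((1, 1, channels))(y)
    return layers.Multiply()([x, y])                # channel-wise reweighting

inputs = layers.Input((32, 32, 256))
model = tf.keras.Model(inputs, eca_block(inputs))
```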

4. Test Results and Analysis

4.1. Test Environment and Hyperparameter Setting

The base programming language is Python 3.6.0, the display adapter is an NVIDIA GeForce RTX 3050 Ti Laptop GPU, the processor is AMD Radeon(TM) Graphics, the memory is 64 GB, the operating system is Windows 11 (64-bit), the program editor is Visual Studio Code (VSCode), the deep learning framework is TensorFlow 1.15.0 + Keras 2.1.5, the computing architecture is cuDNN 7.4.15 + CUDA 10.0, and the vision processing library is OpenCV-Python 4.5.5.62.

4.2. Test Evaluation Index

To evaluate the effectiveness of the rice pest detection model proposed in this study, precision, recall, the balanced F1 score, and the multi-task loss function are selected from the established evaluation metrics of deep convolutional networks to intuitively evaluate the GA-Mask R-CNN model. The evaluation metrics are formulated as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
Loss = class_loss + box_loss + mask_loss
In the formulas, TP denotes correctly detected positive samples, FP denotes negative samples incorrectly detected as positive, FN denotes positive samples that were missed, class_loss denotes the classification loss, box_loss denotes the regression loss, and mask_loss denotes the mask segmentation loss.

4.3. Analysis of Test Results

4.3.1. Analysis of Test Results in Laboratory Environment

The model was trained on the labeled pest image dataset, with the ratio of the training, testing, and validation sets set to 7:2:1 in the model program. The core training parameters were then set as follows: the number of iterations (epochs) was set to 150, the learning rate to 1 × 10−5, the confidence threshold to 0.7, and the threshold of the non-maximum suppression (NMS) algorithm to 0.3. The number of images per training batch (batch_size) was adapted to the available GPU, calculated as Images_per_GPU × GPU_Count. Finally, after the network parameters were set, the model was pre-trained on the COCO dataset, and the parameters were then adjusted for re-training from the pre-training weights.
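The TensorFlow 1.x/Keras versions listed in Section 4.1 match the widely used open-source matterport Mask R-CNN code base, in which the batch size is computed exactly as Images_per_GPU × GPU_Count; assuming that implementation, the hyperparameters above could be expressed roughly as follows.

```python
# Sketch assuming the open-source matterport/Mask_RCNN code base; the class
# and attribute names below come from that library, not from this paper.
from mrcnn.config import Config

class RicePestConfig(Config):
    NAME = "rice_pest"
    NUM_CLASSES = 1 + 2             # background + stem borer + leaf roller
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2              # BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT
    LEARNING_RATE = 1e-5            # learning rate from Section 4.3.1
    DETECTION_MIN_CONFIDENCE = 0.7  # confidence threshold
    DETECTION_NMS_THRESHOLD = 0.3   # NMS threshold

# Training would then run for 150 epochs from COCO pre-trained weights, e.g.
# model.train(train_set, val_set, learning_rate=config.LEARNING_RATE,
#             epochs=150, layers="heads")
```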
Samples of the target pests, rice stem borer and rice leaf roller, together with interfering pest samples such as cotton bollworm and corn borer, were placed on the insect boards. The numbers of target samples and irrelevant samples were set in a certain ratio, and the two-class model was used for recognition. Some of the recognition test results are shown in Figure 16.
The laboratory identification test was set up in five rounds, and the specific values of the proportion of the number of pest samples placed and the statistics of the identification results are shown in Table 3.
In the five rounds of laboratory simulations, the numbers of correct detections of the two models were similar, but the number of false detections of the improved model was lower, indicating that the improved model identifies non-target pests more accurately. The change in the loss function, an important index of the predictive ability of the model, intuitively shows the gap between the predicted and actual data: the smaller the value of the loss function, the more robust the model. The loss function is plotted both for the model as a whole and for the region proposal network, and its change is shown in Figure 17.
After 150 iterations of the model, the loss function curve shows a smooth decrease, and the final multitask loss converged to 0.02382. The model's regression loss function converged to 8.1745 × 10−4, the classification loss function to 1.5485 × 10−3, the mask segmentation loss function to 0.02136, the regression loss function in the RPN network to 1.0825 × 10−3, and the classification loss function in the RPN network to 1.1164 × 10−5. In addition, the training loss results on the validation set show that the regression loss function converged to 0.0277, the classification loss function to 0.02251, the mask segmentation loss function to 0.2297, the regression loss function in the RPN network to 0.0324, and the classification loss function in the RPN network to 7.5916 × 10−4.

4.3.2. Analysis of Test Results under Monitoring Equipment

To verify the recognition performance of the GA-Mask R-CNN network model in the real environment of the monitoring equipment, a field test was conducted at the rice planting base of Tannan Farm, Tancheng County, Linyi City, Shandong Province. The base lies close to the ecological protection zone of the Yi River and is irrigated by non-polluting water from the river, creating a natural environment well suited to growing rice. The field monitoring equipment at the Tannan Farm rice planting base is shown in Figure 18.
Tests were conducted under the real working conditions of the monitoring equipment: the model was deployed on the cloud platform, and verification tests were conducted by calling the actual working-condition images uploaded by the monitoring equipment. In this test, 20 working-condition images were randomly selected for recognition; part of the recognition results is shown in Figure 19.
The 20 randomly selected images were manually inspected and counted, yielding a total of 343 rice stem borers and rice leaf rollers, 506 rice borers, cotton bollworms, and American white moths with similar contours, and 169 other pests with very different contours. The original model made 228 correct detections and 106 false detections, while the improved model made 286 correct detections and 75 false detections. Under the influence of the actual paddy field environment, the number of false detections of rice stem borer and rice leaf roller increased compared with the simulated laboratory environment, but the correct detection rate of the model was still maintained above 92%, and the effective identification rate for pests with damaged bodies was also satisfactory.
Precision indicates the proportion of correct predictions from the perspective of the prediction results, while recall indicates the proportion of correctly predicted positives among all positive samples from the perspective of the samples. The P-R curve, with precision on the vertical axis and recall on the horizontal axis, summarizes model performance more comprehensively. The P-R curve of the improved model in this experiment is shown in Figure 20.
The average precision (AP) can be derived from the generated P-R curve as follows:
AP = ∫₀¹ P(R) dR
The computation yields an average precision of 92.71%, a recall of 89.28%, and a balanced F1 score of 0.9096 for the GA-Mask R-CNN network model.
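Numerically, the integral is approximated from sampled points of the P-R curve; a small sketch with hypothetical (recall, precision) pairs, not values read from Figure 20:

```python
import numpy as np

# Hypothetical (recall, precision) samples standing in for a P-R curve;
# the trapezoidal rule approximates AP = ∫ P(R) dR numerically.
recall = np.array([0.0, 0.25, 0.50, 0.75, 0.8928])
precision = np.array([1.00, 0.99, 0.97, 0.95, 0.9271])

ap = np.trapz(precision, recall)
print(f"AP ≈ {ap:.4f}")
```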

4.3.3. Influence of Different Models on Detection Performance

To make the performance of the improved model more convincing, the Mask R-CNN recognition model with the original architecture was used for comparative analysis. The training dataset, parameter configuration, and other environmental factors were set to be consistent with those of the improved model. The specific results of the model comparison test are shown in Table 4.
Compared with the original model, the average precision and recall of the improved model increased by 7.07 and 7.65 percentage points, respectively, the balanced F1 score improved by 8.83 percentage points, and the corresponding loss functions all decreased to varying degrees. The improved Mask R-CNN recognition model therefore has stronger overall performance, with better target recognition, classification, and prediction capabilities, and it is suitable for deployment in intelligent pest detection lamps to monitor the target pests of rice fields, rice stem borer and rice leaf roller.
To verify the advantages of the GA-Mask R-CNN model constructed in this study in recognizing rice canopy pests, the mature convolutional neural network models Faster R-CNN, SSD, and YOLOv5 were also trained and evaluated on the rice canopy pest dataset. The results show that the average precision of the GA-Mask R-CNN network model is improved by 4.5%, 6.8%, and 3.7% compared to the Faster R-CNN, SSD, and YOLOv5 network models, respectively, and its recognition performance is better than that of the other network models.

5. Conclusions

In this paper, we constructed a multi-source image dataset of rice stem borer and rice leaf roller and a GA-Mask R-CNN target detection model to identify and monitor rice canopy pests. The conclusions are as follows:
(1)
A bug generator based on a generative adversarial network is used to enhance the sensitivity of the classification network to insect body information, improving the accuracy and robustness of pest detection.
(2)
Multilevel residual connections and an ECA attention module are introduced into the Mask R-CNN backbone network ResNet101 to improve the recognition accuracy of the model and to suppress the gradient vanishing and gradient explosion problems, improving the stability and convergence of the model.
(3)
A performance test on rice canopy pests was conducted with the pest monitoring equipment, and the results showed that the performance indexes of the improved model were all better than those of Faster R-CNN, SSD, YOLOv5, and other models, proving the effectiveness and superiority of the scheme.

Author Contributions

Conceptualization, S.L. (Sitao Liu) and S.F.; methodology, S.L. (Sitao Liu) and A.H.; validation, P.M.; formal analysis, X.H. and S.L. (Shuangxi Liu); investigation, X.T.; resources, S.F.; writing—original draft, S.L. (Sitao Liu); writing—review and editing, S.F. and A.H.; visualization, X.T.; supervision, H.Z.; project administration, P.M. and A.H.; funding acquisition, X.H. and S.L. (Shuangxi Liu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shandong Modern Agricultural Industrial Technology System Rice Agricultural Machinery Post Expert Project, Grant No. SDAIT-17-08.

Data Availability Statement

All the data mentioned in the paper are available through the corresponding author.

Acknowledgments

The authors would like to acknowledge the valuable comments by the editors and reviewers, which have greatly improved the quality of this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liao, J.; Chen, M.H.; Zhang, K.; Zou, Y.; Zhang, S.; Zhu, D. Plant segmentation model of crop seedling stage based on fusion of regional semantics and edge information. J. Agric. Mach. 2021, 12, 71–181. [Google Scholar]
  2. Vasseghian, Y.; Berkani, M.; Almomani, F.; Dragoi, E.N. Data Mining For Pesticide Decontamination Using Heterogeneous Photocatalytic Processes. Chemosphere 2021, 270, 129449. [Google Scholar] [CrossRef] [PubMed]
  3. Li, D.; Wang, M.M.; Liu, J.D.; Chen, F. Strip surface defect identification based on lightweight convolutional neural network. Chin. J. Sci. Instrum. 2022, 3, 240–248. [Google Scholar]
  4. Wu, L.; Han, S.; Chen, A.; Salama, P.; Dunn, K.W.; Delp, E.J. RCNN-SliceNet: A Slice and Cluster Approach for Nuclei Centroid Detection in Three-Dimensional Fluorescence Microscopy Images. In Conference on Computer Vision and Pattern Recognition Workshops, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA, 20–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; Volume 15753, pp. 3755–3765. [Google Scholar]
  5. Lin, T.L.; Chang, H.Y.; Chen, K.H. The Pest and Disease Identification in the Growth of Sweet Peppers Using Faster R-CNN and Mask R-CNN. J. Internet Technol. 2020, 2, 605–614. [Google Scholar]
  6. Rong, M.; Wang, Z.; Ban, B.; Guo, X. Pest Identification and Counting of Yellow Plate in Field Based on Improved Mask R-CNN. Discret. Dyn. Nat. Soc. 2022, 2022, 1913577. [Google Scholar] [CrossRef]
  7. Li, Y.; Xiang, Y.; Feng, Q. Method of Locating the Strike Point on Pest for Laser Control Based on Mask R-CNN. In Proceedings of the 2022 International Conference on Guidance, Navigation and Control, Harbin, China, 5–7 August 2022; pp. 710–718. [Google Scholar]
  8. Lyu, S.; Ke, Z.; Li, Z.; Xie, J.; Zhou, X.; Liu, Y. Accurate Detection Algorithm of Citrus Psyllid Using the YOLOv5s-BC Model. Agronomy 2023, 13, 896. [Google Scholar] [CrossRef]
  9. Chen, Y.; Yuan, X.; Wu, R.; Wang, J.; Hou, Q.; Cheng, M.M. YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection. Comput. Vis. Pattern Recognit. 2023, 2308, 5480. [Google Scholar]
  10. Jia, L.; Wang, T.; Chen, Y.; Zang, Y.; Li, X.; Shi, H.; Gao, L. MobileNet-CA-YOLO: An Improved YOLOv7 Based on the MobileNetV3 and Attention Mechanism for Rice Pests and Diseases Detection. Agriculture 2023, 13, 1285. [Google Scholar] [CrossRef]
  11. Zhang, J.; Wang, J.; Zhao, M. A Lightweight Crop Pest Detection Algorithm Based on Improved Yolov5s. Agronomy 2023, 13, 1779. [Google Scholar] [CrossRef]
  12. Wang, L.; Shi, W.; Tang, Y.; Liu, Z.; He, X.; Xiao, H.; Yang, Y. Transfer Learning-Based Lightweight SSD Model for Detection of Pests in Citrus. Agronomy 2023, 13, 1710. [Google Scholar] [CrossRef]
  13. Zheng, W.; Tang, W.; Jiang, L.; Fu, C.W. SE-SSD: Self-Ensembling Single-Stage Object Detector from Point Cloud. Comput. Vis. Pattern Recognit. 2021, 9804, 14494–14503. [Google Scholar]
  14. Chen, J.; Han, M.; Lian, Y.; Zhang, S. Image segmentation of hybrid rice kernel based on U-Net model. Trans. Chin. Soc. Agric. Eng. 2019, 10, 174–180. [Google Scholar]
  15. Prakruti, V.B.; Sarangi, S.; Pappula, S. Detection of diseases and pests on images captured in uncontrolled conditions from tea plantations. In Autonomous Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping IV; SPIE: Bellingham, WA, USA, 2019; Volume 11008. [Google Scholar]
  16. Lin, C.H.; Chen, P.H.; Lin, C.; Chen, Y.C.; Huang, M.J.; Liu, W.M. Automatic Detection and Counting of Small Yellow thrips on Lotus Leaf Back Based on YOLO Combined with VDSR and DPSR Network. In Proceedings of the Thirteenth International Conference on Digital Image Processing, Virtual, 20–23 May 2021; p. 11878. [Google Scholar]
  17. Khalid, S.; Oqaibi, H.M.; Aqib, M.; Hafeez, Y. Small Pests Detection in Field Crops Using Deep Learning Object Detection. Sustainability 2023, 15, 6815. [Google Scholar] [CrossRef]
  18. Arshaghi, A.; Ashourian, M.; Ghabeli, L. Potato diseases detection and classification using deep learning methods. Multimed. Tools Appl. 2023, 82, 5725–5742. [Google Scholar] [CrossRef]
  19. Lippi, M.; Carpio, R.F.; Contarini, M.; Speranza, S.; Gasparri, A. A Data-Driven Monitoring System for the Early Pest Detection in the Precision Agriculture of Hazelnut Orchards. IFAC-PapersOnLine 2022, 55, 32. [Google Scholar] [CrossRef]
  20. Kasinathan, T.; Uyyala, S.R. Detection of fall armyworm (Spodoptera frugiperda) in field crops based on mask R-CNN. Signal Image Video Process. 2023, 17, 2689. [Google Scholar] [CrossRef]
  21. Kaur, P.; Harnal, S.; Gautam, V.; Singh, M.P.; Singh, S.P. Hybrid deep learning model for multi biotic lesions detection in solanum lycopersicum leaves. Multimed. Tools Appl. 2023, 1, 25. [Google Scholar] [CrossRef]
  22. Barmpoutis, P.; Kamperidou, V.; Stathaki, T. Estimation of extent of trees and biomass infestation of the suburban forest of Thessaloniki (Seich Sou) using UAV imagery and combining R-CNNs and multichannel texture analysis. Mach. Vis. Appl. 2020, 11433, 114333C. [Google Scholar]
  23. Lin, X.Z.; Zhang, J.Y.; Xu, X.; Zhu, S.H.; Liu, D.Y. Recognition and classification of Rice planthopper based on incomplete insect images based on dictionary learning and SSD. Trans. Chin. Soc. Agric. Mach. 2021, 52, 165–171. [Google Scholar]
  24. Gonçalves, J.; Silva, E.; Faria, P.; Nogueira, T.; Ferreira, A.; Carlos, C.; Rosado, L. Edge-Compatible Deep Learning Models for Detection of Pest Outbreaks in Viticulture. Agronomy 2022, 12, 3052. [Google Scholar] [CrossRef]
  25. Liang, Y.; Qiu, R.Z.; Li, Z.P.; Chen, S.X.; Chen, Z.; Zhao, J. Identification method of rice major pests based on YOLOv5 and multi-source data set. Trans. Chin. Soc. Agric. Mach. 2022, 53, 250–258. [Google Scholar]
  26. Wang, J.; Ma, B.; Wang, Z.; Liu, S.; Mu, J.; Wang, Y. Pest identification method in apple orchards based on improved Mask R-CNN. Trans. Chin. Soc. Agric. Mach. 2023, 54, 253–263. [Google Scholar]
  27. Ramadan, H.; Lachqar, C.; Tairi, H. A Survey of Recent Interactive Image Segmentation Methods. Comput. Vis. Media 2020, 6, 355–384. [Google Scholar] [CrossRef]
  28. Lu, A.Q.; Qin, C.C.; Hu, S.B.; Li, G.Q. Research on deep interactive image segmentation with extreme point feature. Inf. Commun. 2020, 6, 66–69. [Google Scholar]
  29. Lin, Z.; Zhang, Z.; Chen, L.-Z.; Cheng, M.-M.; Lu, S.-P. Interactive Image Segmentation with First Click Attention. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 13336–13345. [Google Scholar]
  30. Li, Q.X.; Wang, Q.H.; Ma, M. Research on egg image data generation based on generative adduction network. Trans. Chin. Soc. Agric. Mach. 2021, 52, 236–245. [Google Scholar]
  31. Ye, Y.; Shen, B.; Shen, Y. Shadow resistant tree detection method based on generation adduction network. Trans. Chin. Soc. Agric. Eng. 2021, 37, 118–126. [Google Scholar]
  32. Pan, Y.; Jin, M.; Zhang, S.; Deng, Y. TEC map completion using DCGAN and Poisson blending. Space Weather.—Int. J. Res. Appl. 2020, 18, 5. [Google Scholar] [CrossRef]
  33. Chen, H.; Zhen, X.; Zhao, T. An adaptive image fusion data enhanced method for Pika target detection. Trans. Chin. Soc. Agric. Eng. 2022, 38, 170–175. [Google Scholar]
  34. Yu, X.M.; Hong, S.; Yu, J.X.; Lu, Y.B.; Peng, Y. Research on enhancement method of ship target data in visible remote sensing image. Chin. J. Sci. Instrum. 2020, 41, 261–269. [Google Scholar]
  35. Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. Learning 2019, 1809, 11096. [Google Scholar]
  36. Afzaal, U.; Bhattarai, B.; Pandeya, Y.R.; Lee, J. An Instance Segmentation Model for Strawberry Diseases Based on Mask R-CNN. Sensors 2021, 21, 6565. [Google Scholar] [CrossRef]
  37. Taş, M.; Taş, Y.; Balki, O.; Aydın, Z.; Taşdemir, K. Camera-based wildfire smoke detection for foggy environments. J. Electron. Imaging 2022, 31, 5. [Google Scholar] [CrossRef]
  38. Lin, X.Z.; Zhu, S.H.; Zhang, J.Y.; Liu, D.Y. Image classification method of rice planthopper based on transfer learning and Mask R-CNN. Trans. Chin. Soc. Agric. Mach. 2019, 50, 201–207. [Google Scholar]
  39. Qin, H.; Zhang, D.; Tang, Y.; Wang, Y. Automatic Recognition of Tunnel Lining Elements From Gpr Images Using Deep Convolutional Networks With Data Augmentation. Autom. Constr. 2021, 130, 103830. [Google Scholar] [CrossRef]
  40. Shen, C.; Qi, G.J.; Jiang, R.; Jin, Z.; Yong, H.; Chen, Y.; Hua, X.S. Sharp Attention Network via Adaptive Sampling for Person Re-identification. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 3016–3027. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of rice pest monitoring equipment: (a) Box; (b) Support posts; (c) Fill light; (d) Impact plate; (e) Trapping device; (f) Solar panels; (g) Image acquisition device; (h) Top rack; (i) Box lid; (j) Support seat; (k) Blower.
Figure 2. Images of samples collected by rice pest monitoring equipment.
Figure 3. Pest traps take sample images.
Figure 4. Image of pest samples on rice plants.
Figure 5. Effect of Haar feature segmentation.
Figure 6. GrabCut image segmentation effect.
Figure 7. Images enhanced with conventional data: (a) Raw image; (b) Shading adjustment; (c) Mirror flip; (d) Gaussian noise; (e) Pulse noise; (f) Random occlusion; (g) Random clipping; and (h) Rotation angle.
Figure 8. Schematic diagram of generating adversarial network structure.
Figure 9. DCGAN loss function curve.
Figure 10. Pest image generation process of DCGAN generation model.
Figure 11. Image annotation process of pest samples.
Figure 12. Basic frame structure of the GA-Mask R-CNN model.
Figure 13. GA-Mask R-CNN model backbone network structure.
Figure 14. Multilevel residual connection structure diagram.
Figure 15. Schematic diagram of lightweight channel attention ECA module.
Figure 16. Simulation test effect diagram.
Figure 17. Loss function change curve: (a) Loss function based on TensorBoard; (b) Training loss of the overall model; (c) Training loss of the RPN network.
Figure 18. Field site of rice pest monitoring equipment in Tannan Farm.
Figure 19. Real condition test effect diagram.
Figure 20. Change of P-R curve of GA-Mask R-CNN model.
Table 1. Pest monitoring sites of Shandong Academy of Agricultural Sciences.
Address | Longitude and Latitude
Weifang city Shouguang pest monitoring station | 119.1° E, 36.6° N
Linyi city Lanling county Jinsui family farm | 118.2° E, 34.8° N
Weihai Rushan pest monitoring station | 121.5° E, 36.9° N
Zibo city Yiyuan pest monitoring station | 118.2° E, 36.2° N
Jinan city Jiyang pest monitoring station | 117.2° E, 37.0° N
Yantai Muping pest monitoring station | 121.6° E, 37.4° N
Dezhou Qingyun pest monitoring station | 117.4° E, 37.8° N
Liaocheng Linqing pest monitoring station | 115.5° E, 36.7° N
Heze city Caoxian pest monitoring station | 115.5° E, 34.8° N
Taian pest monitoring station | 117.1° E, 36.2° N
Jining city Yutai pest monitoring station | 116.7° E, 35.0° N
Tancheng County Puwang farm, Linyi city | 118.3° E, 34.5° N
Dongying Huanghekou pest monitoring station | 118.8° E, 37.9° N
Linyi city Junan pest monitoring station | 118.8° E, 35.2° N
Qingdao Laixi pest monitoring station | 120.5° E, 36.9° N
Table 2. Pest dataset of rice canopy.
Data Set | Original Sample/Sheet: Rice Stem Borer | Original Sample/Sheet: Rice Leaf Roller | Expanded Sample/Sheet: Rice Stem Borer | Expanded Sample/Sheet: Rice Leaf Roller
Bug monitoring equipment | 1478 | 1527 | 5912 | 6180
Pest traps | 2116 | 2160 | 8464 | 8640
Pests on rice plants | 229 | 208 | 2178 | 2476
Multi-source data sets | 3823 | 4195 | 16,554 | 17,296
Table 3. Numerical statistics of identification results.
Experiment Round | Target Sample | Interference Sample | Original Model: Detections | Original Model: False Detections | Improved Model: Detections | Improved Model: False Detections
1 | 50 | 65 | 41 | 12 | 40 | 6
2 | 80 | 95 | 68 | 20 | 65 | 10
3 | 120 | 120 | 102 | 24 | 103 | 7
4 | 170 | 155 | 138 | 30 | 143 | 12
5 | 230 | 205 | 195 | 35 | 198 | 16
Table 4. Comparison of main test parameters.
Evaluation Index | Original Mask R-CNN Model | GA-Mask R-CNN Model
Multitask loss function | 0.02711 | 0.02382
Average precision | 85.64% | 92.71%
Recall | 81.63% | 89.28%
F1 | 0.8213 | 0.9096
Mrcnn_bbox_loss | 1.1253 × 10−3 | 8.1745 × 10−4
Mrcnn_class_loss | 2.1068 × 10−3 | 1.5485 × 10−3
Mrcnn_mask_loss | 0.02227 | 0.02136
Rpn_bbox_loss | 1.6175 × 10−3 | 1.0825 × 10−3
Rpn_class_loss | 1.6492 × 10−5 | 1.1164 × 10−5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, S.; Fu, S.; Hu, A.; Ma, P.; Hu, X.; Tian, X.; Zhang, H.; Liu, S. Research on Insect Pest Identification in Rice Canopy Based on GA-Mask R-CNN. Agronomy 2023, 13, 2155. https://doi.org/10.3390/agronomy13082155
