Early Identification and Localization Algorithm for Weak Seedlings Based on Phenotype Detection and Machine Learning

Xu, Shengyong; Zhang, Yi; Dong, Wanjing; Bie, Zhilong; Peng, Chengli; Huang, Yuan

doi:10.3390/agriculture13010212


¹ College of Engineering, Huazhong Agricultural University, Wuhan 430070, China

² Key Laboratory of Agricultural Equipment for the Middle and Lower Reaches of the Yangtze River, Ministry of Agriculture, Huazhong Agricultural University, Wuhan 430070, China

³ Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Shenzhen 518000, China

⁴ Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000, China

⁵ College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan 430070, China

⁶ Key Laboratory of Horticultural Plant Biology, Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China

⁷ Electronic Information School, Wuhan University, Wuhan 430072, China

* Author to whom correspondence should be addressed.

Agriculture 2023, 13(1), 212; https://doi.org/10.3390/agriculture13010212

Submission received: 2 December 2022 / Revised: 3 January 2023 / Accepted: 11 January 2023 / Published: 14 January 2023

(This article belongs to the Section Digital Agriculture)


Abstract

Correct decisions on culling and replenishing seedlings in factory seedling nurseries are important for improving seedling quality and saving resources. To address the inefficiency and subjectivity of traditional manual culling and replenishment, this paper proposes an automatic method for discriminating the early growth condition of seedlings. Taking watermelon plug seedlings as an example, an Azure Kinect was used to collect top-view data three times a day, at 9:00, 14:00, and 19:00. The data were collected from germination to the growth of the main leaves, and each seedling was manually judged to be strong or weak on the last day of collection. Pre-processing, image segmentation, and point cloud processing were applied to the collected data to obtain the plant height and leaf area of each seedling. An LSTM recurrent neural network then used the data from the first three days to predict the plant height and leaf area on the sixth day. The R squared values for plant height and leaf area prediction were 0.932 and 0.901, respectively. Binary classification of normal and abnormal seedlings was performed on the day-six data using six machine learning classifiers, including random forest, SVM, and XGBoost. The experimental results showed that random forest had the highest classification accuracy, at 84%. Finally, appropriate culling and replenishment decisions are made based on the classification results. This method can provide technical support and a theoretical basis for factory seedling nurseries and transplanting robots.

Keywords: strong seedling model; phenotype measurement; machine learning; growth prediction

1. Introduction

China is a major vegetable producer and consumer. Its vegetable sown area and production account for 52.25% and 58.31% of the world’s totals, respectively, ranking first in the world [1]. With the development of modern facility agriculture, the scale of intensive vegetable production has expanded. Centralized factory nurseries have become an inevitable trend and are widely used in agricultural production around the world [2]. However, in factory nursery production, the seedling success rate ranges from 80% to 95%, and the main causes of failure include missing seedlings and diseased seedlings [3]. The transplanting, culling, and replenishment of seedlings before they leave the factory are key steps in determining the quality and yield of vegetable seedlings. There is still relatively little research on how to pick out and replenish weak seedlings, and most of this work is performed manually. However, the high temperature, high humidity, and confined conditions in the greenhouse make this work extremely difficult for workers, and manual picking and replenishing also suffers from high subjectivity, low efficiency, and high cost [4]. Relying on experience alone, it is not possible to predict seedling growth accurately or to target replenishment. To ensure uniform seedlings at delivery, seedling factories often offset the losses caused by missing and diseased seedlings by sowing more seeds, which causes considerable economic loss. A reliable early identification system for weak seedlings can help nurseries quickly locate weak and dead seedlings and target transplanting and replanting operations, greatly improving the efficiency and economy of plant nurseries.

The gradual integration of computer technology and agricultural knowledge has brought the study of crop morphological structures and physiological functions into the stage of digitalization and visualization [5]. Researchers have applied machine vision and spectroscopy to high-throughput crop phenotyping to achieve autonomous monitoring, analysis, and application of crop physiological and ecological information [6,7,8]. Crop phenotype detection technology is the basis for growth modeling. Three-dimensional vision technology can store the 3D information of plant shapes and organs in a computer to reproduce the morphological structure of crops. It can analyze and detect the dynamic process of plant growth and plant–environment interactions, which accelerates quantitative research on the processes and laws of crop growth and development [9,10]. Three-dimensional vision for crop phenotyping generally relies on 3D imaging techniques such as depth cameras, binocular vision, and depth estimation, and a large number of relevant studies have appeared in recent years. For example, Jin proposed a low-damage transplanting method for leafy vegetable seedlings based on machine vision and image processing to address the high damage rates of seedling transplanting in horticultural facilities. An Intel D415 camera was used to obtain the height and extreme edge points of the seedlings, and the path of the end-effector was planned from this coordinate information to achieve low-damage transplanting and improve the transplanting success rate [11].

Three-dimensional vision technology can make up for the shortcomings of machine vision because it can obtain the actual phenotype data of the research object, which makes it well suited to monitoring crop growth quality. For example, Yang et al. proposed an RGB-D camera-based method for the in situ measurement of vegetable seedling height in greenhouse nursery trays. They combined 3D point cloud filtering with clustering, which effectively filters out the soil background points and achieves in situ point cloud segmentation. The average relative error of the plant height measurement was 7.69%, an accuracy that meets the needs of practical production and scientific research [12]. Teng et al. used Azure Kinect for 3D reconstruction at the seedling moss stage and proposed an improved ICP-based point cloud alignment method, which aligns the point cloud of each viewpoint three times in succession while progressively decreasing the grid size and the distance threshold between corresponding points until the complete color point cloud is obtained. This method increases the accuracy to 92.5% and has the potential to be widely used for low-cost, high-accuracy non-destructive testing of oilseed rape phenotypes [13]. Otoya et al. used the RealSense D435 depth camera to grade artichokes. Using a leaf area estimation method based on point cloud segmentation and a triangulation algorithm, they classified the artichokes into four grades: high-quality seedlings, medium-quality seedlings, poor-quality seedlings, and no seedlings, enabling the non-destructive assessment of seedling quality [14]. Nguyen et al. performed precise 3D reconstruction of cabbage, cucumber, and tomato seedlings using a structured light-based method and accurately estimated plant phenotypic characteristics such as leaf number, plant height, and leaf size without destroying any part of the plant [15]. Chen et al. used the structure-from-motion method to obtain plant point clouds and proposed a fuzzy C-means clustering-based point cloud segmentation method for individual plants, with the leaf area finally calculated by a grid method. This method improves, to a certain extent, the accuracy of leaf area calculation for overlapping leaves and images taken at complex angles [16]. Wang et al. proposed a KinectV2 camera-based method for the nondestructive monitoring of the growth of factory plug seedlings. They obtained the germination rate of seedling trays by threshold segmentation and morphological processing of color images and analyzed the plant height and leaf area of the seedlings by converting depth images into point clouds, realizing nondestructive monitoring of the germination rate, plant height, leaf area, and seedling index of cavity trays [17]. Zhang et al. took cucumber cavity tray seedlings as the research object and proposed a point cloud processing-based method for automatically detecting late seedling emergence in cavity trays. The leaf area and plant height were obtained with the α-shape algorithm and a method for locating the top of the seedling stem based on principal curvature, and the product of leaf area and plant height was used as the grading factor to automatically detect late-emerging seedlings [18].

Crop phenotype data based on 3D vision technology can well describe the current crop growth condition, and combined with machine learning or deep learning techniques, can further predict the crop growth trend. For example, Zhang et al. proposed a method to measure the 3D morphological characteristics of plants and established a plant time-series growth equation and visualization model to present the growth process of Arabidopsis dynamically, which facilitates the phenotype detection of Arabidopsis. However, due to the method of generating point clouds as a structure and the need to rely on L-studio software to fit the mathematical growth equations, the modeling speed is slow and cannot achieve the speed and portability required for practical production [19]. An et al. designed an automated high-throughput plant phenotype detection pipeline for monitoring the growth of rosettes. This pipeline is topped with 18 cameras and is capable of holding 4 × 4 seedling trays for a total of 16 trays. With this device, images of rosettes can be taken continuously, and the power-law distribution between the total leaf growth area and rosette area can be analyzed from the time series. However, this device is complex, costly, and less portable [20].

In summary, phenotypic characteristics such as leaf area and plant height are the main parameters for evaluating and predicting plant growth [21,22]. Plant height determines whether seedlings are spindling, while leaf area is a determinant of whether seedling growth is strong or weak. Jointly predicting the growth of these two characteristics is expected to make it possible to discriminate strong seedlings from weak ones. Since seedling growth carries time-series information, the growth status on one day is necessarily highly correlated with the growth status on the next. The long short-term memory (LSTM) network has performed well in the analysis of time-series dynamical systems in several fields [23]. LSTM mitigates the vanishing and exploding gradients of traditional recurrent neural networks (RNN) and can trace back further through the time series, which makes the model’s predictions more explanatory. Traditional machine learning binary classifiers such as SVM, random forest, and XGBoost, in turn, can jointly model the two predicted features, with strong classification ability and low sensitivity to outliers. To solve the problem of the early identification and localization of weak seedlings, this paper proposes a phenotype-based growth prediction and strong seedling discrimination model. The model has high detection and prediction accuracy and can both discriminate and locate weak seedlings, providing the number and positions of seedlings to the dividing and combining robot, and therefore has good practical value and application prospects.

2. Materials and Methods

2.1. Experimental Materials and Data Acquisition

The experiment was conducted in July 2022 in a small daylight greenhouse at the vegetable improvement base of Huazhong Agricultural University with a north–south layout and free control of shade curtains to control the temperature and humidity as well as light in the greenhouse. The watermelon variety tested was the common variety “Zaojia (84-24)”, with a total of 16 trays and 788 seedlings emerging. The growth cycle of seedlings was 8–10 days. The cultivation substrate used for growing watermelon was grass charcoal, vermiculite, and perlite, uniformly mixed according to a volume ratio of 3:1:1, while drip irrigation was used.

Kinect 3D sensor real-time acquisition algorithms can meet the requirements of fast, accurate, real-time acquisition of crop growth images, which has become a development trend and a necessary means of digital agricultural production management [24]. The data acquisition device used in this paper is the Azure Kinect DK from Microsoft. The data acquisition platform is shown in Figure 1 and consists mainly of the Azure Kinect sensor, a computer, and a shaded photo booth. The Kinect was mounted on a steel mount, looking vertically downward from a distance of about 0.45 m, with the camera plane parallel to the shooting platform. The computer was used to acquire and process the images captured by the Kinect. The data were collected from the time the seedlings sprouted to the time they developed their true leaves, with the Azure Kinect taking top views of the entire tray of watermelon seedlings three times a day, at 9:00, 14:00, and 19:00. Since the color camera of the Azure Kinect sensor is prone to overexposure, data acquisition took place in a dark room.

The color image contains the color information of the plug seedlings, and its rich RGB features are well suited to seedling positioning and image segmentation. The depth image records the actual distance from the camera lens to the seedlings in the cavity tray and offers high accuracy in phenotype detection, so it can be used for the non-destructive detection of 3D seedling phenotype data. The joint analysis of color and depth images requires the two images to be aligned. During data acquisition, the depth image is aligned to the color image using the depth-image-to-color-camera transformation function in the Kinect SDK. The aligned depth image has the same resolution as the color image, so the depth information can be segmented and recognized directly on the basis of the color information. Figure 2 shows the continuously acquired color images with the aligned depth images.

The robustness of each seedling was assessed manually on the sixth day of data acquisition. The assessment results were divided into two categories: normal seedlings and abnormal seedlings. Abnormal seedlings were weak seedlings with dwarf plants and smaller wilted leaves or spindling seedlings with thin stems and tall plants, while the rest were normal seedlings. Table 1 shows the statistics of all the sample data.

2.2. Overall Flow Chart

The flow chart of the technical approach in this paper is shown in Figure 3. It includes four parts: data acquisition, seedling location, phenotype detection, and weak seedling identification. Data acquisition includes image data acquisition by the RGB-D camera and manual acquisition of plant height and leaf area. Seedling location and phenotype detection were performed by image processing and point cloud processing using the collected data, and validation experiments were conducted simultaneously. The weak seedling discrimination system uses LSTM and a random forest classification model to jointly predict the dual features of plant height and leaf area to obtain the final weak seedling discrimination model.

2.3. Seedling Positioning and Indexing Methods in Cavity Trays

2.3.1. Plug-Hole Location and Indexing

The first step is to detect the plug holes of the plug seedlings, and the most critical part is determining the location of the plug boundary. Since the seedlings in this experiment were still at the seedling stage, the plug boundary was not obscured by seedling foliage. To obtain accurate plug-hole boundary locations, the seedling and soil information must be segmented out precisely so that only the required plug-hole boundaries remain.

As shown in Figure 4, the seedling information can be removed by first applying the Excess Green (ExG) index to the color image of the watermelon seedling tray and then inverting the result. Threshold segmentation is a typical image-processing algorithm that segments an image based on gray-value features. Since the plug boundary and the soil have different gray-level ranges, applying Otsu threshold segmentation to the image with the seedling information removed yields a binarized image containing only the plug boundary information.
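
The two building blocks named above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the common Excess Green definition 2G - R - B, represents images as nested lists, and uses a hypothetical 2 × 2 toy image; all function names are illustrative.

```python
# Minimal sketch: Excess Green (assumed as 2G - R - B) followed by Otsu thresholding.
# Images are nested lists of (R, G, B) tuples; all names here are illustrative.

def excess_green(rgb):
    """Return the ExG value 2G - R - B for every pixel."""
    return [[2 * g - r - b for (r, g, b) in row] for row in rgb]


def to_gray(exg):
    """Clip ExG values to the 0..255 range so they can be histogrammed."""
    return [[max(0, min(255, v)) for v in row] for row in exg]


def otsu_threshold(gray, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    """
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the class <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy 2 x 2 image: left column is green seedling, right column is brownish soil.
image = [[(40, 200, 30), (120, 100, 80)],
         [(30, 180, 40), (110, 90, 70)]]
gray = to_gray(excess_green(image))
t = otsu_threshold(gray)
seedling_mask = [[v > t for v in row] for row in gray]
```

Inverting `seedling_mask` gives the pixels that are kept when the seedling information is removed; the same `otsu_threshold` helper can then be applied to the gray values of the remaining pixels to separate the plug boundary from the soil.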

The binary image of the plug seedlings after segmentation still contains noise. If the noise is not accurately removed, it will interfere with subsequent processing and may even affect the correctness of the results, so noise removal is a necessary step after binarization. The noise is formed by fine soil pixels, while the plug boundary must be preserved, so a morphological opening with a 3 × 3 kernel is applied so that the boundary of each plug hole is shown more clearly and separated from the seedling and soil substrate information. The remaining noise is mostly scattered in small, isolated connected regions. To remove it, the area of every connected region is calculated, a threshold is set, and the pixel values of all regions whose area is below this threshold are set to 0. At this point, the unwanted seedling and soil information has been split off more precisely, and the required plug boundary information is retained.
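
The area-based noise removal step can be illustrated with a short flood-fill sketch. It assumes 4-connectivity and a caller-chosen `min_area`, neither of which is specified in the text, and the function name is hypothetical.

```python
from collections import deque


def remove_small_regions(binary, min_area):
    """Set to 0 every 4-connected foreground region smaller than min_area pixels.

    `binary` is a nested list of 0/1 values; 4-connectivity is an assumption.
    """
    rows, cols = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_area:
                for y, x in region:
                    out[y][x] = 0
    return out


noisy = [[1, 1, 0, 0],
         [1, 1, 0, 1],
         [0, 0, 0, 0]]
cleaned = remove_small_regions(noisy, min_area=2)
```

Here the isolated pixel in the second row is removed while the 2 × 2 block is kept.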

Since the plug holes have a standard structure and are arranged in a square matrix, the boundaries are continuous in the horizontal and vertical directions. The plug-hole boundaries can therefore be determined from how the pixel statistics of the seedling plug image change along the horizontal and vertical coordinates. The pixel values are summed for each column and each row of the image, and the trends of the two resulting profiles are examined.

The horizontal and vertical wave peaks correspond to the hole boundaries, and the coordinates of the wave peaks are the coordinates of the pixel points corresponding to the hole boundaries, which can accurately determine the location information of each hole boundary to achieve the plug hole location. The red points in the boundary identification part of Figure 4 are the results of peak point detection.
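
The projection-and-peak idea can be sketched as follows. This is an illustrative simplification: the `min_height` cutoff and the simple local-maximum rule are assumptions, and the 5 × 7 grid is a hypothetical stand-in for a binarized tray image.

```python
def projection_profiles(binary):
    """Return (column sums, row sums) of a 0/1 image given as nested lists."""
    col_sums = [sum(col) for col in zip(*binary)]
    row_sums = [sum(row) for row in binary]
    return col_sums, row_sums


def find_peaks(profile, min_height):
    """Indices of local maxima that reach min_height (first index of a plateau)."""
    peaks = []
    for i, v in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("-inf")
        if v >= min_height and v > left and v >= right:
            peaks.append(i)
    return peaks


# Toy boundary image: two plug holes side by side, boundaries drawn as 1s.
grid = [[1, 1, 1, 1, 1, 1, 1],
        [1, 0, 0, 1, 0, 0, 1],
        [1, 0, 0, 1, 0, 0, 1],
        [1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 1]]
cols, rows = projection_profiles(grid)
boundary_columns = find_peaks(cols, min_height=4)
boundary_rows = find_peaks(rows, min_height=4)
```

Consecutive peak coordinates then delimit one plug hole, which is what makes a row-column index for each hole possible.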

2.3.2. Seedling Index

Watermelon seedlings may grow skewed during the germination period due to phototropism and water control, which may cause the seedlings to be photographed outside of the center of the hole. Even if the hole location is correctly located, accurate seedling information is not obtained due to skewing. To address this problem, a seedling image skew correction algorithm is proposed.

The Excess Green (ExG) index is first applied to retain only the seedling information, and the localization range of each located plug-hole image is expanded, as in Figure 5b. The Moments function in OpenCV is used to obtain the centroid coordinates of each connected domain, as in Figure 5c, and the Euclidean distance between each centroid and the center of the image is calculated. The connected domain whose centroid is closest to the image center is taken as the seedling in this hole. Figure 5e shows the plug-hole images after correction.
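
The selection rule can be sketched without OpenCV by computing the centroid directly from the zeroth- and first-order moments. Regions are assumed to be given as lists of (x, y) pixel coordinates, and the function names are hypothetical.

```python
import math


def centroid(pixels):
    """Centroid (m10 / m00, m01 / m00) of a region given as (x, y) pixels."""
    m00 = len(pixels)
    m10 = sum(x for x, _ in pixels)
    m01 = sum(y for _, y in pixels)
    return m10 / m00, m01 / m00


def pick_seedling(regions, width, height):
    """Return the index of the region whose centroid is nearest the image center."""
    cx, cy = (width - 1) / 2, (height - 1) / 2

    def distance(region):
        x, y = centroid(region)
        return math.hypot(x - cx, y - cy)

    return min(range(len(regions)), key=lambda i: distance(regions[i]))


# Two candidate regions in a 10 x 10 crop: a neighbor's leaf tip in the corner
# and the seedling that actually belongs to this hole near the center.
regions = [[(0, 0), (1, 0)],
           [(4, 4), (5, 4), (4, 5), (5, 5)]]
chosen = pick_seedling(regions, width=10, height=10)
```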

2.4. Seedling Phenotype Detection Algorithm

2.4.1. Seedling Height Measurement

The seedling height H was defined as the vertical distance from the root of the seedling stalk to the top of the leaf. As shown in Figure 6, the field of view of the camera is parallelogram ABCD, h₁ is the distance measured by the camera to the root of the main stalk of the seedling, h₂ is the distance measured by the camera to the top of the leaf, and Equation (1) gives the seedling height H.

H = h_{1} - h_{2}

(1)

Since the soil surface is not flat and the soil height varies from seedling to seedling, the seedling height cannot be measured against a single reference height, and each seedling needs to be analyzed individually. To measure the height of each seedling, proceed as follows:

Step 1: The raw depth image contains a large amount of irrelevant depth information, so the Excess Green (ExG) index is first applied to the color image to remove the seedling information, leaving only non-zero soil pixels.

Step 2: The soil depth is then removed by discarding the depth values at the locations of those non-zero pixels in the corresponding depth image.

Step 3: Only the depth information of the seedling is left in the depth image, and the difference between its maximum and minimum values is the seedling height.
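
The three steps reduce to a masked maximum-minus-minimum over the depth image. The sketch below assumes depth values in millimetres, treats a depth of 0 as an invalid reading to be skipped, and takes the seedling mask as already computed; the function name and toy values are hypothetical.

```python
def seedling_height(depth_mm, seedling_mask):
    """Height of one seedling as max depth minus min depth over seedling pixels.

    depth_mm: nested list of depth values aligned to the color image
              (0 is treated as an invalid reading, which is an assumption).
    seedling_mask: nested list of booleans, True where the pixel is seedling.
    Returns None when no valid seedling pixel exists.
    """
    values = [d
              for depth_row, mask_row in zip(depth_mm, seedling_mask)
              for d, keep in zip(depth_row, mask_row)
              if keep and d > 0]
    if not values:
        return None
    # The farthest seedling pixel is near the stalk root (h1) and the
    # nearest one is the leaf top (h2), so H = h1 - h2.
    return max(values) - min(values)


depth = [[450, 452, 0],
         [430, 428, 451]]
mask = [[False, True, True],
        [True, True, False]]
height = seedling_height(depth, mask)
```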

2.4.2. Leaf Area Measurement

Seedling leaves do not always lie flat, so a 2D image can no longer estimate the leaf area accurately. The leaf area is therefore measured by converting the depth image pixels into 3D spatial coordinates (a 3D point cloud). According to the Kinect imaging principle, the conversion between the depth image and 3D spatial coordinates is given by Equation (2).

\begin{cases} x_{w} = z_{c} \cdot (u - u_{0}) \cdot d_{x} / f \\ y_{w} = z_{c} \cdot (v - v_{0}) \cdot d_{y} / f \\ z_{w} = z_{c} \end{cases}

(2)

In Equation (2), (x_{w}, y_{w}, z_{w}) is the 3D spatial coordinate corresponding to the point (u, v); (u, v) is the pixel coordinate of any point of the depth image; z_{c} is the depth value corresponding to the point (u, v); f is the focal length of the IR camera; d_{x} and d_{y} are the physical sizes of one pixel along the two image axes; and (u_{0}, v_{0}) is the optical center coordinate of the IR camera. An empty point cloud is generated with PCL, and the points converted to 3D spatial coordinates are added to it. Each point takes its RGB values from the corresponding coordinates in the color image, which yields a colored spatial point cloud, as shown in Figure 7a. The neighborhood extreme filtering method [12] eliminates the trailing artifacts caused by the depth camera and yields a clean leaf point cloud.
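
Equation (2) translates directly into code. The sketch below is only an illustration of the formula: the intrinsic values in the example are made up, not the calibration of the camera used in the paper, and pixels with zero depth are skipped on the assumption that they are invalid readings.

```python
def pixel_to_point(u, v, z_c, u0, v0, f, dx=1.0, dy=1.0):
    """Apply Equation (2) to one depth pixel and return (x_w, y_w, z_w)."""
    x_w = z_c * (u - u0) * dx / f
    y_w = z_c * (v - v0) * dy / f
    return (x_w, y_w, z_c)


def depth_to_cloud(depth, u0, v0, f, dx=1.0, dy=1.0):
    """Convert a depth image (nested lists) into a list of 3D points.

    Pixels with depth 0 are skipped (assumed invalid).
    """
    cloud = []
    for v, row in enumerate(depth):
        for u, z_c in enumerate(row):
            if z_c > 0:
                cloud.append(pixel_to_point(u, v, z_c, u0, v0, f, dx, dy))
    return cloud


# Hypothetical intrinsics: optical center (320, 240), focal length 500 pixels.
point = pixel_to_point(u=420, v=300, z_c=450, u0=320, v0=240, f=500)
cloud = depth_to_cloud([[0, 500]], u0=0, v0=0, f=500)
```

With `dx = dy = 1` the focal length is expressed in pixels, which is how camera intrinsics are usually reported.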

Greedy projection triangulation is applied to the denoised leaf point cloud to build a triangular mesh, yielding a model composed of triangular facets, as shown in Figure 7b. Each facet carries real position information: the 3D point cloud is indexed to retrieve the original position and depth values of its vertices, from which the lengths of the three sides, and hence the area of each triangle, can be calculated.

The area of each triangle is found from its three side lengths using Heron's formula, and the total leaf area is then obtained by summing the areas of all the triangular facets. The specific formulas are as follows.

P_{i} = (a_{i} + b_{i} + c_{i}) / 2

(3)

S_{ti} = \sqrt{P_{i} (P_{i} - a_{i}) (P_{i} - b_{i}) (P_{i} - c_{i})}

(4)

S_{all} = \sum_{i = 1}^{n} S_{ti}

(5)

In the above formulas, a_{i}, b_{i}, and c_{i} denote the lengths of the three sides of the i-th triangle after greedy triangulation; P_{i} denotes its semi-perimeter; S_{ti} denotes the area of the i-th triangle; S_{all} denotes the sum of the areas of all the triangles, i.e., the total leaf area of the whole tray of watermelon seedlings; and n denotes the number of triangles.
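
Equations (3) to (5) can be checked on a small mesh. The sketch assumes the triangulation is already available as vertex-index triples (in the paper it comes from greedy projection triangulation); the rectangle used here is a hypothetical test shape.

```python
import math


def side(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def heron_area(p, q, r):
    """Area of one triangle from its side lengths, Equations (3) and (4)."""
    a, b, c = side(p, q), side(q, r), side(r, p)
    s = (a + b + c) / 2  # semi-perimeter P_i
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))


def mesh_area(points, triangles):
    """Total area of a triangle mesh, Equation (5)."""
    return sum(heron_area(points[i], points[j], points[k])
               for i, j, k in triangles)


# A flat 3 x 4 rectangle split into two 3-4-5 right triangles.
points = [(0, 0, 0), (3, 0, 0), (3, 4, 0), (0, 4, 0)]
triangles = [(0, 1, 3), (1, 2, 3)]
total = mesh_area(points, triangles)
```

Each triangle has area 6, so the mesh area equals the rectangle area of 12, which is a quick sanity check before applying the same function to a leaf mesh.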

2.5. LSTM-Based Phenotype Prediction Model for Seedling

Long short-term memory (LSTM) is a special kind of recurrent neural network (RNN) designed mainly to address the vanishing and exploding gradient problems that arise when training on long sequences. LSTM can handle sequential data and performs better on longer sequences than ordinary neural networks. It is therefore widely used in time-series problems such as stock prediction, speech recognition, and signal analysis. In the continuous time series of watermelon seedling growth, the growth condition on one day is inevitably highly related to, and influences, the growth condition on the following day. Using phenotypic features alone, without considering the association between different time points, can lead to misclassification. The propagation of the cell state through the LSTM structure captures exactly this dependence. For phenotypic information measured from continuously acquired images, an LSTM network can make full use of the continuity between features to exploit the temporal information carried between images and maximize discrimination accuracy. Therefore, this paper selects the LSTM architecture to build a growth prediction model for watermelon seedlings at the seedling stage.

The structure of the LSTM network is shown in Figure 8. It consists of multiple neurons connected end to end, and each neuron contains gating structures and a cell memory unit, allowing it to handle prediction tasks on long time series. The gating structure comprises the forget gate, input gate, and output gate, which work together to determine which information is discarded and which is preserved.

The forget gate f_{t} determines how much of the information from the previous moment is forgotten. The input gate determines how much information is written to the cell memory unit at the current moment; it comprises i_{t}, which determines the degree to which the cell memory is updated at the current moment, and \bar{c}_{t}, which controls the amount of information flowing into the cell memory unit. The cell memory unit c_{t} stores the cell's information at the current moment and can be updated at any time. The output gate o_{t} determines the amount of information flowing out at the current moment.
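
For reference, the gates described above are commonly written as follows. This is the standard LSTM formulation rather than a set of equations taken from the paper; the input x_{t}, hidden state h_{t}, weight matrices W, biases b, the sigmoid \sigma, and the element-wise product \odot are notation assumed here.

```latex
\begin{aligned}
f_{t} &= \sigma(W_{f} [h_{t-1}, x_{t}] + b_{f}) \\
i_{t} &= \sigma(W_{i} [h_{t-1}, x_{t}] + b_{i}) \\
\bar{c}_{t} &= \tanh(W_{c} [h_{t-1}, x_{t}] + b_{c}) \\
c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot \bar{c}_{t} \\
o_{t} &= \sigma(W_{o} [h_{t-1}, x_{t}] + b_{o}) \\
h_{t} &= o_{t} \odot \tanh(c_{t})
\end{aligned}
```

The fourth line shows why the cell state carries temporal information: c_{t} is a gated mixture of the previous state c_{t-1} and the new candidate \bar{c}_{t}.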

All emerged seedlings were phenotypically examined, giving a total of 788 sets of data, and seedling height and leaf area were each used as input to the LSTM. The structure of the LSTM prediction model is shown in Figure 9. Before the LSTM network is trained, the data must be restructured. The observed data are first converted into a supervised learning set, i.e., from a time series into a data set with inputs and outputs. In this experiment, the time step is three, and each sample consists of six consecutive values, three inputs and three outputs, for a total of 13 samples. All the samples are then divided into a training set (70%) and a testing set (30%). Finally, the data are normalized and standardized to make gradient descent faster and convergence more accurate. After training, prediction and inverse normalization are performed to obtain the predicted future seedling height and leaf area.
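
One way to read the sample count is as a sliding window over a single seedling's series: if the series has 18 acquisitions and each window holds three inputs followed by three outputs, there are 18 - 6 + 1 = 13 windows. The sketch below illustrates that reading, which is an interpretation rather than something the text states; min-max scaling is likewise an assumed choice of normalization, and all names and values are hypothetical.

```python
def make_windows(series, n_in=3, n_out=3):
    """Turn a time series into (inputs, outputs) pairs with a sliding window."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples


def min_max_scale(values):
    """Scale values to [0, 1]; return the scaled list plus the offset and span."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    return [(v - lo) / span for v in values], lo, span


def inverse_scale(scaled, lo, span):
    """Undo min_max_scale so predictions are back in physical units."""
    return [s * span + lo for s in scaled]


# Placeholder series of 18 acquisitions for one seedling (values are made up).
heights = [float(h) for h in range(18)]
samples = make_windows(heights)

scaled, lo, span = min_max_scale([2.0, 4.0, 6.0])
restored = inverse_scale(scaled, lo, span)
```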

The data acquisition cycle was 6 days, with three sets of data collected per day, for a total of 18 sets of data. Growth prediction with an LSTM neural network requires sufficient antecedent data to reach the prediction accuracy needed in agronomic production. At the same time, replenishment decisions should be available about three days before the end of the seedling stage, leaving time for replenishment measures that keep the factory seedlings growing as evenly, and as economically, as possible. The first three days of seedling growth data were therefore chosen for prediction. The parameters of the LSTM network structure are shown in Table 2. The data on days t − 2, t − 1, and t were used to predict the data on days t + 1, t + 2, and t + 3 (t = 3), and the strength or weakness of each seedling was then determined by the subsequent discriminant method.

2.6. Machine Learning-Based Weak Seedling Discrimination Model

In the seedling period, seedlings are described as either strong or weak, so the strong seedling model in this study is a supervised binary classification problem. To make the discrimination of strong and weak seedlings predictive, the data predicted in the previous step were used as the input phenotypic features. The predicted phenotype data of all emerged seedlings were cleaned, giving a total of 788 sets of data, which were divided into training and testing sets at a ratio of 70% to 30%.

In this study, logistic regression, support vector machine (SVM), random forest, and the boosting algorithms GBDT, XGBoost, and LightGBM were used to build classification prediction models, and the optimal prediction model was selected based on accuracy, recall, precision, and F1 Score to achieve the strong and weak seedling discrimination of watermelon seedlings.

For binary classification problems, the predicted results of a classification model can be summarized in a table called the confusion matrix, as in Table 3.

The entries of the confusion matrix are TP (True Positive): a positive sample predicted as positive; FN (False Negative): a positive sample predicted as negative; FP (False Positive): a negative sample predicted as positive; and TN (True Negative): a negative sample predicted as negative. The evaluation indicators are calculated as follows:

\begin{cases} \mathrm{accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \\ \mathrm{precision} = \frac{TP}{TP + FP} \\ \mathrm{recall} = \frac{TP}{TP + FN} \end{cases}

(6)
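
The indicators in Equation (6) can be computed directly from the four confusion-matrix counts. The F1 score, which the text names as a selection criterion but does not define, is included here using its usual definition as the harmonic mean of precision and recall; the counts in the example are invented for illustration and are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall per Equation (6), plus the F1 score."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall (standard definition).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a 100-sample test split.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```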

3. Results

3.1. Analysis of Seedling Positioning and Indexing Results

The correct hole positioning rate for the 16 plug trays was counted at 9:00 on the first, second, and third days, as shown in Table 4, and the detected boundaries of the first plug tray were visualized; the detection result is shown in Figure 10. The detected boundary matches the actual boundary of the plug, which meets the requirement of positioning the seedling in each single plug hole. Each hole is labeled using a row-column naming rule, which provides hole location information for plug seedling segmentation.

The images of some seedlings were selected for correction. Figure 11a shows the image before correction, where most of the seedlings are not centered in their holes, so they cannot be positioned accurately. Figure 11b shows the corrected result, which to a certain extent compensates for skewed seedling growth and allows the specific position of each seedling to be located accurately. This realizes the segmentation and positioning of single seedlings and improves the accuracy of the phenotyping measurement.

3.2. Analysis of Seedling Phenotype Test Results

The phenotypic data of the first plug tray were measured manually on the third day. Seeding height was measured with a straightedge, and leaf area was measured by spreading the hand-picked leaves flat and scanning them with an Epson Expression 12000XL scanner (Epson). The phenotypic parameters were also obtained with the phenotypic measurement algorithm proposed in this paper, and the results were compared with the manual measurements. The coefficient of determination (R squared), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used to measure the accuracy of the method, as shown in Equation (7).

\begin{matrix} R^{2} = \left( \frac{\sum_{i = 1}^{n} (f_{i} - \bar{f})(y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} (f_{i} - \bar{f})^{2}} \sqrt{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}} \right)^{2} \\ MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{f_{i} - y_{i}}{f_{i}} \right| \times 100\% \\ RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (f_{i} - y_{i})^{2}} \end{matrix}

(7)

In Equation (7), $f_{i}$ denotes the manual measurement value, $\bar{f}$ denotes the average of the manual measurements, $y_{i}$ denotes the algorithm measurement value, $\bar{y}$ denotes the average of the algorithm measurements, $n$ denotes the number of samples, and $i$ denotes the $i$th group of data.
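The three accuracy measures can be sketched in a few lines. In this sketch, R squared is taken as the squared Pearson correlation between the manual and algorithm values, which is an assumption about the intended definition; the function names and the sample values are made up for illustration and are not measurements from this study.

```python
import math
from typing import Sequence


def r_squared(f: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between manual (f) and algorithm (y) values."""
    n = len(f)
    f_mean = sum(f) / n
    y_mean = sum(y) / n
    cov = sum((fi - f_mean) * (yi - y_mean) for fi, yi in zip(f, y))
    var_f = sum((fi - f_mean) ** 2 for fi in f)
    var_y = sum((yi - y_mean) ** 2 for yi in y)
    return cov ** 2 / (var_f * var_y)


def rmse(f: Sequence[float], y: Sequence[float]) -> float:
    """Root mean square error between manual and algorithm values."""
    return math.sqrt(sum((fi - yi) ** 2 for fi, yi in zip(f, y)) / len(f))


def mape(f: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction, relative to the manual value."""
    return sum(abs((fi - yi) / fi) for fi, yi in zip(f, y)) / len(f)


# Made-up heights in metres, not data from the paper.
manual = [0.040, 0.050, 0.060, 0.080]
algorithm = [0.042, 0.049, 0.063, 0.078]
print(r_squared(manual, algorithm), rmse(manual, algorithm), mape(manual, algorithm))
```

Multiplying the value returned by `mape` by 100 gives the percentage form in which MAPE is usually reported.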

3.2.1. Seeding Height Detection

Figure 12 compares the manual and algorithmic measurements of the seeding height of 50 seedlings selected from the first plug tray. For the seeding height measurement method in this paper, the R squared was 0.901, the RMSE was $1.428 \times 10^{-3}$ m, and the MAPE was 2.59%. The 3D and manual measurements are mostly consistent, but some seedlings show large deviations, mainly because the point cloud of the soil is not completely removed when the background is removed by point cloud filtering.

3.2.2. Leaf Area Detection

Figure 13 compares the manual and algorithmic measurements of the leaf area of 50 seedlings selected from the first plug tray. For the leaf area measurement method in this paper, the R squared was 0.922, the RMSE was $4.399 \times 10^{-5}$ m², and the MAPE was 7.23%. Seedlings with deviations in the 3D leaf area measurement were mainly found at the margins of the camera's field of view. With TOF camera imaging, the closer the object was to the edge of the field of view, the more severe the distortion in the picture and the more of the leaf point cloud was missing, so the measured leaf area was generally smaller than the real area.

3.3. Network Model Fitting Effect

3.3.1. LSTM Performance Evaluation Metrics

Regression analysis is a statistical method for determining the quantitative relationship between two or more variables, and it can reveal the degree of correlation between them. In this experiment, the performance of the LSTM was evaluated by simple (univariate) linear regression, with the true value as the independent variable and the predicted value as the dependent variable. A random sample of 50 seedlings was selected for the analysis. Table 5 lists the evaluation indicators for the LSTM-predicted plant height and leaf area, including the slope, intercept, Pearson’s r, and R squared at t + 1, t + 2, and t + 3. Figure 14 shows the corresponding linear regressions of the true seeding height and leaf area of the 50 samples against the LSTM predictions at moments t + 1, t + 2, and t + 3, together with the slope and intercept of each regression equation, Pearson’s r, and R squared; the shaded areas are the 95% confidence band and the 95% prediction band.
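The indicators reported for this regression (slope, intercept, Pearson’s r, and R squared) follow from an ordinary least-squares fit of predicted against true values. The sketch below shows one way to obtain them; the function name `linear_fit` and the sample values are made up for illustration and do not reproduce the paper's analysis or its confidence bands.

```python
import math
from typing import Sequence, Tuple


def linear_fit(true_vals: Sequence[float],
               pred_vals: Sequence[float]) -> Tuple[float, float, float, float]:
    """Least-squares fit pred = slope * true + intercept.

    Returns (slope, intercept, Pearson's r, R squared).
    """
    n = len(true_vals)
    x_mean = sum(true_vals) / n
    y_mean = sum(pred_vals) / n
    sxx = sum((x - x_mean) ** 2 for x in true_vals)
    syy = sum((y - y_mean) ** 2 for y in pred_vals)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(true_vals, pred_vals))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r = sxy / math.sqrt(sxx * syy)
    # For a one-variable linear fit, R squared equals r squared.
    return slope, intercept, r, r * r


# Made-up values lying exactly on the line pred = 2 * true + 1.
slope, intercept, r, r2 = linear_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept, r, r2)
```

A slope near 1 and an intercept near 0 indicate that the predictions track the true values without systematic scaling or offset.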

3.3.2. Performance Analysis of the Weak Seedling Discriminant Model

The accuracy, precision, recall, and F1 score of each model in the test set are shown in Table 6.

During phenotype data detection, seedling skewing, leaf wilting, and other factors produce outliers, and the data are non-linear. Logistic regression and SVM have only moderate ability to classify non-linear data, so their accuracy is lower. The gradient-boosting decision tree methods can handle multi-dimensional, multi-feature problems, but their performance is only moderate on samples with many outliers. Random forest is insensitive to outliers and performs best, with an accuracy of 0.84. The random forest classification model was therefore selected to discriminate strong and weak seedlings.

Because the classification samples are themselves LSTM predictions, errors generated at the prediction stage are superimposed on the errors of the classification model and increase the probability of misclassification. In the prediction stage, the prediction step is positively correlated with prediction accuracy. In this study, the prediction step was set to three days to balance the needs of actual production against prediction accuracy. In the classification stage, the probability of misclassification can be reduced by increasing the sample size to improve the fit of the model.

3.4. Weak Seedling Discrimination Results

The watermelon plug seedlings undergo continuous phenotype detection, and the detected data are used as input to the growth prediction model to obtain future phenotype data. These data are then input into the established random forest binary classification model to obtain the future growth status of each seedling. Using the plug seedling positioning, the growth status can be indexed to the specific location of each seedling, which gives the culling and replenishment decision for the plug seedlings. Figure 15 shows the growth of seedlings on the third and sixth days and the locations of abnormal seedlings.
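The final indexing step, from predicted growth status to positions that need culling and replenishment, can be sketched as a lookup over hole positions. The `(row, column)` keys, the `"strong"`/`"weak"` labels, and the function name `replenishment_targets` are hypothetical; they stand in for the output of the positioning and classification stages described above.

```python
from typing import Dict, List, Tuple

Hole = Tuple[int, int]  # (row, column) of a plug hole, 1-based


def replenishment_targets(status_by_hole: Dict[Hole, str]) -> List[Hole]:
    """Return the holes whose seedling is predicted weak, in row-major order."""
    return sorted(hole for hole, status in status_by_hole.items()
                  if status == "weak")


# Made-up predicted statuses for a 2 x 2 corner of a tray.
predicted = {
    (1, 1): "strong",
    (1, 2): "weak",
    (2, 1): "weak",
    (2, 2): "strong",
}
targets = replenishment_targets(predicted)
print(targets)
```

Returning positions in row-major order is only one possible choice; a transplanting robot might instead order them to shorten the path of its end effector.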

4. Conclusions

To address the problem of the poor early identification of weak seedlings in factory nurseries, this paper proposes a visual system for the early discrimination of weak watermelon seedlings based on phenotype detection and machine learning. The system uses two early characteristics, seeding height and leaf area, to assess the growth status of seedlings. The main contributions are as follows:

(1) The color and depth information of seedlings was obtained using an RGB-D camera, and the seeding height and leaf area of the seedlings were obtained with a traditional image segmentation algorithm and a 3D point cloud processing method. Their MAPE values were 2.59% and 7.23%, respectively, indicating that the method is reliable. The consumer-grade Azure Kinect DK camera is also low-cost, simple to operate, and stable.

(2) The two features are fed into the LSTM for prediction, and the predicted values are then fed into the random forest classifier to build an early discrimination model for weak seedlings. The model achieved 84% discrimination accuracy on the test set for watermelon seedlings. It can realize the early prediction of weak seedlings, provide visual support for seedling dividing and combining robots and seedling replenishment robots, and support regulation and early warning in seedling factories, which can improve their economic efficiency.

This study can serve as the vision system of a seedling dividing and combining robot. In seedling production, the Kinect is installed on the robot, and the growth condition of a whole plug tray is obtained by monitoring it for three consecutive days. The position of each seedling is matched with its growth condition and passed to the robot, which uses its arm to transplant and replenish seedlings according to the positions of the weak seedlings, thereby realizing the regulation and early warning of seedling production.

In conclusion, this paper presents a solution for seedling monitoring and weak seedling prediction that combines traditional image processing with machine learning methods. It is a useful example of digital research in a factory nursery and can help promote automation and intelligence in factory nurseries.

Author Contributions

S.X.: Conceptualization, Investigation, Methodology, Visualization, Writing – review and editing. Y.Z.: Conceptualization, Investigation, Methodology, Visualization, Writing – original draft. W.D.: Software. Z.B.: Resources, Data curation. C.P.: Validation. Y.H.: Funding acquisition, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key Research and Development Program of China (grant number 2019YFD1001900); the HZAU-AGIS Cooperation Fund (grant number SZYJY2022006); and the Hubei provincial key research and development program (grant number 2021BBA239).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. Plug seeding image data acquisition platform.

Figure 2. Continuous acquisition of color images with depth images.

Figure 3. Technical method flow chart.

Figure 4. Plug-hole identification process.

Figure 5. The skewed seedling correction process.

Figure 6. Plant height detection schematic.

Figure 7. Leaf area detection schematic. (a) Point cloud map containing color information. (b) Greedy projection triangulation.

Figure 8. LSTM neural unit structure.

Figure 9. Watermelon seedling growth time series LSTM prediction model.

Figure 10. Effect of plug hole positioning for plug seedlings.

Figure 11. Skewed seedling correction before and after effect. (a) Image before correction of the skewed seedling. (b) Image after correction of the skewed seedling.

Figure 12. Comparison of manual and algorithmic measurements of the height of the first plug seeding.

Figure 13. Comparison of manual and algorithmic measurements of the area of the first plug seeding.

Figure 14. Regression analysis of LSTM time series prediction.

Figure 15. Abnormal seedling detection results. (a) Day 3 abnormal seedling detection. (b) Day 6 abnormal seedling detection.

Table 1. Sample data statistics.

Watermelon Seedling Varieties	Total Number of Cavities	Total Seedling Emergence Sample	Seedling Emergence Rate	Normal Seedlings	Abnormal Seedlings	Normal Seedlings’ Emergence Rate
Zaojia84-24	800	788	98.5%	542	246	67.8%

Table 2. LSTM Network structure parameters.

Parameters	Specification
Training	70%
Testing	30%
Input gate	Sigmoid
Forget gate	Sigmoid
Output gate	Sigmoid
Hidden layer	Tanh
Number of layers	1
Number of Hidden units	100
Input size	3
Time step	3
Loss function	Cross-entropy
Epoch	300
Batch size	8

Table 3. Confusion matrix for binary classification.

	Actual Positive	Actual Negative
Predicted positive	TP	FP
Predicted negative	FN	TN

Table 4. The recognition accuracy of the plug hole.

Data Acquisition Time	Total Number of Plug Trays	Number Correctly Identified	Identification Accuracy
Day 1—9:00	16	15	93.75%
Day 2—9:00	16	14	87.5%
Day 3—9:00	16	14	87.5%

Table 5. Evaluation indicators of LSTM time series prediction.

Phenotype	Evaluation Indicators	t + 1	t + 2	t + 3
Plant height	Slope	1.002	1.053	1.063
	Intercept	$- 1.27 \times 10^{- 3}$	$- 4.67 \times 10^{- 3}$	$- 4.98 \times 10^{- 3}$
	Pearson’s r	0.972	0.971	0.965
	R Squared	0.952	0.942	0.932
Leaf area	Slope	0.925	1.003	0.820
	Intercept	$- 7.31 \times 10^{- 7}$	$- 4.90 \times 10^{- 5}$	$1.04 \times 10^{- 4}$
	Pearson’s r	0.963	0.959	0.952
	R Squared	0.926	0.919	0.901

Table 6. Evaluation results of each classification model.

Classification Model	Accuracy	Recall	Precision	F1 Score	TP	FP
Logistic Regression	0.748	0.796	0.901	0.845	164	18
SVM	0.765	0.765	1	0.867	182	0
Random Forest	0.840	0.883	0.912	0.897	166	16
GBDT	0.782	0.865	0.846	0.856	154	28
XGBoost	0.798	0.876	0.857	0.867	156	26
LightGBM	0.824	0.880	0.890	0.885	162	20


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
