Article

Efficient Non-Destructive Detection for External Defects of Kiwifruit

National Key Laboratory of Agricultural Equipment Technology, Chinese Academy of Agricultural Mechanization Sciences Group Co., Ltd., Beijing 100083, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(21), 11971; https://doi.org/10.3390/app132111971
Submission received: 1 September 2023 / Revised: 19 October 2023 / Accepted: 20 October 2023 / Published: 2 November 2023
(This article belongs to the Section Agricultural Science and Technology)

Abstract

External defects of kiwifruit seriously reduce its commercial value. Existing methods for detecting external defects of kiwifruit cover only a few defect categories and adapt poorly to complex images. To address these problems, we proposed ResNet combined with CBAM for the automatic detection of external defects in kiwifruit. An acquisition device was first built to obtain high-quality images. The optimal fusion scheme of ResNet and CBAM was then investigated, the network training parameters were optimized, and Adam was used to accelerate model convergence. The average recognition accuracy of ResNet34 + CBAM for kiwifruit was 99.6%, and all evaluation metrics exceeded 99%. AlexNet, VGG16, InceptionV3, ResNet34, and ResNet34 + CBAM were also compared: the recognition accuracy of ResNet34 + CBAM was 7.9%, 12.7%, 11.8%, and 4.3% higher than that of AlexNet, VGG16, InceptionV3, and ResNet34, respectively. ResNet34 + CBAM therefore combines high recognition accuracy with good stability for detecting external defects of kiwifruit, providing a technical basis for the online detection and sorting of kiwifruit and other fruit defects.

1. Introduction

Kiwifruit is sweet and sour, rich in nutrients, and known as the “king of fruits” [1]. China leads the world in kiwifruit cultivated area and production, and kiwifruit is a major pillar of China’s fruit industry [2]. Currently, kiwifruit quality sorting in China’s major producing areas is mainly manual, resulting in inconsistent sorting indexes, low sorting efficiency, and weak competitiveness. Moreover, kiwifruit is highly susceptible to external damage during growth, harvesting, and transportation, including sunburn, bruises, and scratches. These external defects not only reduce the commercial value of the fruit but can also spread to healthy kiwifruit, causing avoidable losses. It is therefore necessary to adopt new technology to detect and sort kiwifruit accurately and efficiently and thus enhance its commodity value.
The external quality of kiwifruit has more commonly been assessed with traditional image processing methods. Shao et al. (2009) [3] used threshold segmentation to detect the size, shape, and color of kiwifruit, but they did not address external defects. Zhou et al. (2012) [4] used the YCbCr and HSV color spaces to detect damaged and scarred kiwifruit, respectively, with an accuracy of 88%. Li [5] used a near-infrared light source to detect scratched kiwifruit, establishing a functional relationship between light-source intensity and external defects. Li [6] first processed the images morphologically and then used a BP neural network to categorize external defects of kiwifruit into four classes, with an accuracy above 91.3%. In summary, traditional image processing techniques work for specific recognition tasks, but they have low recognition accuracy and poor stability across different samples.
CNNs (convolutional neural networks) [7] have been widely studied and applied in fruit quality detection and grading because of their adaptability, high robustness, and excellent feature expression ability compared to traditional methods [8]. Fan et al. (2020) [9] improved a traditional CNN for apple defect detection by reducing the parameters and number of connections of the model, reaching an experimental accuracy of 92%. An et al. (2022) [10] proposed a CNN-based citrus tracking algorithm that realized historical information tracking and defect detection of citrus; their experiments showed a recognition accuracy of 92.8% for defective citrus. Cao (2018) [11] categorized plums into five grades (excellent, good with scars, second grade, second grade with scars, and rotten) and proposed an intelligent feedback cognitive algorithm based on ASCNNs and integrated RVFL (Random Vector Functional Link) classifiers, with an average recognition rate of 98.2%. Costa et al. (2020) [12] proposed a deep learning model based on ResNet50 (Residual Neural Network 50) for tomato external defect detection, achieving an average accuracy of 94.6%. CNNs have also been applied to the quality detection of fruits such as balsam pear [13], papaya [14], jujube [15], watermelon [16], and grape [17]. Compared with traditional methods, CNNs not only improve sample recognition accuracy but also broaden the range of detectable categories. However, little of this research has addressed kiwifruit surface defect detection. Yao et al. (2021) [18] optimized the original YOLOv5 network, and the optimized model achieved 94.7% accuracy in detecting external defects in kiwifruit; however, that experiment did not consider practical grading scenarios, and the model adapted poorly to some defective samples.
As CNNs developed, researchers acquired more information and features from images by deepening the network. However, as depth increases, gradient explosion, gradient vanishing, and network degradation can occur, causing model accuracy to decrease rather than increase. To address this, researchers applied SGD (Stochastic Gradient Descent) [19] during backpropagation so that the network could reach convergence. This works for shallow networks, but as the layers continue to deepen, the gradient becomes smaller and smaller until it vanishes, preventing the network from converging and learning effectively. To solve the problems of vanishing gradients and network degradation in deep neural networks, He et al. proposed ResNet [20]. ResNet introduced the residual structure, which maintains the flow of gradient information as the number of layers increases. ResNet has since been applied successfully in many other fields [21,22,23].
The attention mechanism is a technique for directing a CNN’s attention to important feature information [24]. It computes a weight for each position according to the similarity between the output of the previous layer and the current input sequence; the higher the similarity, the higher the weight, and the weight vector is passed to the next layer. This increases the attention paid to regions of interest and thus improves the recognition performance of the CNN. The attention mechanism is now widely used in image recognition [25], semantic segmentation [26], and target detection [27], and it is evidently also applicable to disease recognition in agricultural products.
Therefore, in this study, we built a kiwifruit image acquisition device based on a grading line and constructed a sample dataset. Surface defect features of kiwifruit were extracted automatically with a ResNet model, using a cosine-annealing decay strategy and the Adam (adaptive momentum) algorithm to optimize the training parameters. The best fusion scheme with CBAM (Convolutional Block Attention Module) was selected to achieve accurate and efficient recognition of kiwifruit.

2. Materials and Methods

The main steps of this study are data acquisition, preprocessing, model optimization, and feature recognition. The whole workflow is shown in Figure 1. Compared with the conventional method, the optimal fusion scheme of ResNet and CBAM is investigated to realize the efficient detection of kiwifruit defects. The fusion method is described in detail in Section 2.4.

2.1. Dataset Construction

2.1.1. Sample Source

The kiwifruit samples were obtained from Zhouzhi and Meixian County, Shaanxi Province. The Xu Xiang and Cui Xiang varieties were selected for study. Samples were obtained in multiple batches, in the field and online, from November 2021 to November 2022, for a total of 1020 original samples: 320 healthy and 700 defective, as shown in Figure 2.

2.1.2. Image Acquisition

Images were acquired with an MV-EM200C camera (Shaanxi Visions Intelligent Technology Co., Ltd., Xi’an, Shaanxi, China) fitted with a BT-23C0814MP5 industrial lens. The camera has a resolution of 1600 × 1200 pixels and a capture frame rate of 39.9 fps. The acquisition device was built on the grading line [28] and mainly comprised the camera, lens, shadow box, light source, and acrylic plate, as shown in Figure 3. To match the actual grading scenario, the camera was mounted 32 cm above the pallet level so that three complete pallets could be captured in a single image; because a pallet rotates the kiwifruit as it moves, the whole surface of each fruit can be covered across the three pallet views. To improve image quality, the light source was directed upward from the bottom and reflected onto the kiwifruit surface after passing through a half-cylindrical acrylic plate, reducing the uneven illumination and reflections that direct light causes at different positions. With the grading line speed adjusted to 3–5 per second, pallet information was collected by a counter sensor and passed to an isolation board, which triggered the camera synchronously. Each acquired image contained 1–3 samples, for a total of 2220 images.

2.1.3. Data Processing

A Python script was written to cut each captured image so that a single image contained only one kiwifruit sample, at a size of 533 × 1200 pixels. Redundant border information was then removed by cropping, and each image was resized to 224 × 224 by spline interpolation. A total of 4663 images were obtained, of which 1731 were healthy and 2932 were defective. The two categories were divided into training, validation, and test sets by batch in a ratio of 8:1:1; the division is shown in Table 1.
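The paper does not publish its script, but a minimal sketch of this step might look as follows, assuming OpenCV (`cv2`); the three-way split comes from the three pallets per 1600 × 1200 frame described in Section 2.1.2, and the border-cropping margins are omitted since they are not reported.

```python
import cv2

def preprocess_frame(image_path):
    """Cut a captured frame into single-fruit images and resize each to 224 x 224."""
    img = cv2.imread(image_path)          # BGR frame, 1600 x 1200 pixels
    w = img.shape[1] // 3                 # 1600 // 3 = 533: one pallet per cut
    cuts = [img[:, i * w:(i + 1) * w] for i in range(3)]   # three 533 x 1200 images
    # INTER_CUBIC resamples with a bicubic spline over a 4 x 4 neighborhood
    return [cv2.resize(c, (224, 224), interpolation=cv2.INTER_CUBIC) for c in cuts]
```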

2.1.4. Data Enhancement

Image enhancement can improve the quality and information content of the original data before processing. In this study, the dataset was augmented by flipping the images horizontally, and each input was normalized as shown in Equation (1).
$$f(x) = \frac{x - \mu}{\sigma} \tag{1}$$
where $x$ is the input value, $\mu$ is the mean, and $\sigma$ is the standard deviation.
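A minimal sketch of this augmentation and normalization pipeline, assuming torchvision; the mean and standard deviation values below are illustrative placeholders, as the paper does not report the statistics used in Equation (1).

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),           # horizontal flip augmentation
    transforms.ToTensor(),                            # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # Equation (1): (x - mu) / sigma;
                         std=[0.229, 0.224, 0.225]),  # ImageNet stats as placeholders
])
```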

2.2. Model Construction

2.2.1. Experimental Conditions

The experimental platform was a DELL Precision 7920 Tower workstation running the Ubuntu 18.04 64-bit operating system, with two Intel Xeon Silver 4216 @ 2.10 GHz CPUs, 128 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of video memory. PyTorch 1.11, a dynamic neural network framework supporting GPU acceleration, was used as the deep learning framework.

2.2.2. Model Analytics

The basic structural unit of ResNet is shown in Figure 4. Its basic idea is that if newly added network layers learn no information, then only the shallow network features are used as the output; i.e., the newly added layers act as an identity mapping of the shallow network. The residual unit has two branches. The main branch is the residual mapping F(x), consisting of two weight layers with Relu activation functions. The sub-branch carries the feature mapping x output by the previous layer. The output of the unit is F(x) + x. When the network is very deep and gradient vanishing or network degradation occurs, the residual mapping F(x) need only be driven to 0, in which case the best output is the previous layer’s output x. This structure allows the network depth to keep increasing so that the network can learn more information, improving its accuracy and robustness.
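A minimal PyTorch sketch of the residual unit in Figure 4 is given below; the batch normalization layers follow the standard ResNet design, and the fixed channel count is a simplifying assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualUnit(nn.Module):
    """Basic residual unit: output = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        # Main branch F(x): two weight layers with a Relu between them
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Shortcut branch carries x unchanged; if F(x) is driven to 0,
        # the unit reduces to the identity mapping of the previous layer.
        return F.relu(out + x)
```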

2.2.3. Model Structure

There are two residual structures in ResNet, as shown in Figure 5a,b. Residual_1 has two 3 × 3 convolutions with the same number of output channels. In Residual_2, the 256 input channels are first reduced to 64 with a 1 × 1 convolution kernel, passed through an intermediate 3 × 3 convolution layer, and finally raised from 64 back to 256 channels with another 1 × 1 convolution kernel. Reducing the channel dimension with a 1 × 1 kernel and convolving with the 3 × 3 kernel at the lower dimension cuts the parameter computation of the convolution layers, while the final 1 × 1 kernel restores the channel dimension. This maintains the recognition accuracy of the model while reducing computation, saving time and cost.
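As a worked example of this saving (ignoring biases): two plain 3 × 3 convolutions operating directly on 256 channels would require 2 × (3 × 3 × 256 × 256) ≈ 1.18 M weights, whereas the Residual_2 bottleneck requires only 1 × 1 × 256 × 64 + 3 × 3 × 64 × 64 + 1 × 1 × 64 × 256 ≈ 0.07 M.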
ResNet comes in several depths; in this study, we chose ResNet34, as shown in Table 2. ResNet34 consists of five convolution groups plus the final output layer. The input kiwifruit image size is 224 × 224. After Conv1, the output size is 112 × 112. The image then passes through a 3 × 3 maximum downsampling layer with step size 2 and Conv2_x (3 Residual_1 blocks), giving an output size of 56 × 56. The next three convolution groups (Conv3_x, Conv4_x, and Conv5_x) each first double the channel dimension with a Residual_2 structure, followed by 3, 5, and 2 Residual_1 blocks for feature extraction, respectively. Finally, after average downsampling reduces the feature map to 1 × 1, a fully connected layer with the Softmax activation function outputs the class: healthy or defective kiwifruit.
The Relu activation function applied to the outputs of the convolutional layers is given in Equation (2):
$$f(x) = \max(0, x) \tag{2}$$
As the formula shows, Relu passes its input through unchanged when it is positive and replaces it with 0 when it is negative.
The Softmax activation function used for the fully connected layer is given in Equation (3):
$$S_i = \frac{e^{a_i}}{\sum_{k=1}^{n} e^{a_k}} \tag{3}$$
where $a_i$ is the output value of the $i$-th node and $n$ is the total number of output nodes.
The Softmax function converts the outputs for the multiple classes into values in the range (0, 1) and outputs the result as a probability distribution.
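As a quick numerical illustration of Equation (3), with arbitrary two-class logits:

```python
import torch

logits = torch.tensor([2.0, -1.0])     # illustrative fully connected outputs
probs = torch.softmax(logits, dim=0)   # tensor([0.9526, 0.0474]), sums to 1
```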

2.3. Feature Attention Enhancement

When a CNN learns image features, the more parameters it has, the more powerful its representation, but the larger amount of stored information can lead to information overload. To focus feature extraction on the more important information, an attention mechanism is introduced into the model. The attention mechanism automatically emphasizes important features through the model’s feature mapping, improving the stability and accuracy of the model.
CBAM [29] is an attention mechanism combining channel attention and spatial attention. As shown in Figure 6, a kiwifruit feature map is fed into CBAM, which first applies channel attention: the weight of each channel is obtained from the similarity of features in the feature map, and the weights are multiplied into the input feature layer. Spatial attention then complements channel attention: it extracts the more important regional information from the feature map to obtain spatial weights, which are likewise multiplied into the feature layer, yielding the final output feature map.
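The paper does not list its CBAM implementation; the following minimal PyTorch sketch follows the original CBAM design [29], where the reduction ratio of 16 and the 7 × 7 spatial kernel are assumptions taken from that design.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, each applied
    multiplicatively to the input feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: shared MLP over global average- and max-pooled features
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: 7 x 7 conv over channel-wise average and max maps
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        # Channel weights, broadcast-multiplied into the input feature layer
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial weights over the channel-refined features
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```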

2.4. Model Integration

CBAM was fused into ResNet34 to improve the stability and accuracy of the model. As a kiwifruit image is convolved through ResNet34, the feature maps gain channels while shrinking in spatial size, so the feature weights obtained by fusing CBAM at different positions in the network differ. To investigate the effect of fusing CBAM modules at different locations in ResNet34, the experiment compared five fusion schemes, as shown in Table 3: (1) one CBAM after Conv1; (2) one CBAM after Conv3_x; (3) one CBAM after Conv1 and one after Conv3_x; (4) one CBAM after Conv3_x and one after Conv5_x; and (5) one CBAM after Conv1 and one after Conv5_x. Because the channel count and feature map size differ at each location, the placement affects the model’s attention to the features of interest. The loss values of the five fusion schemes on the kiwifruit training set were compared, and the best scheme was selected as the experimental model for this study.
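As an illustration of scheme 5, the sketch below wires the CBAM class above into the torchvision ResNet34 (an assumption; the authors’ own implementation is not published). In torchvision’s naming, `conv1`/`bn1`/`relu` correspond to Conv1 and `layer4` to Conv5_x.

```python
import torch.nn as nn
from torchvision.models import resnet34

def build_resnet34_cbam(num_classes=2):
    net = resnet34(weights=None)                          # untrained ResNet34 backbone
    net.relu = nn.Sequential(net.relu, CBAM(64))          # CBAM after Conv1 (64 channels)
    net.layer4 = nn.Sequential(net.layer4, CBAM(512))     # CBAM after Conv5_x (512 channels)
    net.fc = nn.Linear(net.fc.in_features, num_classes)   # healthy / defective head
    return net
```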

2.5. Evaluation Indicators

To evaluate the recognition accuracy of the model, five metrics were selected for evaluation in this study.
$$Acc = \frac{TP + TN}{TP + FN + TN + FP} \tag{4}$$
$$P = \frac{TP}{TP + FP} \tag{5}$$
$$R = \frac{TP}{TP + FN} \tag{6}$$
$$S = \frac{TN}{TN + FP} \tag{7}$$
$$F1score = \frac{2 \times R \times P}{R + P} \tag{8}$$
where $TP$ denotes the number of defective kiwifruits correctly predicted as defective, $TN$ the number of healthy kiwifruits correctly predicted as healthy, $FP$ the number of healthy kiwifruits incorrectly predicted as defective, and $FN$ the number of defective kiwifruits incorrectly predicted as healthy; $Acc$ is the accuracy, $P$ the precision, $R$ the recall, $S$ the specificity, and $F1score$ the harmonic mean of $P$ and $R$.
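A minimal sketch of Equations (4)–(8), checked against the ResNet34 + CBAM test-set counts reported later in Table 5 (defective taken as the positive class):

```python
def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + fn + tn + fp)   # Equation (4)
    p = tp / (tp + fp)                      # Equation (5): precision
    r = tp / (tp + fn)                      # Equation (6): recall
    s = tn / (tn + fp)                      # Equation (7): specificity
    f1 = 2 * r * p / (r + p)                # Equation (8): harmonic mean
    return acc, p, r, s, f1

# Table 5 counts: tp=292, tn=173, fp=0, fn=2
# -> acc ~ 0.996, p = 1.000, r ~ 0.993, s = 1.000
print(metrics(292, 173, 0, 2))
```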

2.6. Model Training

The experiment adopted the Adam optimization algorithm, with a training and validation batch size of 32. The learning rate was derived from the batch size as in Equation (9), giving an initial value of 0.0125. Overfitting and underfitting were mitigated with a cosine-annealing decay strategy [30], Equation (10), in which the learning rate is scaled by a cosine factor and adjusted automatically during training. The number of iterations was set to 150. The training loss was computed with the cross-entropy loss function, as shown in Equation (11).
$$lr = 0.1 \times \frac{b}{256} \tag{9}$$
$$lf = \frac{1 + \cos(t \pi / T)}{2} \times (1 - lr_f) + lr_f \tag{10}$$
$$L = -\left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] \tag{11}$$
where $b$ is the batch size, $lr$ is the learning rate, $t$ is the current round number, $T$ is the total number of rounds, $lr_f$ is the decay factor, $y_i$ is the true label of the sample, and $p_i$ is the model’s prediction for the sample.
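The sketch below assembles this training setup in PyTorch, reusing the `build_resnet34_cbam` builder sketched in Section 2.4; the decay factor `lrf` is not reported in the paper, so the value here is a placeholder.

```python
import math
import torch

batch_size, total_rounds, lrf = 32, 150, 0.01          # lrf is a placeholder
lr0 = 0.1 * batch_size / 256                           # Equation (9): 0.0125
model = build_resnet34_cbam()
optimizer = torch.optim.Adam(model.parameters(), lr=lr0)
# Equation (10): cosine annealing from lr0 down to lr0 * lrf over total_rounds
lf = lambda t: (1 + math.cos(t * math.pi / total_rounds)) / 2 * (1 - lrf) + lrf
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
criterion = torch.nn.CrossEntropyLoss()                # Equation (11), two-class case
```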

3. Experiments and Results

3.1. Model Fusion Program Training Results

The five fusion scheme models were trained separately, and the loss value of each on the kiwifruit training set was used as the evaluation index. As shown in Table 4, the training-set loss values of fusion schemes 1 to 5 are 0.18, 0.20, 0.15, 0.13, and 0.11, respectively. These loss values show that fusing two CBAMs into ResNet34 outperforms fusing a single module, and among the two-module variants, scheme 5 is better than schemes 3 and 4; i.e., the loss is smallest with one CBAM after Conv1 and one after Conv5_x. Scheme 5 was therefore chosen as the best fusion model for this experiment and named “ResNet34 + CBAM” to simplify the subsequent comparison and analysis.

3.2. Model Validation

The recognition accuracy of ResNet34 and ResNet34 + CBAM on the kiwifruit validation set is shown in Figure 7. The accuracy of ResNet34 fluctuates considerably over rounds 0–90; after 90 rounds, the fluctuation decreases and the curve stabilizes. For ResNet34 + CBAM, over rounds 0–30 the model learns the features effectively and the recognition accuracy trends upward overall, rising most sharply over rounds 0–10. After 30 rounds, the accuracy fluctuates only within a small range. After 150 rounds, the recognition accuracy of both models on the validation set is essentially stable, exceeding 95%. Comparing the accuracy curves shows that ResNet34 + CBAM is more stable than ResNet34.
The loss values of ResNet34 and ResNet34 + CBAM on the kiwifruit validation set are shown in Figure 8. ResNet34 + CBAM achieves a better validation loss than ResNet34: its loss decreases steadily as the number of iteration rounds increases, falling fastest over rounds 0–10; after 120 rounds, the fluctuation gradually diminishes and the loss stabilizes at a final value of 0.12. By contrast, the loss of ResNet34 fluctuates more during training and stabilizes at 0.18 after 150 rounds of iteration.

3.3. Results Testing

The optimal weights of ResNet34 + CBAM and ResNet34 were saved for evaluation on the test set, which contains 173 healthy and 294 defective kiwifruit. As shown in Table 5, ResNet34 + CBAM correctly recognized all 173 healthy kiwifruit, with 2 misclassifications out of 294 defective kiwifruit, for an average recognition accuracy of 99.6%. ResNet34 misclassified 6 healthy and 15 defective kiwifruit, for an average accuracy of 95.5%. Incorporating CBAM into ResNet34 thus improved the model’s ability to recognize kiwifruit.
To further validate the performance of ResNet34 + CBAM, this study used Grad_Cam to visualize its heat maps on the kiwifruit test set, as shown in Figure 9. Comparing the heat maps of healthy and defective kiwifruit shows that ResNet34 + CBAM attends to a broad range of features on healthy fruit, whereas on defective fruit its attention concentrates on the defective region. The model also localizes defective regions accurately.
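The paper uses Grad_Cam only as a visualization tool; a minimal hand-rolled sketch of the technique is shown below (an assumption, not the authors’ code), hooking a target convolution layer and weighting its feature maps by the pooled class-score gradients.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Return a [0, 1] heat map of class evidence for one CHW image tensor."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model.eval()
    model(image.unsqueeze(0))[0, class_idx].backward()     # gradient of the class score
    h1.remove(); h2.remove()
    w = grads[0].mean(dim=(2, 3), keepdim=True)            # per-channel weights
    cam = F.relu((w * feats[0]).sum(dim=1, keepdim=True))  # weighted feature sum
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```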

3.4. Model Evaluation

The values of ResNet34 + CBAM for each evaluation indicator are shown in Table 6: P, R, S, and F1score are 100.0%, 99.3%, 100.0%, and 99.6%, respectively. ResNet34 + CBAM is thus fully capable of recognizing healthy and defective kiwifruit; the model is stable and performs well.

3.5. Model Comparison

AlexNet [31] was the first network to introduce the Relu activation function to speed up model convergence and to use the Dropout mechanism to avoid overfitting. It consists of eight layers in total, including five convolutional layers, two fully connected layers, and an output layer, allowing deep feature extraction. Each convolutional layer in VGG [32] uses a small 3 × 3 convolutional kernel; stacking these small kernels enlarges the receptive field of the convolutional layers and improves feature extraction. Moreover, the shallow and deep convolutional structures of VGG are essentially the same and easy to optimize. Inception [33] is a multi-scale convolutional neural network: each module operates convolutional kernels of different scales in parallel and then concatenates the results, so the network extracts image features at multiple scales to obtain the optimal result.

As shown in Table 7, the experiment compares the recognition accuracy and recognition time of AlexNet, VGG16, InceptionV3, ResNet34, and ResNet34 + CBAM on the kiwifruit test set. ResNet34 + CBAM has the highest recognition accuracy, exceeding AlexNet, VGG16, InceptionV3, and ResNet34 by 7.9%, 12.7%, 11.8%, and 4.3%, respectively. Its single-sample recognition time is 15.0 ms, 6.0 ms, and 3.0 ms higher than AlexNet, InceptionV3, and ResNet34, respectively, and 80.0 ms lower than VGG16. AlexNet, VGG16, and InceptionV3 all recognize kiwifruit in the test set less accurately than ResNet34 and ResNet34 + CBAM. Analyzing the differences between the models: AlexNet has fewer layers than ResNet34 + CBAM and limited feature extraction ability, so its recognition accuracy is low, but so is its time cost. VGG16 also has fewer layers, but its heavy use of fully connected layers produces a large number of parameters and a high time cost. InceptionV3 extracts feature information with kernels of multiple sizes, but it adapts weakly to the input images compared with ResNet34 + CBAM and recognizes kiwifruit less accurately. ResNet34 + CBAM improves recognition accuracy over ResNet34 at the cost of increased time.
In summary, this study fuses ResNet34 with CBAM to achieve highly accurate recognition of external defects in kiwifruit; despite the increased time cost, it meets the online sorting requirements of grading lines.

4. Conclusions

In this study, a non-destructive detection method for external defects of kiwifruit based on ResNet was proposed. The experiment first optimized the learning rate, decaying it with a cosine-annealing strategy and using the Adam algorithm to accelerate model convergence. Second, the optimal fusion of ResNet34 with CBAM was investigated. Comparing the accuracy of ResNet34 + CBAM and ResNet34 on the validation set showed that ResNet34 + CBAM is more stable than ResNet34. Finally, evaluation on the test set showed an overall recognition accuracy of 99.6% for ResNet34 + CBAM, with every evaluation metric above 99.0%; each index verifies that ResNet34 + CBAM recognizes kiwifruit capably and stably. Compared with AlexNet, VGG16, InceptionV3, and ResNet34, and within the time allowed for online sorting, ResNet34 + CBAM improves recognition accuracy by 7.9%, 12.7%, 11.8%, and 4.3%, respectively. The method in this study therefore provides technical support for online real-time detection and sorting of kiwifruit.
Although this study achieved highly accurate detection of external defects in kiwifruit, the following work remains for further investigation.
First, model memory occupation was not taken into account; in actual grading scenarios, the hardware may not support the required computation, so future work should reduce the number of parameters and the computational load to obtain a lightweight model without sacrificing accuracy. Second, only the main external defects of two kiwifruit varieties were considered; the dataset should be further expanded with varieties and defect types that have few samples or occur only periodically.

Author Contributions

Conceptualization, F.W.; methodology, F.W.; software, Y.P.; validation, C.L. and B.Z.; formal analysis, L.Z.; investigation, F.W.; resources, C.L. and B.Z.; data curation, F.W. and Y.P.; writing—original draft preparation, F.W.; writing—review and editing, C.L.; visualization, L.Z.; supervision, C.L. and B.Z.; project administration, F.W. and Y.P.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Beijing Nova Program (No. 20220484066).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Xunpeng Jiang for his valuable advice in method conceptualization, result validation, and manuscript writing.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hu, J.; Huang, J.F.; Liu, X.D.; Bi, J.; Wang, J.H.; Xiao, A.H.; Shu, Z.X.; Dai, H. Progress in Research into Kiwifruit Quality Assessment Based on Near-infrared Spectroscopy. Food Res. Develop. 2022, 43, 196–201. [Google Scholar]
  2. Chen, C. Study on Acoustic Vibration Nondestructive Testing Method for Internal Quality of Kiwifruit. Master’s Thesis, Southwest University of Science and Technology, Mianyang, China, 2020. [Google Scholar]
  3. Shao, H.H.; Zheng, W.T.; Peng, J.Y. Chinese Goosebeery Stage Division Based on Computer Vision. Beijing Bio. Eng. 2009, 28, 531–533+544. [Google Scholar]
  4. Zhou, Y.T.; Bing, F.; Wang, W.J.; Tian, L.S. Research on Detection of Kiwifruit Defect Based Image Processing. Comput. Knowled. Technol. 2012, 8, 3979–3981+3986. [Google Scholar]
  5. Li, P.P. Automatic Grading Method of Kiwifruit Based on Mechine Vision Technology. Master’s Thesis, Northwest A&F University, Yangling, China, 2013. [Google Scholar]
  6. Li, Q.Q. Research on Nondestructive Testing and Automatic Grading of Kiwifruit Based on Computer Vision. Master’s Thesis, Anhui Agricultural University, Hefei, China, 2020. [Google Scholar]
  7. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  8. Tian, Y.W.; Wu, W.; Lu, S.Q.; Deng, H.B. Application of Deep Learning in Fruit Quality Detection and Classification. J. Food Sci. 2021, 42, 260–270. [Google Scholar]
  9. Fan, S.X.; Li, J.B.; Zhang, Y.H.; Tian, X.; Wang, Q.Y.; He, X.; Zhang, C.; Huang, W.Q. On Line Detection of Defective Apples Using Computer Vision System Combined with Deep Learning Methods. J. Food Eng. 2020, 286, 110102. [Google Scholar] [CrossRef]
  10. An, X.S.; Song, Z.P.; Liang, Q.Y.; Du, X.; Li, S.J. A CNN-Transformer-based Method for Sorting Citrus with Visual Defects. J. Huazhong Agric. Univ. 2022, 41, 158–169. [Google Scholar]
  11. Cao, Z.D. Research on Intelligent Cognition Model and Operation Mechanism of Greengage Grade Based on Deep Learning. Master’s Thesis, Hefei University of Technology, Hefei, China, 2018. [Google Scholar]
  12. Costa, A.Z.D.; Figueroa, H.E.H.; Fracarolli, J.A. Computer Vision Based Detection of External Defects on Tomatoes Using Deep Learning. Biosyst. Eng. 2020, 190, 131–144. [Google Scholar] [CrossRef]
  13. Yu, X.J.; Lu, H.D.; Wu, D. Development of Deep Learning Method for Predicting Firmness and Soluble Solid Content of Postharvest Korla Fragrant Pear Using Vis/NIR Hyperspectral Reflectance Imaging. Postharvest. Biol. Technol. 2018, 141, 39–49. [Google Scholar] [CrossRef]
  14. Garillos, M.C.A.; Chiang, J.Y. Multimodal Deep Learning and Visible-Light and Hyperspectral Imaging for Fruit Maturity Estimation. Sensors 2021, 21, 1288. [Google Scholar] [CrossRef]
  15. Feng, L.; Zhou, S.S.; Zhou, L.; Zhao, Y.Y.; Bao, Y.D.; Zhang, C.; He, Y. Detection of Subtle Bruises on Winter Jujube Using Hyperspectral Imaging with Pixel-Wise Deep Learning Method. IEEE Access 2019, 7, 64494–64505. [Google Scholar] [CrossRef]
  16. Wu, S.; Li, G.J.; Jie, D.F. Prediction Model Research of SSC in Watermelon Based on Deep Learning and Visible/near Infrared Spectroscopy. Food Mach. 2020, 36, 132–135. [Google Scholar]
  17. Chen, Y.; Liao, T.; Lin, C.; Wan, H.; Li, H. Grape Inspection and Grading System Based on Computer Vision. TCSA. Mach. 2010, 41, 169–172. [Google Scholar]
  18. Yao, J.; Qi, J.M.; Zhang, J.; Shao, H.M.; Yang, J.; Li, X. A Real-Time Detection Algorithm for Kiwifruit Defects Based on YOLOv5. Electronics 2021, 10, 1711. [Google Scholar] [CrossRef]
  19. Zheng, S.; Meng, Q.; Wang, T.; Chen, W.; Yu, N.H.; Ma, Z.M.; Liu, T.Y. Asynchronous Stochastic Gradient Descent with delay compensation. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 4120–4129. [Google Scholar]
  20. He, K.M.; Zhang, X.Y.; Ren, S.P.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  21. Lin, K.S.; Zhao, Y.C.; Wang, L.N.; Shi, W.J.; Cui, F.F.; Zhao, T. MSWNet: A Visual Deep Machine Learning Method Adopting Transfer Learning Based upon ResNet50 for Municipal Solid Waste Sorting. Front. Environ. Sci. Eng. 2023, 17, 77. [Google Scholar] [CrossRef] [PubMed]
  22. Wu, Y.; Lu, G.Y.; Zhu, Z.Q.; Bai, D.X.; Zhu, X.D.; Tao, C.Y.; Li, Y.N. A Landslide Warning Method Based on K-Means-ResNet Fast Classification Model. Appl. Sci. 2022, 13, 459. [Google Scholar] [CrossRef]
  23. Wang, P.; Luo, F.; Wang, L.H.; Li, C.S.; Niu, Q.; Li, H. S-ResNet: An Improved ResNet Neural Model Capable of the Identification of Small Insects. Front. Plant Sci. 2022, 13, 1066115. [Google Scholar] [CrossRef]
  24. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  25. Meng, X.X.; Wang, X.W.; Yin, S.L.; Li, H. Few-shot Image Classification Algorithm Based on Attention Mechanism and Weight Fusion. J. Eng. Appl. Sci. 2023, 70, 14. [Google Scholar] [CrossRef]
  26. Wang, F.; Yang, Y.J.; Wu, Z.; Zhou, J.C.; Zhang, W.S. Real-Time Semantic Segmentation of Point Clouds Based on an Attention Mechanism and a Sparse Tensor. Appl. Sci. 2023, 13, 3256. [Google Scholar] [CrossRef]
  27. Ren, K.; Tao, Q.Y.; Han, H.G. A Lightweight Object Detection Network in Low-light Conditions Based on Depthwise Separable Pyramid Network and Attention Mechanism on Embedded Platforms. J. Franklin. Inst. 2023, 360, 4427–4455. [Google Scholar] [CrossRef]
  28. Li, Y.S.; Qi, Y.N.; Mao, W.H.; Zhao, B.; Lv, C.X.; Ren, C.; Wang, J.Z. Automatic Weighing and Grading System for Korla Fragant Pear. Agric. Eng. 2018, 8, 63–68. [Google Scholar]
  29. Zhang, P.; Li, D.L. CBAM + ASFF-YOLOXs: An improved YOLOXs for guiding agronomic operation based on the identification of key growth stages of lettuce. Comput. Electron. Agric. 2022, 203, 107491. [Google Scholar] [CrossRef]
  30. He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.Y.; Xie, J.Y.; Li, M. Bag of Tricks for Image Classification with Convolutional Neural Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 558–567. [Google Scholar]
  31. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM. 2017, 60, 84–90. [Google Scholar] [CrossRef]
  32. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  33. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
Figure 1. The whole workflow of kiwifruit defect feature recognition.
Figure 2. Kiwifruit samples: (a) Healthy; (b) Defective.
Figure 3. Image acquisition device.
Figure 4. Residual structural unit.
Figure 5. Residual structure: (a) Residual_1; (b) Residual_2.
Figure 6. Schematic diagram of CBAM module.
Figure 7. Model accuracy on the kiwifruit validation set.
Figure 8. Loss values of the model on the kiwifruit validation set.
Figure 9. Original and Grad_Cam diagrams: (a) Healthy kiwifruit; (b) Defective kiwifruit.
Table 1. Classification of datasets.

Category    Train   Validation   Test
Healthy     1385    173          173
Defective   2344    294          294
Table 2. ResNet34 structure.

Convolution Group   Output Size   Procedure
Conv1               112 × 112     Convolution kernel: 7 × 7, Channels: 64, Step: 2
Conv2_x             56 × 56       Maximum downsampling: 3 × 3, Step: 2; [3 × 3, 64; 3 × 3, 64] × 3
Conv3_x             28 × 28       [3 × 3, 128; 3 × 3, 128] × 4
Conv4_x             14 × 14       [3 × 3, 256; 3 × 3, 256] × 6
Conv5_x             7 × 7         [3 × 3, 512; 3 × 3, 512] × 3
Output              1 × 1         Average downsampling, fully connected layer, Softmax
Table 3. Integration solutions.

Scheme   Integration Zone
1        Post Conv1
2        Post Conv3_x
3        Post Conv1 and Conv3_x
4        Post Conv3_x and Conv5_x
5        Post Conv1 and Conv5_x
Table 4. Training loss values of each integration solution model.

Integration Solution   Loss
1                      0.18
2                      0.20
3                      0.15
4                      0.13
5                      0.11
Table 5. Kiwifruit test set results.

Model             Result   Healthy   Defective   Acc/%
ResNet34          True     167       279         95.5
                  False    6         15
ResNet34 + CBAM   True     173       292         99.6
                  False    0         2
Table 6. Model evaluation.

P/%     R/%    S/%     F1score/%
100.0   99.3   100.0   99.6
Table 7. Comparison of the results of the five models.

Model             True   False   Time/ms   Acc/%
AlexNet           431    36      31.0      92.3
VGG16             423    44      126.0     88.4
InceptionV3       416    51      40.0      89.1
ResNet34          446    21      43.0      95.5
ResNet34 + CBAM   465    2       46.0      99.6