Article

MFCC Selection by LASSO for Honey Bee Classification

by Urszula Libal *,† and Pawel Biernacki *,†
Department of Acoustics, Multimedia and Signal Processing, Wroclaw University of Science and Technology, 50-370 Wroclaw, Poland
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2024, 14(2), 913; https://doi.org/10.3390/app14020913
Submission received: 13 December 2023 / Revised: 17 January 2024 / Accepted: 19 January 2024 / Published: 21 January 2024
(This article belongs to the Special Issue Apiculture: Challenges and Opportunities)

Featured Application

An automatic honey bee classification system based on audio signals for tracking the frequency of workers and drones entering and leaving a hive.

Abstract

The recent advances in smart beekeeping focus on remote solutions for bee colony monitoring and on applying machine learning techniques for automatic decision making. One of the main applications is a swarming alarm, allowing beekeepers to prevent a bee colony from leaving its hive. Swarming is a naturally occurring phenomenon, mainly during late spring and early summer, but its exact time is extremely hard to predict since it depends on many factors, including weather. Swarming prevention is the most effective way to retain bee colonies; however, it requires constant monitoring by the beekeeper. Drone bees do not survive the winter; they occur in colonies seasonally, with a peak in late spring that is associated with the creation of drone congregation areas, where mating with young queens takes place. The paper presents a method of early swarming mood detection based on the observation of drone bee activity near the entrance to a hive. Audio recordings are represented by Mel Frequency Cepstral Coefficients and their first and second derivatives. The study investigates which MFCC coefficients, selected by the Least Absolute Shrinkage and Selection Operator, are significant for the worker bee and drone bee classification task. The classification results, obtained with an autoencoder neural network, show that the proposed feature selection improves detection performance, achieving an accuracy slightly above 95% for the chosen set of signal features, compared to at most 90% for the standard set of MFCC coefficients.

1. Introduction

Monitoring honey bee colonies [1] is a crucial task because of their important role in providing pollination services. The automated and mobile monitoring solutions recently introduced in beekeeping [2,3,4] reflect the growing need to develop and improve smart beekeeping systems.
The difference between a drone and a worker bee lies in their roles and physical characteristics [5,6,7]. Drones are male bees whose purpose, among many others such as the thermoregulation of the nest, is to mate with a queen at drone congregation areas (DCAs) [8,9,10], to which they travel in the late spring and early summer. They are larger, with bigger eyes and powerful flight muscles. Worker bees, on the other hand, are infertile females whose primary responsibilities include pollen and nectar foraging, building and maintaining the combs in the hive, and attending to the needs of the queen and larvae. They are smaller, have pollen baskets, and possess a stinger for defense. The queen bee is the only fertile female in the colony, responsible for laying eggs and ensuring the survival of the colony. She is larger and has a longer lifespan than worker bees. The queen bee is capable of laying up to 2000 eggs per day and is essential for establishing and maintaining the colony.
The natural reproductive process in honey bee colonies involves two phenomena: swarming, in which the colony divides and the mother queen leaves the hive with part of the colony [11], called the primary swarm, possibly followed by later afterswarms; and the visits of the newly emerged queen to drone congregation areas after she becomes sexually mature.

1.1. Swarming

The four-year study [12] on the timing of swarming in the annual cycle of a colony's life revealed year-to-year variation from mid-May to early July, clearly connected to the variation in forage availability and to the timing of brood rearing after surviving winter. Both factors contribute to the rapid growth of a colony [11], which allows for reproduction by swarming.
As Demuth describes in [13], a prosperous colony increases its brood in the spring, at first rearing only worker brood, "but as the colony increases in strength the rearing of drone brood is begun, thus providing for male bees in anticipation of swarming". When the nest becomes crowded, the combs are filled with brood, and there is a sufficient amount of nectar, several queen cells are prepared [14]. Swarming usually occurs about 9 days after eggs are laid in the queen cells [15], which are sealed with the queen larvae inside.
For several days before departure from the hive, the worker bees prepare the old queen for the flight by reducing her feeding [14]; a queen that is too heavy would be unable to sustain the flight. Swarming is a complex phenomenon and depends on many variables [15], including weather: it is more likely to occur on a hot sunny day. Before the swarm departs, the swarming worker bees may become inactive for several days. An abnormal number of stationary, quiet bees on the bottom comb [14] is a clear sign of preparation for swarm departure. A large number of field workers remain in the hive instead of working in the fields during the last few days prior to the issuing of the swarm [15].
For the part of the colony that stays in the hive after swarming, reproductive success requires the young queen to emerge from her cell and take mating flights so that she can start laying eggs. In simple terms, the nuptial flights of the young queen to distant drone congregation areas are intended for her fertilization by drones that are not her brothers, to ensure adequate gene mixing [7,15].

1.2. Drones’ Flights

Drones do not survive the winter: they die after mating with a queen, and the remaining drones that do not mate are expelled from their colony in the fall to save stored food. The expelled drones die from lack of food and cold exposure after being forcefully removed from their hive. Young drones are raised the following spring. The drone brood is easy to spot because its cells are capped with arched domes and are larger than worker brood cells.
The drones occur in their colony seasonally, with a peak in late spring. Their main task is to fly to drone congregation areas [8,9,10], where they mate with newly emerged queens. To be able to fly to such distant locations, drones prepare by performing orientation flights and strengthening their flight muscles. The observation of marked individual drones [16] indicates that they actively fly out of the hive only in two periods, corresponding to the following:
  • Orientation flights—short flights of about 15 min performed by 6–9-day-old drones [16];
  • Mating flights—longer flights of about 30 min performed by older drones when they visit drone congregation areas. For example, in the study [16], the observed drones performed mating flights in the spring season from the age of 21 days to the end of their life. In the summer season, the same study registered mating flights of drones from the age of 13 days, but their life span was considerably shorter, with a maximum of 21 days.
In the first period, the immature drones (6–9 days old) perform orientation flights, which are followed by several days of resting inside the hive.
Drones reach sexual maturity 10–12 days after emerging. The second period of the highest drone activity is when they form congregation areas [10], where they gather in large numbers: several hundred to 30,000 drones from potentially 240 different colonies in one place. Such a DCA is usually located 5–40 m above the ground [17] and can measure 30–200 m across. Most daily flights of sexually mature drones to drone congregation areas occur between 14:00 and 18:00, with a marked peak of activity at 16:00 [18]. The daily peak in drone activity at around 16:00 remains constant and seems to be less sensitive to changes in weather conditions than the worker bees' peak. Drones commonly move between congregation areas during a single flight [9].
The study [16], based on an optical bee counter at the entrance of a hive counting the number of marked individual drones that returned to the hive, shows "a period of very low activity" between the orientation and mating flights. On most days, the number of worker bees exiting their hive exceeded the number of exiting drones, but on some specific days, the afternoon peak activity of drones (16:00) significantly preceded the workers' peak.
Various studies [15,19,20,21] show that the drone population peaks around the time of the swarming peak. This is a consequence of the peak in drone rearing a little over a month before swarming. It is not a coincidence, considering that the primary goal of the drones is to mate with new queens: it takes approximately 35 days from the laying of the unfertilized egg in the drone cell to the development of a fully mature drone [22].
On average, the young queens mate with 12 drones [7] and store the spermatozoa in their spermatheca for future egg fertilization. The nuptial flights can occur over a period of several days, during which young queens take 1–6 flights per day [16] to visit DCAs.

1.3. Sounds of Honey Bees and Swarming Prediction

The "buzzing" sound generated during a bee flight is primarily produced by swift wing movement. A bee can produce other sounds by the coordinated vibration of various body parts, but this paper is focused on non-interfering monitoring outside of a hive, and thus we limit our approach to the buzzing sound produced by a flying bee. The wing-beat rate can reach approximately 260 beats per second [23]. The frequency of the buzzing sound can vary depending on the size and species of the bee.
A steady hum around 180 Hz is a basic sound produced by stationary bees during breathing movements, while the flight sound has a frequency around 250 Hz [24]. Woods claimed in [24] that for the audio signals registered inside a hive and cut to the frequency band of only 225–285 Hz, there was a distinguishable warble, present even 25 days before the issue of the swarm. Based on his discovery, he invented and patented the 'Apidictor' device to predict the swarming much earlier than any visible signs could be noticed by a beekeeper.
The placement of a microphone inside a hive is a challenging task because bees recognize it as a foreign object and try to cover it with propolis (propolization), blocking the possibility of the correct measurement of acoustic pressure. To solve the problem, special cages for microphones can be used, or a metallic mesh as in [25,26,27]. Another solution is to measure the vibration generated by bees, as in the study [28], instead of sound.
Analysis of the power spectral density (PSD) of audio signals recorded inside a hive shows that changes in the spectral components around 110 Hz [25] indicate swarming. The same study concluded that with the increased activity during swarming moments, the frequency of the generated sound rises, visible in the power spectral density as a shift of the dominant band from 100–300 Hz to a higher band of 500–600 Hz.
Besides the direct analysis of the frequency spectrum of audio recordings, automated methods using machine learning algorithms are nowadays extensively developed and tested. Research [29] applying a Convolutional Neural Network (CNN) to audio data recorded from hives shows that the swarming event can be detected at least 4 weeks before it occurs. In the study [27], audio signals were recorded for the Carniolan honeybee (Apis mellifera carnica), and the classical Mel Frequency Cepstral Coefficients (MFCCs) were extracted as features for the multiclass classification task that predicts one of three states of a hive, including the queenless state. For classification, several standard algorithms were used: support vector machines (SVMs), K-nearest neighbors (KNNs), logistic regression, random forest, and Multilayer Perceptron.
The observation of increased drone activity [30] close to the entrance of a hive correlates with the time of year when drones travel to the drone congregation areas. This observation can indirectly support an early warning system that helps beekeepers prevent swarming, in which half of the bee colony can escape with the old queen. Since the drone population peaks around the same time as the highest swarming mood [15,19,20,21], it should be possible to build an early warning system based on the detection of high drone flight activity in audio signals recorded outside of a hive in close proximity to its entrance. There are methods to prevent swarming, such as removing some of the produced honey frames from the hive to force the colony to focus on the foraging task. Such beekeeper intervention can be suggested by a smart automatic system based on bee activity analysis, for example, by the audio signal detection proposed in this paper. The causes of swarming are still poorly understood [15]; beekeepers have reported that it can occur even when a colony is under constant monitoring and all preventive measures are immediately applied.
The paper is divided into four main sections. In Section 2, we introduce the methods used for the proposed detection system, including feature extraction of Mel-frequency cepstral coefficients (MFCCs), their selection by LASSO, and neural network implementation. In Section 3, the results of the bee detection are presented and analyzed. The paper is concluded with a discussion regarding the best feature selection for worker bee and drone audio recognition.

2. Materials and Methods

In the paper, we show the findings for a data set of audio recordings obtained through a beehive monitoring system [31] of the domesticated honey bee Apis mellifera L. The proposed system has the ability to detect specific events, such as high drone bee activity around the hive. All audio files, in uncompressed WAV format, were recorded at a sampling rate of 44,100 Hz. To process the signals, the recordings were segmented into 1 s samples. For our test, we used 10,000 worker bee flight audio recordings and approximately 1700 recordings of flying drones. Our open-access data set can be found in [31].
The data collection stage is the first step in the machine learning process (see Figure 1). The seasonal character of the investigated events, such as increased drone activity, is the main obstacle to gathering a sufficient amount of data when monitoring a hive. Although recording worker bees at the entrance to a hive is not a problem, recording drones is a challenge due to the inability to predict the time of their appearance. We tried to keep reasonable proportions between the numbers of recordings for worker bees and for drones and not make the database too imbalanced, hence the size of our database.
After data collection, it is crucial to perform feature extraction. This involves extracting relevant information from the collected data, in our case, audio signals. The goal of feature extraction is to find a proper representation of the analyzed data, allowing for further processing by machine learning techniques. The preliminary analysis of the power spectral density of audio recordings for the drone bee and worker bee classes, presented in Figure 2, shows the potential for class distinction based on the representation of signals in the frequency domain; a minimal sketch of such an analysis is given below. We delve deeper into the topic in Section 2.1, where we present a common frequency domain approach to feature extraction from audio recordings.
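To make the preliminary analysis concrete, the following sketch estimates the power spectral density of a single 1 s recording, analogous to the curves in Figure 2. This is not the authors' code: the use of SciPy, the Welch parameters, and the file name are assumptions of this illustration.

```python
# A minimal PSD sketch for one 1 s mono recording sampled at 44,100 Hz.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

fs, signal = wavfile.read("worker_bee_sample.wav")  # hypothetical file name
signal = signal.astype(np.float64)
if signal.ndim > 1:                  # fold stereo to mono if necessary
    signal = signal.mean(axis=1)
signal /= np.max(np.abs(signal))     # normalize amplitude

# Welch's method averages periodograms of overlapping segments,
# trading frequency resolution for a smoother PSD estimate.
freqs, psd = welch(signal, fs=fs, nperseg=4096)

# The buzzing of flying bees concentrates at low frequencies,
# so inspecting the band up to ~1 kHz is usually sufficient.
band = freqs <= 1000
for f, p in zip(freqs[band][::8], psd[band][::8]):
    print(f"{f:7.1f} Hz  {p:.3e}")
```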
The next stage is feature selection—the choice of the most important features that will be used to train the neural network. The selection of appropriate features aims to improve the model performance and reduce its complexity. This can be done using traditional statistical methods, such as analysis of variance and correlation coefficients, or by sequential (forward stepwise or backward stepwise) feature selection techniques as described in Section 2.2.
The final stage is the training of a classifier. In this step, the collected and processed data are used to train the model. There are many different machine learning algorithms, such as neural networks, random forests, support vector machines, etc. Neural networks are a popular choice for problems involving high-dimensional data processing, audio detection, image recognition, or text analysis. The model is adjusted based on the training data to minimize error and increase its predictive abilities. We come back to this topic in Section 2.4.
The entire machine learning process involves iterating through these stages to refine the model and improve the results. An important aspect of this process is also model validation, which involves checking its effectiveness on previously unseen data to assess whether the model is well fitted and generally predicts results accurately.

2.1. Audio Signal Parametrization: MFCC and Derivatives

MFCCs (Mel-frequency cepstral coefficients) are a feature extraction technique commonly used in speech and audio signal processing [32,33]. The coefficients represent the short-term power spectrum of a signal based on the Mel-frequency scale, which mimics the nonlinearity of human auditory perception. The process involves several steps presented in Figure 3, including segmenting the signal into frames, applying a window function, calculating the Discrete Fourier Transform (DFT) of each frame, mapping the resulting spectrum onto the Mel scale, taking the logarithm of the Mel-spectrum (cepstrum), and finally applying the Discrete Cosine Transform (DCT) to obtain the MFCCs. These coefficients are often used as input features for speech recognition systems or other applications in audio processing [34,35].
To simulate processing by the human auditory system, so-called filter banks are used, i.e., collections of filters with parameters similar to the critical bands of the auditory membrane. The most commonly used is the Mel filter bank, which models the nonlinearity of pitch perception. Mel filters are triangular in shape and account for differences in the frequency resolution of the human auditory system, which is higher for low frequencies and lower for high frequencies. They are based on the Mel scale, whose dependence on the frequency scale expressed in hertz is given by the formula
$$\mathrm{mel} = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right). \tag{1}$$
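As a quick numerical check of Formula (1), the mapping can be evaluated directly; a minimal Python sketch (the function name is ours):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Mel value for a frequency in hertz, per Formula (1)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Example: the ~250 Hz flight sound mentioned in Section 1.3
print(hz_to_mel(250.0))  # approximately 344 mel
```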
On the output of each filter $H_m$, the energy of the band is calculated by the following formula:
$$S_m = \sum_{f=0}^{K-1} |X(f)|^2 H_m(f), \tag{2}$$
where $K$ is the length of the FFT, equal to the number of signal samples in the analyzed frame, $m$ is a filter index, and $X$ is the frequency spectrum of an input signal frame.
In further calculations, the logarithm of the energy $S_m$ is used, which reduces the sensitivity of the filters to very loud and very quiet sounds and models the nonlinear amplitude sensitivity of the human auditory system. The final stage of the algorithm is the application of the Discrete Cosine Transform (DCT). The resulting values of the MFCC coefficients are calculated as
$$c_i = \sqrt{\frac{2}{M}} \sum_{m=1}^{M} \log(S_m) \cos\!\left(\frac{\pi i}{M}\left(m - 0.5\right)\right), \tag{3}$$
where $i$ is the MFCC coefficient index, and $M$ is equal to the number of filters used.
An important advantage of MFCC coefficients is their low sensitivity to noise. In combination with the MFCC coefficients, their first and second derivatives, also called deltas, are often used. They are denoted by the symbols $\Delta$ and $\Delta^2$, or Delta and DeltaDelta, respectively. The derivatives of the coefficients provide a simple way to describe signal dynamics at relatively low computational cost.
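The extraction pipeline of Figure 3 is implemented in common audio libraries. Below is a minimal sketch using librosa; the paper does not name its toolkit, so the library choice, the default frame parameters, and the averaging of per-frame coefficients into one vector per 1 s recording are assumptions of this sketch, not the authors' exact setup. Only the feature counts (40 MFCC + 40 Delta + 40 DeltaDelta = 120) follow the paper.

```python
import numpy as np
import librosa

# Load one 1 s segment at the 44,100 Hz rate used in Section 2.
y, sr = librosa.load("bee_sample.wav", sr=44100)  # hypothetical file name

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)  # 40 cepstral coefficients
delta = librosa.feature.delta(mfcc)                 # first derivatives (Delta)
delta2 = librosa.feature.delta(mfcc, order=2)       # second derivatives (DeltaDelta)

# Average over frames to obtain a single 120-element feature vector
# per recording (the aggregation method is assumed here).
features = np.concatenate(
    [mfcc.mean(axis=1), delta.mean(axis=1), delta2.mean(axis=1)]
)
print(features.shape)  # (120,)
```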

2.2. Feature Selection via LASSO Regression

In this section, we introduce feature selection techniques: the LARS (Least Angle Regression) regularization algorithm and its modification, LASSO (Least Absolute Shrinkage and Selection Operator). The LARS [36] method differs from LASSO [37] regularization mainly in the way variables are selected for the model, with regularization in the $\ell_2$-norm for LARS and in the $\ell_1$-norm for the LASSO algorithm.
In the case of LARS, variables are added to the model sequentially based on their correlation with the residuals. At each step, LARS selects the variable with the highest correlation and then increases its coefficient evenly until the next variable reaches the same correlation. At this point, both variables are updated simultaneously, and their coefficients are increased at the same rate. This process is repeated until all variables have been added or a specified limit on the number of variables is reached. The LARS regularization minimizes the residual sum of squares $RSS$ (in the $\ell_2$-norm) as follows:
$$\hat{\beta}_{LARS} = \arg\min_{\beta} \|y - X\beta\|_2^2, \tag{4}$$
where $X = (X_{ij})$, $i = 1, \dots, N$, $j = 1, \dots, p$, is a matrix of signal features, with the feature vector $(X_{i1}, X_{i2}, \dots, X_{ip})$ in the $i$th row for the $i$th signal from the training set of $N$ recordings, and $y$ is a vector of class labels $y_i \in \{-1, 1\}$. In our experiments, the labels are used for feature vectors representing worker bee and drone bee recordings, and the features $X_{ij}$ are calculated as MFCC coefficients. Here, $\beta$ is a $p$-dimensional vector of regression coefficients $\beta_j$, that is, $\beta = (\beta_1, \beta_2, \dots, \beta_p)$. The regression coefficients $\beta$ are obtained by minimizing the residual sum of squares $RSS = \|y - X\beta\|_2^2$ in the $\ell_2$-norm.
In the case of LASSO regularization, variables are also added to the model sequentially, but in a more rigorous manner. At each step, LASSO selects the variable that has the largest influence on the residuals, then increases its coefficient while decreasing the coefficients of the remaining variables. Variables that do not contribute to the model have $\beta_j$ coefficients equal to 0. This process is repeated until all variables have been added to the model, or until certain constraints on the number of variables, or on the sum of the absolute values of the coefficients, have been reached. The LASSO algorithm can be defined by the following expression:
$$\hat{\beta}_{LASSO} = \arg\min_{\beta} \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \le t, \tag{5}$$
where $\|u\|_p = \left(\sum_{i=1}^{N} |u_i|^p\right)^{1/p}$ is the standard $\ell_p$-norm. The above expression can be written more compactly in the Lagrangian form
$$\hat{\beta}_{LASSO} = \arg\min_{\beta} \frac{1}{N}\|y - X\beta\|_2^2 + \alpha\|\beta\|_1, \tag{6}$$
where the exact relationship between the regularization parameters $t$ and $\alpha$ is data-dependent. Finally, the formula takes the following form:
$$\hat{\beta}_{LASSO} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \alpha \sum_{j=1}^{p} |\beta_j|. \tag{7}$$
The coefficients $\hat{\beta}_{LASSO}$, obtained by the LASSO algorithm for sequentially decreasing values of the regularization parameter $\alpha$, are presented in Figure 4. Each colorful line in Figure 4 represents the values of an individual coefficient $\beta_j$ from the $\hat{\beta}_{LASSO}$ vector for different $\alpha$ values. Stopping the regularization procedure at a chosen $\alpha$ value results in a set of non-zero coefficients $\beta_j$. For $\alpha = 0$, we would obtain the LARS method, and all coefficients $\beta_j$ would be non-zero; compare expressions (4) and (7).
Regularization by LASSO can be understood as a type of forward stepwise feature selection technique. The lower the parameter $\alpha$ in the minimized expression (6), the more non-zero coordinates in the coefficient vector $\hat{\beta}_{LASSO}$. For example, for the honey bee data shown in Figure 4, with $p = 120$ coefficients in the initial MFCC + Delta + DeltaDelta set, we observe 11 selected coefficients for $\alpha = 0.1$, and already as many as 60 non-zero coefficients for $\alpha = 0.01$.
The LASSO regularization is applied for several reasons. LASSO adds a penalty term to the loss function that limits the magnitude $\|\beta\|_1$ of the coefficients. The method selects the most important features by shrinking the coefficients in $\hat{\beta}_{LASSO}$ of irrelevant or less important features to zero, thanks to the $\ell_1$-norm component in Equation (6). This promotes sparsity in the model and allows one to focus on the most relevant signal features from the feature vector $(X_{i1}, X_{i2}, \dots, X_{ip})$. That reduces the complexity of the model and prevents overfitting. It also provides a clear indication of which variables have a significant impact on the outcome of the performed tasks.
This is a reason for the popularity of LASSO regression as a feature selection method in machine learning, including classification tasks [38], also in modified forms such as the elastic net, adaptive LASSO, group LASSO [39], Diagnostic-LASSO [40], or LASSO logistic regression, which was used in [26] to determine whether a hive is queenless.
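A minimal sketch of the selection step with scikit-learn is given below; it follows Equation (6), with the caveat, noted in the comments, that scikit-learn scales the residual term slightly differently, so the $\alpha$ values are not numerically identical to the paper's.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_select(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Indices of features with non-zero LASSO coefficients.

    scikit-learn minimizes (1/(2N))||y - X b||_2^2 + alpha * ||b||_1, which
    matches Equation (6) up to a constant factor in the residual term.
    """
    model = Lasso(alpha=alpha, max_iter=10_000)
    model.fit(X, y)  # y holds the {-1, +1} worker/drone labels
    return np.flatnonzero(model.coef_)

# Toy demonstration with synthetic data standing in for the MFCC features:
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 120))
y = np.sign(X[:, :5].sum(axis=1))      # only the first 5 features matter
print(lasso_select(X, y, alpha=0.03))  # mostly the indices 0..4
```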

2.3. Information Criterion

An information criterion is a statistical measure used in model selection and comparison. It quantifies the trade-off between the complexity of a model and its goodness-of-fit to the data. The key objective of calculating an information criterion is to choose the most suitable model among a set of competing models. There are various information criteria available, but the most commonly used include the Akaike Information Criterion (AIC) [41] and the Bayesian Information Criterion (BIC) [42]. These criteria penalize complex models that have more parameters, preferring simpler models that achieve comparable goodness-of-fit. In this section, we discuss the criteria for feature vector size optimization. In short, models with a lower information criterion value have better predictive power.
The Akaike Information Criterion was developed by Hirotugu Akaike [41]; it balances the predictive power of a model by penalizing the number of model parameters $p$ while rewarding goodness-of-fit through the maximized likelihood estimate $L$:
$$AIC = 2p - 2\ln(L). \tag{8}$$
The Bayesian Information Criterion, also known as the Schwarz criterion [42], is based on a Bayesian approach and is calculated as follows:
$$BIC = p\ln(n) - 2\ln(L), \tag{9}$$
where $p$ is the number of model parameters, $L$ is the maximum likelihood estimated by the model, and $n$ is the number of observations.
By calculating an information criterion for different models (i.e., different sets of features chosen from the MFCC + Delta + DeltaDelta set via LASSO), one can compare their relative performance and select the model that strikes the optimal balance between model complexity and fit to the data. This process helps prevent overfitting, where a model performs very well on the training data but struggles with new data, and enables the identification of the most suitable model for predicting or explaining the underlying phenomenon. In other words, using the information criterion combined with LASSO feature selection allows us to find the best set of features chosen from MFCC + Delta + DeltaDelta, which are used for neural network training in the next step.
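Under a Gaussian likelihood, $-2\ln(L)$ reduces (up to an additive constant) to $n\ln(RSS/n)$, which gives a simple way to score each LASSO-selected subset. The sketch below uses this common approximation; it is our assumption rather than a detail stated in the paper.

```python
import numpy as np

def aic_bic(y: np.ndarray, y_pred: np.ndarray, p: int) -> tuple[float, float]:
    """AIC and BIC for a model with p parameters (selected coefficients)."""
    n = len(y)
    rss = float(np.sum((y - y_pred) ** 2))
    neg2_log_l = n * np.log(rss / n)   # Gaussian -2 ln(L), constants dropped
    aic = 2 * p + neg2_log_l           # Equation (8)
    bic = p * np.log(n) + neg2_log_l   # Equation (9)
    return aic, bic
```

Comparing these values across candidate subsets and keeping the one with the lowest criterion reproduces the model selection logic described above.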

2.4. Classification by Autoencoder Neural Networks

An autoencoder neural network [43,44,45] in a classification task operates in two stages: the encoding phase and the decoding phase (see Figure 5).
In the encoding phase, the neural network is tasked with transforming the input information, such as an image or text, into an efficient representation in the form of a lower-dimensional encoded vector. The network consists of an input layer, hidden layers, and an output layer. The input layer receives the input data and then transforms them using internal weights and activation functions. The hidden layers, which are usually smaller in size than the input layer, encode the information by reducing its dimensionality. In the decoding phase, the neural network reconstructs the input data based on the code in one of the hidden layers. At this point, the quality of the reconstruction can be evaluated based on the calculated difference between the input and output data.
In a classification task, an autoencoder neural network can be used as a method to determine the vector representation of input data. After the encoding phase, a condensed vector (a code) is obtained that represents the data. Then, such a vector can be passed to another classifier, such as the output layer, which performs classification based on the received representation. As a result, an autoencoder neural network can serve as a feature reducer in a classification task, which can increase the efficiency of the classifier.
In audio signal recognition, autoencoder neural networks can be used to learn compact representations of audio signals that capture their essential features [44,45]. This can be useful for tasks such as pattern recognition, classification, identification, anomaly detection, or noise reduction. By training an autoencoder on a large data set of audio signals, the network can learn to extract features that are relevant to the task at hand, while also discarding noise and irrelevant information.
One common approach to using autoencoders in audio signal recognition is to train the network on a reconstruction task, where the input is an audio signal and the output is the reconstructed audio signal [43,44]. The loss function used during training is typically a measure of the difference between input and output signals, such as the mean squared error (MSE) or the mean absolute error. Once the network is trained, the compressed representation learned by the encoder can be used as a feature vector for identifying or classifying different types of audio signals.
For the classification of honey bees, the autoencoder neural network had the structure shown in Figure 5, with 4 fully connected hidden layers in total: 2 hidden layers for the encoder and 2 hidden layers for the decoder. In the encoder, all hidden layers had ReLU activation functions. In the decoder, one hidden layer used ReLU (the layer called the code), and the second one used the sigmoid activation function. The size of the input layer depends on the size of the feature vector; the exact numbers of features are presented in Section 3 regarding the results. The code layer in our autoencoder structure was a vector of size 8.
The general approach to applying the autoencoder to the drone bee detection task was as follows: the autoencoder neural network was trained on the feature vectors obtained from the worker bee audio recordings only, minimizing the reconstruction mean square error for the coefficient vectors reconstructed by the decoder. A feature vector representing a drone bee audio recording, when input to the autoencoder (already trained on worker bee recordings), is then considered an anomaly and produces a much higher reconstruction MSE than an average worker bee recording. The process of anomaly detection using an autoencoder is completed with the construction of the final classifier by defining two decision areas, for worker bees with a low reconstruction MSE and for drone bees with a high reconstruction error, separated by a threshold equal to
$$\text{threshold} = \text{mean}(MSE_{train}) + \text{std}(MSE_{train}), \tag{10}$$
where $MSE_{train}$ is the reconstruction error for worker bee feature vectors from the training data set, $\text{mean}(MSE_{train})$ is its mean value, and $\text{std}(MSE_{train})$ is its standard deviation. The threshold value given by Equation (10) is the most common approach to specifying the decision areas in the two-class recognition problem for the autoencoder neural network.
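A compact sketch of this anomaly-detection classifier in Keras is shown below. Only the code size of 8, the ReLU/sigmoid activations, training on worker bee vectors alone, and the threshold rule of Equation (10) come from the text; the layer widths, optimizer, number of epochs, and the [0, 1] scaling of features implied by the sigmoid output are assumptions of this sketch.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(input_dim: int) -> keras.Model:
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(32, activation="relu"),            # encoder hidden layer
        layers.Dense(8, activation="relu"),             # code layer (size 8)
        layers.Dense(32, activation="relu"),            # decoder hidden layer
        layers.Dense(input_dim, activation="sigmoid"),  # reconstruction output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def fit_and_classify(X_train: np.ndarray, X_test: np.ndarray) -> np.ndarray:
    """Train on worker bee vectors only; flag high-MSE test vectors as drones."""
    ae = build_autoencoder(X_train.shape[1])
    ae.fit(X_train, X_train, epochs=50, batch_size=64, verbose=0)

    mse_train = np.mean((ae.predict(X_train, verbose=0) - X_train) ** 2, axis=1)
    threshold = mse_train.mean() + mse_train.std()      # Equation (10)

    mse_test = np.mean((ae.predict(X_test, verbose=0) - X_test) ** 2, axis=1)
    return mse_test > threshold   # True = drone (anomaly), False = worker bee
```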

3. Results

In this section, we provide the results of the audio signal processing and detection for the honey bee data set [31] of 10,000 worker bee and 1700 drone bee recordings. The processing consists of feature extraction into MFCC coefficient vectors, followed by feature selection by LASSO and the analysis of the achieved information criteria values, and, in the final step, the classification of audio signals from the worker and drone classes by the autoencoder neural network. The discussion of the obtained results is placed in Section 4.

3.1. Features Selected by LASSO with AIC and BIC

After feature extraction, the coefficients representing the audio recordings were chosen from the MFCC + Delta + DeltaDelta set of Mel-frequency cepstral coefficients and their first and second derivatives. The values of the Akaike Information Criterion and Bayesian Information Criterion obtained for different values of the regularization parameter $\alpha$ are shown in Figure 6. The numbers of features selected by LASSO for different values of the Akaike Information Criterion and Bayesian Information Criterion are presented in Table 1, and the exact indices of the coefficients selected for AIC and BIC can be found in Figure 7.
To better explain how the proposed procedure works, let us analyze a particular case for the regularization parameter $\alpha = 0.03$. The chosen feature set in that specific case consists of 29 coefficients obtained from Formula (6). The MFCC + Delta + DeltaDelta feature set from which the coefficients were chosen includes 40 Mel-frequency cepstral coefficients (MFCC), 40 first derivatives (Delta), and 40 second derivatives (DeltaDelta). The 29 selected coefficients, shown in Figure 7, have the following indices: from MFCC: 1, 2, 4, 5, 7, 8, 9, 12, 13, 14, 18, 21, 22, 27, 28, 31, 32, 33, 35, 36, 39; from Delta: 42, 44, 45, 47, 49, 53, 61; and from DeltaDelta: only 84.
The information criteria associated with that $\alpha = 0.03$ value, calculated for the training set, are equal to 824.2 for the Akaike Information Criterion and 826.4 for the Bayesian Information Criterion. The values of the two criteria were very close (compare also Figure 6), so the numbers of features selected by LASSO are equal for the cases chosen for this presentation. In the exemplary case of $\alpha = 0.03$, both information criteria, AIC and BIC, led to the same set of LASSO-selected coefficients representing the audio recordings.
The results shown in Table 1 include the values of the regularization parameter $\alpha$ from expression (6) and the related values of AIC and BIC, chosen so that the number of features selected by LASSO increases in steps of approximately 10 coefficients. The LASSO procedure sometimes adds more than one coefficient to the chosen representation in one step (see the LASSO path in Figure 4), which is why the increase in the feature number could not be made exactly equal to 10 as planned.
As stated in Section 2.3, the information criteria are statistical measures for comparing different models, i.e., different coefficient sets; they quantify the trade-off between the complexity of a model and its goodness-of-fit to the data, with lower criterion values indicating higher predictive power.
The values of AIC and BIC in Figure 6 drop swiftly together with the decrease in the regularization parameter $\alpha$ until the inflection point on the graph at $\alpha \approx 0.01$. For the lower alpha values, from $\alpha = 0.0065$ to $0.0001$, there is almost no significant change in the criterion values, which indicates that the corresponding models (for 69, 80, 93, 101, 111 and 120 selected coefficients) carry the same measure of information and are the best candidates to be chosen for the training of the neural network.

3.2. Classification Quality

For application purposes, it would be enough to detect and count only drone bees. For a full analysis of the classification quality, however, we present the results for the honey bee classification problem with two defined classes: worker bees and drone bees. The three common parameters used to evaluate the performance of a statistical classification model are accuracy, precision, and recall, which are shortly described below.
Accuracy measures the overall correctness of the predictions for the particular model, i.e., in our case, the set of Mel-frequency cepstral coefficients selected by LASSO. The accuracy is computed using the following expression:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{11}$$
where $TP$ is the count of true positive cases; $TN$, true negative; $FP$, false positive; and $FN$, false negative. It is the ratio of correctly classified instances (true positives and true negatives) to the total number of instances in the test data set.
Recall, also known as sensitivity, or the true positive rate, measures the proportion of correctly predicted positive instances out of the total actual positive instances. Recall focuses on the model’s ability to detect drone bees, which is a primary goal of the paper. The recall is defined as
$$\text{Recall} = \frac{TP}{TP + FN}. \tag{12}$$
Precision measures the proportion of correctly predicted positive instances out of the total instances predicted as positive. It is the ratio of true positives to the sum of true positives and false positives. Precision focuses on the model’s ability to avoid false positives and is particularly useful when the cost of false positives is high. The precision is calculated by the formula
$$\text{Precision} = \frac{TP}{TP + FP}. \tag{13}$$
Another measure of classification performance, independent of the true negative count, is the F1-score. Its value varies from 0 to 100%, and the common interpretation is that values over 90% indicate very good performance of a classifier. The F1-score is defined as follows:
$$\text{F1-score} = \frac{2\,TP}{2\,TP + FP + FN}. \tag{14}$$
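For completeness, these four measures can be computed directly from predicted and true labels; a small sketch with scikit-learn and toy labels, treating the drone class as "positive" as stated in Appendix A:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 1, -1, -1, -1, 1, -1, -1]  # +1 = drone, -1 = worker (toy labels)
y_pred = [1, -1, -1, -1, -1, 1, 1, -1]

print("accuracy :", accuracy_score(y_true, y_pred))                # Equation (11)
print("recall   :", recall_score(y_true, y_pred, pos_label=1))     # Equation (12)
print("precision:", precision_score(y_true, y_pred, pos_label=1))  # Equation (13)
print("f1-score :", f1_score(y_true, y_pred, pos_label=1))         # Equation (14)
```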
The results of the honey bee audio recording classification are presented in Figure 8 and Figure 9. Both figures show the values of the evaluation parameters: accuracy, precision, and recall. The first graph, Figure 8, applies to the classification results for different numbers of features selected by LASSO: 29, 41, 49, 60, 69, 80, 93, 101, 111 from the MFCC+Delta+DeltaDelta set, and additionally for the full set of 120 coefficients. The second graph, Figure 9, presents the evaluation parameters for the three cases with no LASSO selection:
  • Full set of 40 MFCC features, i.e., Mel-frequency cepstral coefficients;
  • Full set of 80 features in MFCC+Delta, consisting of 40 MFCC features + 40 first derivatives;
  • Full set of 120 features in MFCC+Delta+DeltaDelta, which is created by 40 MFCC features + 40 first derivatives + 40 second derivatives.
The exact classification results, for the feature sets selected by LASSO and for the full sets of MFCC features, are given in Table 2 and Table 3. The highest accuracy, recall, and precision rates were obtained for the models selected by LASSO with 69 and 80 features. In those two cases, the accuracy reached 95.3%, recall 85.5%, precision 98.4%, and F1-score 92.1%. The most important indicator from the application point of view was recall, which for all LASSO selection cases had a value around 84–85%. The recall value corresponds to the drone bee detection rate, which is why we want it to be as high as possible.
The three best results in terms of drone detection, i.e., the highest recall, were obtained for the 69, 80, and 29 LASSO-selected coefficients, with recall values of 85.52%, 85.47%, and 85.17%, respectively. On the other hand, for the original coefficient sets, presented in Table 3, the recall was only 81.4% for the MFCC set and slightly exceeded 84% for the MFCC+Delta and MFCC+Delta+DeltaDelta feature sets containing derivatives. This also shows that in the drone detection task, the first derivatives of MFCC add important information to the audio recording representation in the form of the feature vectors.
The cases with LASSO selection of 69 and 80 features significantly outperformed the cases without any feature selection. In the case of no feature selection, the highest accuracy was achieved for the basic set of MFCC features without their derivatives, but even then the accuracy reached only 90.6%. The remaining evaluation measures were a moderately high precision of 86.2% and the worst recall of all cases, only 81.4%. High precision together with low recall means that the classifier in that case was effective at correctly detecting worker bees but not capable of correctly detecting drone bees, which is the opposite of what we wanted to achieve. This proves that the LASSO selection of MFCC coefficients and their derivatives makes sense in this specific task and improves not only the general classification quality, measured simply by the classifier accuracy, but also the resulting drone detection performance, shown by the higher recall value.
The best indicator, showing that application of the LASSO method for MFCC selection improves honey bee classification, is a direct comparison of the following two cases:
  • Subset of 80 features selected by LASSO;
  • Full set of 80 features in MFCC+Delta.
In short, the first set is a subset of 80 coefficients chosen by the LASSO algorithm from the set of 120 features in MFCC+Delta+DeltaDelta, optimizing the choice according to the theory presented in Section 2.2. The second set consists of the first 80 coefficients from the MFCC+Delta+DeltaDelta set. The comparison of statistical measures for the two cases gives the following conclusions: for the MFCCs selected by LASSO, accuracy was greater by 12.93%, recall by 1.32%, precision by 32.46%, and the F1-score by 11.25%. The F1-score indicates that the best cases are those for 69 and 80 features selected by LASSO.

3.3. Confusion Matrix

To evaluate the performance of the classification task for different models, we additionally used another common tool, the confusion matrix. It is a table that summarizes the predictions made by the model and compares them with the actual labels or classes of the data. The confusion matrices for the original coefficient sets MFCC, MFCC + Delta, and MFCC + Delta + DeltaDelta and for the analyzed feature selection cases are presented in Figure A1 and Figure A2 in Appendix A.
It can be noted that for the cases with 60, 69, and 80 features selected by LASSO, mislabeling real worker bees as drones occurred most rarely, with the lowest false positives (see the almost black lower left square in the confusion matrices). The overall best results were obtained for those three cases, with false positive rates (also called false alarm rates) equal to around 0.6%, which translates to only around 60 out of every 10,000 worker bees, on average, being labeled as drones. On the other hand, the false negative rate (the so-called miss rate) equals around 14.5% in the same cases, which means that, on average, that percentage of drones were confused for worker bees. This fact leaves space for future improvements, especially for imbalanced data sets such as ours, where the majority of recordings contain worker bee sounds.
The worst case in terms of correct drone bee detection was obtained for the original set of MFCC, without derivatives and without LASSO feature selection. The confusion matrix, presented in Figure A2 in Appendix A, shows that only 81.39% of drones were correctly classified as drones, and 18.61% of them were wrongly labeled as worker bees. The original MFCC set produces the lowest recall value and the highest precision, which means that the coefficient set is overfitted to the worker bee recording class and does not perform well in the drone detection task. The experimental results show that the LASSO selection of MFCC features can improve the drone detection quality, but the overall performance still depends highly on the collected data.

4. Discussion

The paper analyzes the statistical feature selection method based on the Least Absolute Shrinkage and Selection Operator (LASSO) for a popular parametrization of audio recordings in the frequency domain: Mel-frequency cepstral coefficients (MFCCs) and their first and second derivatives. LASSO regularization is a forward stepwise sequential technique, adding new non-zero coefficients to a feature vector while decreasing the $\ell_1$-norm penalty on that feature vector. The resulting reduced feature vector is fed to the autoencoder neural network for honey bee classification, with two signal classes: worker bees and drones.
The chosen subset of 69 coefficients from the MFCC+Delta+DeltaDelta set performed with the highest quality in the honey bee recognition task based on audio recordings. In this particular case of 69 selected coefficients, the LASSO selector chose 31/40 MFCC coefficients, 26/40 first-derivative Delta coefficients, and 12/40 second-derivative DeltaDelta coefficients from the original set of 120 cepstral features. The frequency bands of the filter banks related to those selected MFCCs do not point to any single most significant frequency band for the bee audio recordings, since they cover most of the frequency spectrum (compare Figure 7 for $\alpha = 0.0065$). However, it can be concluded that the first derivatives have more than twice as much influence on the results as the second derivatives.
The main goal was to apply these feature extraction and selection methods to the honey bee classification task with worker bee and drone classes, helping to detect higher activity of male drone bees close to hive entrances. The paper shows that by applying LASSO selection to MFCC features, the overall two-class classification attains higher accuracy compared to the original set of MFCC coefficients. In addition, the detection quality for drones alone is increased.
The accuracy for the original MFCC set of 40 coefficients is equal to 90.6%. For the sets with derivatives, the MFCC+Delta set of 80 coefficients and the MFCC+Delta+DeltaDelta set of 120 coefficients, the accuracy decreases significantly to 82.3% and 81.2%, respectively.
The highest accuracy, around 95.3%, is obtained for the feature sets containing 69 and 80 coefficients selected by LASSO from the original MFCC+Delta+DeltaDelta set of 120 coefficients, giving a 4.7% improvement compared to the best case without LASSO feature selection. For those reduced models, the drone detection rate, measured by recall, also reached the highest value, around 85.5%, as presented in Table 2. The chosen subsets of features are linked to low values of the Akaike and Bayesian information criteria, indicating that the selected feature sets are close to an optimal representation of the training data. The quality of drone detection was improved by using the LASSO selection technique.
The proposed approach can be applied in a smart beehive monitoring system for the purpose of the non-invasive detection of increased drone activity around beehives by fast audio processing, including the use of the trained autoencoder neural network, described in Section 2.4. The proposed solutions can also serve as a basis for constructing an early-stage alarm system to identify the swarming mood in a honey bee colony.

Author Contributions

Conceptualization, U.L.; methodology, U.L. and P.B.; software, U.L. and P.B.; validation, U.L. and P.B.; formal analysis, U.L.; investigation, U.L. and P.B.; resources, P.B.; data curation, P.B.; writing—original draft preparation, U.L. and P.B.; writing—review and editing, U.L. and P.B.; visualization, U.L. and P.B.; supervision, U.L.; project administration, U.L.; funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in a publicly accessible repository. The data presented in this study are openly available in “Dataset for honey bee audio detection” at https://zenodo.org/doi/10.5281/zenodo.10359685, reference number [31].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIC: Akaike Information Criterion
BIC: Bayesian Information Criterion
DCA: Drone Congregation Area
DCT: Discrete Cosine Transform
DFT: Discrete Fourier Transform
LARS: Least Angle Regression
LASSO: Least Absolute Shrinkage and Selection Operator
MFCC: Mel-Frequency Cepstral Coefficient
MSE: Mean Square Error

Appendix A

The numerical experiments were performed on the data set with a much more numerous worker bee class with 10,000 audio recordings compared to only 1700 drone bee audio recordings. The training and testing of the autoencoder neural network was repeated 100 times for all cases with a random choice of 80% of signals from the worker bee class for training and 20% for testing. In other words, the tests were performed 100 times on 2000 (20%) audio frames from the worker bee class. Due to the highly imbalanced data set, for each of those tests, we randomly chose a subset of 850 (50%) audio frames from the drone class.
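A sketch of this evaluation protocol is given below; it is our reconstruction of the described procedure, not the authors' code, and the `run_once` callback (e.g., wrapping the autoencoder sketch from Section 2.4) is a hypothetical interface.

```python
import numpy as np

def evaluate(X_workers, X_drones, run_once, n_repeats=100, seed=0):
    """Repeat the random-split test; run_once returns (TP, FN, FP, TN) counts."""
    rng = np.random.default_rng(seed)
    totals = np.zeros(4, dtype=int)
    for _ in range(n_repeats):
        perm = rng.permutation(len(X_workers))
        n_train = int(0.8 * len(X_workers))        # 8000 worker vectors to train
        train = X_workers[perm[:n_train]]
        test_workers = X_workers[perm[n_train:]]   # 2000 worker vectors to test
        drone_idx = rng.permutation(len(X_drones))[: len(X_drones) // 2]
        test_drones = X_drones[drone_idx]          # 850 drone vectors to test
        totals += run_once(train, test_workers, test_drones)
    return totals  # summed TP, FN, FP, TN over all repetitions
```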
The confusion matrices contain the integer counts of the predicted classes for the signals from both classes. The total numbers of tested cases were equal to 200,000 for the worker bee class and 85,000 for the drone class. It should be noted here that we consider the drone bee class to be "positive" and the worker bee class to be "negative", since our primary goal is drone detection. The confusion matrices are presented in Figure A1 and Figure A2.
Figure A1. Confusion matrices for honey bee classification obtained for different feature subsets selected by LASSO from the MFCC + Delta + DeltaDelta set.
Figure A2. Confusion matrices for honey bee classification obtained for the original feature sets: MFCC, MFCC + Delta and MFCC + Delta + DeltaDelta.

References

  1. Capela, N.; Sarmento, A.; Simões, S.; Lopes, S.; Castro, S.; Alves da Silva, A.; Alves, J.; Dupont, Y.L.; de Graaf, D.C.; Sousa, J.P. Exploring the External Environmental Drivers of Honey Bee Colony Development. Diversity 2023, 15, 1188. [Google Scholar] [CrossRef]
  2. Huet, J.-C.; Bougueroua, L.; Kriouile, Y.; Wegrzyn-Wolska, K.; Ancourt, C. Digital Transformation of Beekeeping through the Use of a Decision Making Architecture. Appl. Sci. 2022, 12, 11179. [Google Scholar] [CrossRef]
  3. Ntawuzumunsi, E.; Kumaran, S.; Sibomana, L. Self-Powered Smart Beehive Monitoring and Control System (SBMaCS). Sensors 2021, 21, 3522. [Google Scholar] [CrossRef] [PubMed]
  4. Ntawuzumunsi, E.; Kumaran, S.; Sibomana, L.; Mtonga, K. Design and Development of Energy Efficient Algorithm for Smart Beekeeping Device to Device Communication Based on Data Aggregation Techniques. Algorithms 2023, 16, 367. [Google Scholar] [CrossRef]
  5. Caron, D.M.; Connor, L.J. Honey Bee Biology and Beekeeping, 3rd ed.; Wicwas Press: Kalamazoo, MI, USA, 2022. [Google Scholar]
  6. Tautz, J.; Heilmann, H.R.; Sandeman, D.C. The Buzz about Bees: Biology of a Superorganism; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  7. Winston, M.L. The Biology of the Honey Bee; Harvard University Press: Cambridge, MA, USA, 1987; ISBN 0-674-07409-2. [Google Scholar]
  8. Muerrle, T.M.; Hepburn, H.R.; Radloff, S.E. Experimental determination of drone congregation areas for Apis mellifera capensis Esch. J. Apic. Res. 2007, 46, 154–159. [Google Scholar] [CrossRef]
  9. Woodgate, J.L.; Makinson, J.C.; Rossi, N.; Lim, K.S.; Reynolds, A.M.; Rawlings, C.J.; Chittka, L. Harmonic radar tracking reveals that honeybee drones navigate between multiple aerial leks. iScience 2021, 24, 102499. [Google Scholar] [CrossRef]
  10. Zmarlicki, C.; Morse, R.A. Drone congregation areas. J. Apicult. Res. 1963, 2, 64–66. [Google Scholar] [CrossRef]
  11. Rangel, J.; Seeley, T.D. Colony fissioning in honey bees: Size and significance of the swarm fraction. Insect. Soc. 2012, 59, 453–462. [Google Scholar] [CrossRef]
  12. Seeley, T.D.; Visscher, P.K. Survival of honeybees in cold climates: The critical timing of colony growth and reproduction. Ecol. Entomol. 1985, 10, 81–88. [Google Scholar] [CrossRef]
  13. Demuth, G.S. Swarm control. Farmers' Bull. U.S. Dep. Agric. 1921, 1198, 1–28. [Google Scholar]
  14. Allen, M.D. The behaviour of honeybees preparing to swarm. Anim. Behav. 1956, 4, 14–22. [Google Scholar] [CrossRef]
  15. Johnson, B.R. Honey Bee Biology; Seeley, T.D., Ed.; Princeton University Press: Princeton, NJ, USA; Oxford, UK, 2023. [Google Scholar]
  16. Reyes, M.; Crauser, D.; Prado, A.; Le Conte, Y. Flight activity of honey bee (Apis mellifera) drones. Apidologie 2019, 50, 669–680. [Google Scholar] [CrossRef]
  17. Ellis, J.; Lawrence, J.C.; Koeniger, N.; Koeniger, G. Mating Biology of Honey Bees (Apis mellifera); Wicwas Press: Kalamazoo, MI, USA, 2015; ISBN 978-1878075383. [Google Scholar]
  18. Hellmich, R.L.; Rinderer, T.E.; Danka, R.G.; Collins, A.M.; Boykin, D.L. Flight times of Africanized and European honey bee drones (Hymenoptera: Apidae). J. Econ. Entomol. 1991, 84, 61–64. [Google Scholar] [CrossRef]
  19. Allen, M.D. Drone production in honey-bee colonies (Apis mellifera L.). Nature 1963, 199, 789–790. [Google Scholar] [CrossRef]
  20. Page, R.E. Protandrous reproduction in honey bees. Environ. Entomol. 1981, 10, 359–362. [Google Scholar] [CrossRef]
  21. Seeley, T.D. The Lives of Bees: The Untold Story of the Honey Bee in the Wild; Princeton University Press: Princeton, NJ, USA, 2019. [Google Scholar]
  22. Metz, B.N.; Tarpy, D.R. Reproductive senescence in drones of the honey bee (Apis mellifera). Insects 2019, 10, 11. [Google Scholar] [CrossRef]
  23. Seeley, T.D. The Wisdom of the Hive. In The Social Physiology of Honey Bee Colonies; Harvard University Press: Cambridge, MA, USA, 1995. [Google Scholar]
  24. Woods, E.F. Electronic Prediction of Swarming in Bees. Nature 1959, 184, 842–844. [Google Scholar] [CrossRef]
  25. Ferrari, S.; Silva, M.; Guarino, M.; Berckmans, D. Monitoring of swarming sounds in bee hives for early detection of the swarming period. Comput. Electron. Agric. 2008, 64, 72–77. [Google Scholar] [CrossRef]
  26. Robles-Guerrero, A.; Saucedo-Anaya, T.; Gonzalez-Ramirez, E.; De la Rosa-Vargas, J.I. Analysis of a multiclass classification problem by Lasso Logistic Regression and Singular Value Decomposition to identify sound patterns in queenless bee colonies. Comput. Electron. Agric. 2019, 159, 69–74. [Google Scholar] [CrossRef]
  27. Robles-Guerrero, A.; Saucedo-Anaya, T.; Guerrero-Mendez, C.A.; Gómez-Jiménez, S.; Navarro-Solís, D.J. Comparative Study of Machine Learning Models for Bee Colony Acoustic Pattern Classification on Low Computational Resources. Sensors 2023, 23, 460. [Google Scholar] [CrossRef]
  28. Uthoff, C.; Homsi, M.N.; von Bergen, M. Acoustic and vibration monitoring of honeybee colonies for beekeeping-relevant aspects of presence of queen bee and swarming. Comput. Electron. Agric. 2022, 205, 107589. [Google Scholar] [CrossRef]
  29. Ruvinga, S.; Hunter, G.; Nebel, J.-C.; Duran, O. Prediction of Honeybee Swarms Using Audio Signals and Convolutional Neural Networks. In Proceedings of the Workshop on Edge AI for Smart Agriculture (EAISA 2022), Biarritz, France, 20–23 June 2022; pp. 146–154. [Google Scholar] [CrossRef]
  30. Libal, U.; Biernacki, P. Detecting drones at an entrance to a beehive based on audio signals and autoencoder neural networks. In Proceedings of the IEEE Signal Processing Symposium (SPSympo), Karpacz, Poland, 26–28 September 2023; pp. 99–104. [Google Scholar] [CrossRef]
  31. Biernacki, P. Dataset for Honey Bee Audio Detection [Dataset]. Zenodo. 2023. Available online: https://zenodo.org/doi/10.5281/zenodo.10359685 (accessed on 13 December 2023).
  32. Abdul, Z.K.; Al-Talabani, A.K. Mel Frequency Cepstral Coefficient and its Applications: A Review. IEEE Access 2022, 10, 122136–122158. [Google Scholar] [CrossRef]
  33. Soares, B.S.; Luz, J.S.; de Macêdo, V.F.; Silva, R.R.V.e; de Araújo, F.H.D.; Magalhães, D.M.V. MFCC-based descriptor for bee queen presence detection. Expert Syst. Appl. 2022, 201, 117104. [Google Scholar] [CrossRef]
  34. Peng, R.; Ardekani, I.; Sharifzadeh, H. An Acoustic Signal Processing System for Identification of Queen-less Beehives. In Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand, 7–10 December 2020; pp. 57–63. [Google Scholar]
  35. Terenzi, A.; Ortolani, N.; Nolasco, I.; Benetos, E.; Cecchi, S. Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 112–122. [Google Scholar] [CrossRef]
  36. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least Angle Regression. Ann. Stat. 2004, 32, 407–499. [Google Scholar]
  37. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  38. Libal, U. Feature selection for pattern recognition by LASSO and thresholding methods—A comparison. In Proceedings of the IEEE 16th International Conference on Methods & Models in Automation & Robotics (MMAR), Miedzyzdroje, Poland, 22–25 August 2011; pp. 168–173. [Google Scholar] [CrossRef]
  39. Emmert-Streib, F.; Dehmer, M. High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection. Mach. Learn. Knowl. Extr. 2019, 1, 359–383. [Google Scholar] [CrossRef]
  40. Alshqaq, S.S.; Abuzaid, A.H. An Efficient Method for Variable Selection Based on Diagnostic-Lasso Regression. Symmetry 2023, 15, 2155. [Google Scholar] [CrossRef]
  41. Akaike, H. Information Theory and an Extension of the Maximum Likelihood Principle. In Selected Papers of Hirotugu Akaike; Springer Series in Statistics; Parzen, E., Tanabe, K., Kitagawa, G., Eds.; Springer: New York, NY, USA, 1998. [Google Scholar] [CrossRef]
  42. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  43. Abouzid, H.; Chakkor, O.; Reyes, O.G.; Ventura, S. Signal speech reconstruction and noise removal using convolutional denoising audioencoders with neural deep learning. Analog. Integr. Circ. Sig. Process. 2019, 100, 501–512. [Google Scholar] [CrossRef]
  44. Faraji Niri, M.; Mafeni Mase, J.; Marco, J. Performance Evaluation of Convolutional Auto Encoders for the Reconstruction of Li-Ion Battery Electrode Microstructure. Energies 2022, 15, 4489. [Google Scholar] [CrossRef]
  45. Saminathan, K.; Mulka, S.T.R.; Damodharan, S.; Maheswar, R.; Lorincz, J. An Artificial Neural Network Autoencoder for Insider Cyber Security Threat Detection. Future Internet 2023, 15, 373. [Google Scholar] [CrossRef]
Figure 1. Flowchart of data processing.
Figure 2. Amplitude of power spectral density for exemplary audio recordings from the drone bee and worker bee classes.
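A comparison like the one in Figure 2 can be reproduced with Welch's method from SciPy; this is only a minimal sketch, since the paper does not state which PSD estimator it used, and the sampling rate and synthetic test tones below are placeholders, not real hive recordings.

```python
import numpy as np
from scipy.signal import welch

fs = 16_000                                   # assumed sampling rate
t = np.arange(fs) / fs                        # one second of signal
drone = np.sin(2 * np.pi * 220 * t)           # synthetic stand-in for a drone recording
worker = np.sin(2 * np.pi * 270 * t)          # synthetic stand-in for a worker recording

for name, x in [("drone", drone), ("worker", worker)]:
    f, pxx = welch(x, fs=fs, nperseg=2048)    # Welch power spectral density estimate
    print(f"{name}: spectral peak near {f[np.argmax(pxx)]:.0f} Hz")
```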
Figure 3. Mel-frequency cepstral coefficient extraction diagram.
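A minimal sketch of the extraction pipeline in Figure 3, using librosa (an assumption; the paper does not name its implementation). The 40/80/120 feature counts follow Table 3, while the random signal and the per-recording averaging are illustrative placeholders.

```python
import numpy as np
import librosa

sr = 16_000
signal = np.random.randn(sr).astype(np.float32)           # stand-in for a 1 s hive-entrance recording

mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)   # 40 MFCCs per frame
delta = librosa.feature.delta(mfcc)                       # first derivatives (Delta)
delta2 = librosa.feature.delta(mfcc, order=2)             # second derivatives (DeltaDelta)

features = np.concatenate([mfcc, delta, delta2], axis=0)  # 120 feature rows, one column per frame
feature_vector = features.mean(axis=1)                    # one 120-dimensional descriptor
print(feature_vector.shape)                               # (120,)
```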
Figure 4. Regularization path by LASSO: the sequential calculation of the coefficients $\hat{\beta}^{\mathrm{LASSO}} = (\beta_1, \beta_2, \ldots, \beta_p)$ as the regularization parameter α decreases on a logarithmic scale. Each color traces one coefficient path $\beta_i(\alpha)$, $i = 1, \ldots, p$.
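A path like the one in Figure 4 can be traced, for example, with scikit-learn's lasso_path; this is a stand-in, not the authors' code, and the synthetic X and y below are placeholders for the real 120-dimensional MFCC features and worker/drone labels.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 120))           # stand-in for n recordings x 120 features
y = rng.integers(0, 2, 300).astype(float)     # stand-in labels: 0 = worker, 1 = drone

alphas = np.logspace(-1, -4, 25)              # alpha decreasing on a logarithmic scale
alphas_out, coefs, _ = lasso_path(X, y, alphas=alphas)

for alpha, beta in zip(alphas_out, coefs.T):  # coefs has shape (120, len(alphas))
    print(f"alpha = {alpha:.4f}: {np.count_nonzero(beta)} nonzero coefficients")
```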
Figure 5. General schema of an autoencoder neural network.
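As an illustration of the schema in Figure 5, the following Keras sketch builds a small dense autoencoder. The layer sizes, bottleneck width, and the one-class training strategy (flagging samples of the other class by their larger reconstruction error, a common use of autoencoders for two-class detection) are assumptions, not the paper's exact architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 120                                   # MFCC + Delta + DeltaDelta
autoencoder = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(32, activation="relu"),           # encoder
    layers.Dense(8, activation="relu"),            # bottleneck (latent code)
    layers.Dense(32, activation="relu"),           # decoder
    layers.Dense(n_features, activation="linear"), # reconstruction of the input
])
autoencoder.compile(optimizer="adam", loss="mse")

X_train = np.random.randn(500, n_features).astype("float32")  # stand-in feature vectors
autoencoder.fit(X_train, X_train, epochs=5, batch_size=32, verbose=0)

# Per-sample reconstruction error; thresholding it separates the two classes.
errors = np.mean((autoencoder.predict(X_train, verbose=0) - X_train) ** 2, axis=1)
```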
Figure 6. Information criteria: Akaike (AIC) and Bayesian (BIC), calculated for the feature sets selected by LASSO.
Figure 7. Coefficients selected by LASSO from the MFCC+Delta+DeltaDelta feature set for different values of the regularization parameter α.
Figure 8. Classification results for different numbers of features selected by LASSO.
Figure 9. Classification results for 40 MFCC features, 80 MFCC+Delta features, and 120 MFCC+Delta+DeltaDelta features.
Table 1. Numbers of MFCC features selected by the LASSO algorithm and the associated values of the information criteria AIC and BIC for different values of the regularization parameter α.

Regularization Parameter α | AIC   | No. of Features (AIC) | BIC   | No. of Features (BIC)
0.0500                     | 988.7 | 22                    | 990.4 | 22
0.0300                     | 824.2 | 29                    | 826.4 | 29
0.0200                     | 718.6 | 41                    | 721.8 | 41
0.0150                     | 650.2 | 49                    | 654.0 | 49
0.0100                     | 571.4 | 60                    | 578.1 | 60
0.0065                     | 536.2 | 69                    | 542.4 | 69
0.0050                     | 529.2 | 80                    | 534.9 | 80
0.0030                     | 527.2 | 93                    | 533.6 | 93
0.0020                     | 526.7 | 101                   | 533.5 | 101
0.0006                     | 526.6 | 111                   | 534.3 | 111
0.0001                     | 526.5 | 120                   | 535.0 | 120
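A selection like the one summarized in Table 1 can be sketched with scikit-learn's LassoLarsIC, which chooses the regularization strength by AIC or BIC; this is a stand-in for the paper's procedure, and the synthetic data below replaces the real MFCC features and labels.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 120))        # stand-in for the 120 MFCC-based features
y = rng.integers(0, 2, 300).astype(float)  # stand-in worker/drone labels

for criterion in ("aic", "bic"):
    model = LassoLarsIC(criterion=criterion).fit(X, y)
    print(f"{criterion.upper()}: alpha = {model.alpha_:.4f}, "
          f"{np.count_nonzero(model.coef_)} features selected")
```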
Table 2. Accuracy, recall, precision and F1-score of the honey bee classification for different numbers of MFCC features selected by LASSO.

No. of Features | Accuracy | Recall  | Precision | F1-Score
 29             | 88.20%   | 85.17%  | 77.50%    | 82.56%
 41             | 87.03%   | 85.00%  | 74.90%    | 81.03%
 49             | 85.70%   | 84.80%  | 72.14%    | 79.54%
 60             | 95.08%   | 84.96%  | 98.31%    | 91.81%
 69 *           | 95.26%   | 85.52%  | 98.35%    | 92.16%
 80 *           | 95.26%   | 85.47%  | 98.44%    | 92.11%
 93             | 91.44%   | 84.70%  | 86.34%    | 87.97%
101             | 76.10%   | 84.35%  | 56.68%    | 78.38%
111             | 83.10%   | 83.99%  | 67.39%    | 79.92%
120             | 81.15%   | 84.42%  | 63.93%    | 81.25%
* The two cases with the highest accuracy.
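The four metrics reported in Tables 2 and 3 can be computed with scikit-learn as follows; the toy labels are placeholders, and treating the drone class as positive is an assumption.

```python
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # toy ground truth (1 = drone, 0 = worker)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy classifier output

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```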
Table 3. Accuracy, recall, precision and F1-score of the honey bee classification for the full feature sets (MFCC, MFCC+Delta, and MFCC+Delta+DeltaDelta) without applying any feature selection.

Feature Set           | No. of Features | Accuracy | Recall | Precision | F1-Score
MFCC                  |  40             | 90.56%   | 81.39% | 86.20%    | 89.74%
MFCC+Delta            |  80             | 82.33%   | 84.15% | 65.98%    | 80.86%
MFCC+Delta+DeltaDelta | 120             | 81.15%   | 84.42% | 63.93%    | 81.25%