1. Introduction
An image descriptor provides information about an image by extracting or determining features such as shape or color. Image descriptors come in two types: dense and sparse. Dense descriptors extract features from the image pixel by pixel; one of the most common dense descriptors is the Local Binary Pattern (LBP). Sparse descriptors, such as the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), instead evaluate only selected interest points in the image and extract models from them [1].
Automatic programming is the process by which a machine generates program code automatically [2]. Genetic Programming (GP), the best-known automatic programming method, was developed by Koza [3]. GP is an extension of the Genetic Algorithm (GA), and the basic steps of the GP algorithm are similar to those of GA. Artificial Bee Colony Programming (ABCP) is a recently proposed automatic programming method based on the Artificial Bee Colony (ABC) algorithm [4]. In this paper, an ABCP-based image descriptor is proposed for the first time, and the success of the extracted models is evaluated on the texture classification problem.
In recent years, automatic programming methods have been used more frequently in image processing problems. Tackett [5] used GP to classify feature vectors derived from images into categories. A domain-adaptive learning method based on Multi-Objective Genetic Programming (MOGP) was proposed in [6] to automatically generate domain-adaptive global feature descriptors for image classification. GP was suggested for feature detection in [7]: the method separated the image into sub-regions, extracted Speeded Up Robust Features (SURF) points, and achieved successful classification results using a Support Vector Machine (SVM) classifier. Iqbal et al. proposed an image classification method combining transfer learning and GP [8]; GP trees built from transferred fragments improved classification performance by producing a more useful initial population. In [9], a new GP method was introduced that automatically distinguishes regions in the image and sorts them by extracting Histograms of Oriented Gradients (HOG); all stages of the method were optimized over a wide search space. Al-Sahaf et al. proposed Two-Tier GP with three different variations (Two-Tier GP, Two-Tier GP-line, and Two-Tier GP-mix) [10]. Instead of using raw pixel values directly as inputs, they defined special functions that convert line pixel values into a single numeric value in one tier, and performed classification using the outputs of these functions in the other tier. Two-Tier GP-mix was considered the most successful variant because of the flexibility of its window range.
Comparative studies of ABC with Particle Swarm Optimization, GA, and other evolutionary computation algorithms have shown that ABC performs better in terms of reaching local and global optima in engineering problems [11]. ABC has been successfully applied in image processing, especially in image clustering [12], image registration [13], pattern recognition [14], image segmentation [15,16,17,18,19], and image classification [20]. In [20], the ABC algorithm was used in the diagnosis of computed tomography images: features were selected with ABC and the images were classified with k-Nearest Neighbor (k-NN) and SVM classifiers, the ABC-SVM hybrid proving highly successful. The ABCP algorithm was proposed as an ABC-based method that evolves functions for symbolic regression [4]. This is the first paper to study the performance of the ABCP algorithm in texture classification and to extract mathematical models for it.
Nowadays, owing to their simplicity of calculation and their robustness against illumination changes, LBP-based descriptors have been extensively researched in various fields such as image classification, pattern recognition, and computer vision [21,22,23,24,25]. A GP-based LBP image descriptor using one instance per class was proposed in [21]; the method was shown to be more successful than five different methods, both with and without a GP basis. Ojala et al. presented the uniform LBP approach, which is simple and invariant to rotation and gray-scale changes [22]. Sinha et al. proposed the Wigner-based Local Binary Patterns descriptor (WLBP), which uses the Wigner distribution together with LBP [23]; in experiments using different classifiers, the method performed better than conventional LBP. A two-part Feature-Based Local Binary Pattern (FB-LBP) approach was presented in [24]: in the first part traditional LBP is used, while in the other part the mean and variance of the magnitude vector are used. Liu et al. proposed a novel local texture descriptor generalizing the well-known LBP approach [1]. Four LBP-like descriptors, two local intensity-based ones, Central Intensity LBP (CI-LBP) and Neighbors Intensity LBP (NI-LBP), and two local difference-based ones, Radial Difference LBP (RD-LBP) and Angular Difference LBP (AD-LBP), were presented to extract complementary texture information from local spatial patterns. The descriptors have the same structure as standard LBP and require neither training nor parameter tuning. In a study conducted on three different texture datasets, the descriptors showed better performance in gray-scale- and rotation-invariant texture classification than traditional LBP.
A plethora of neural network approaches have been presented for image processing [25,26,27]. Zhang et al. introduced a two-stage approach using pixel-based neural networks trained by back-propagation and GA [26]. The network is trained on samples cut out of large pictures: in the first stage the network weights are adjusted by GA, and in the second stage the method is tested on three object detection problems. The results show that the back-propagation algorithm has stronger generalization ability than GA. In [27], a two-phase GP method was proposed and compared with the neural network approaches of [25,26] on three object detection problems; the experimental results show that GP outperforms neural networks in terms of object detection accuracy.
The paper is organized as follows: Section 2 introduces the background, with subsections on the Local Binary Pattern and the GP-descriptor. Section 3 explains in detail the general procedure of the algorithm, ABCP, the proposed ABCP-descriptor, and the fitness function. Section 4 provides detailed information on the experimental design, including datasets and parameters. Section 5 presents the performance analysis, with subsections on the overall results and a discussion of the models extracted. Section 6 concludes the paper.
3. The Proposed Method
This section provides an overview of the overall operation of the ABCP-based image descriptor, the program structure, the terminal and function sets, and the evaluation of the program.
3.1. General Procedure of Algorithm
Each dataset consists of different texture classes, and each texture class consists of multiple small-sized instances. Each dataset is randomly split into 50% training and 50% test instances. From the training set, two instances are randomly selected from each class, and the algorithm feeds ABCP with these two instances per class in the training phase. ABCP solutions are improved using the fitness function, and when the stopping criterion is reached, the ABCP model with the best fitness value is extracted. For each instance in the test set, a feature vector (histogram) is obtained using the best model. The histograms are classified with the simple and fast 1-Nearest Neighbor (1-NN) method, and the performance of the model is evaluated by classification accuracy. Details of the steps of the algorithm are presented in the following sections. The steps of the descriptor are given in Figure 4.
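The split-and-classify flow described above can be sketched in Python; the data layout (a dict mapping class label to instance list) and the Euclidean matching metric are illustrative assumptions, not the paper's exact choices:

```python
import random

def split_dataset(instances_by_class, seed=0):
    """Randomly split each class 50/50 into training and test sets.
    `instances_by_class` is an assumed layout: {class label: [instances]}."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, instances in instances_by_class.items():
        shuffled = instances[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train[label], test[label] = shuffled[:half], shuffled[half:]
    return train, test

def classify_1nn(test_hist, train_hists):
    """1-NN: assign the class of the closest training histogram.
    Plain Euclidean distance is used here purely for illustration."""
    def euclidean(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    best_label, best_dist = None, float("inf")
    for label, hist in train_hists:
        d = euclidean(test_hist, hist)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

In practice the training histograms come from the two selected instances per class, so the reference list stays very small.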
3.2. Artificial Bee Colony Programming
ABCP is a high-level automatic programming method based on the ABC algorithm [4,31,32]. The steps of ABCP are similar to those of the ABC algorithm. The two fundamental differences between the algorithms are the representation of the solution structure and the information-sharing mechanism that enables solutions to be improved. In ABC, the positions of the food sources, i.e., the solutions, are represented by fixed-size arrays that hold the values found by the algorithm for predetermined variables, as in GA. In ABCP, the positions of the food sources are expressed as tree structures composed of different combinations of terminals and functions. The smallest unit of a tree is called a node. Nodes are selected from a specially defined terminal set (variables such as x and y, and constants) and a function set (arithmetic operators, logical functions, mathematical functions). Trees representing the solutions are created by combining the nodes. The representation of a solution tree in ABCP is shown in Figure 5. The mathematical model of the tree in Figure 5 is given in Equation (2), where x and y are the independent variables and f(x) is the dependent variable. The equation of the tree is obtained by combining the terminals and functions in the nodes from the leaves to the root.
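The leaves-to-root evaluation described above can be illustrated with a minimal expression-tree sketch; the nested-tuple representation and the example tree are assumptions for illustration, not the tree of Figure 5:

```python
def evaluate(node, env):
    """Evaluate a solution tree bottom-up (from the leaves to the root).
    Terminals are variable names or numeric constants; a function node is
    a (operator, [children]) tuple. Only a subset of a typical function
    set is handled here."""
    if isinstance(node, str):           # variable terminal, e.g. "x" or "y"
        return env[node]
    if isinstance(node, (int, float)):  # constant terminal
        return node
    op, children = node                 # function node
    vals = [evaluate(c, env) for c in children]
    if op == "+": return vals[0] + vals[1]
    if op == "-": return vals[0] - vals[1]
    if op == "*": return vals[0] * vals[1]
    raise ValueError(f"unknown function node {op!r}")

# A made-up example tree: f(x, y) = (x + y) * x
tree = ("*", [("+", ["x", "y"]), "x"])
```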
The complexity of the solution trees is calculated by Equation (3), in proportion to the number of nodes and the depth of the tree.
The flow diagram of the ABCP method is given in Figure 6. As in GP, the solutions in the initial phase of ABCP can be produced with the “full”, “grow”, and “ramped half-and-half” methods [3]. The quality of each solution tree is determined by the fitness function, which is defined specifically for each problem.
Since solutions are represented by fixed-size arrays in ABC, the candidate solution generation mechanism of ABC cannot be used directly in ABCP. To produce a candidate solution in ABCP, where solutions are represented by tree structures with variable depth and different numbers of nodes, the crossover operator of GP [3] has been adapted to the neighborhood search process of ABC.
In the ABCP algorithm, there are three different types of bees: employed bees, onlooker bees, and scout bees. Employed bees leave the hive with a specific food source in their memory; when they return, they share information about the food sources with the onlooker bees. The onlooker bees decide which source to visit based on the amount of nectar of the sources shared by the employed bees. During the scout bee phase, it is checked whether any food source is exhausted; if so, the source is abandoned. The bees formerly associated with an abandoned source turn into scout bees and search randomly for a new source. Unlike the other bees, scout bees find new food sources without sharing information. The exhaustion of food sources is controlled by a parameter called “limit”: for each source, the number of improvement trials is recorded, and in each cycle it is checked whether this number exceeds the “limit” parameter.
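The “limit” bookkeeping of the scout-bee phase might look roughly as follows; the function and parameter names are hypothetical:

```python
def scout_phase(trial_counts, limit, new_random_solution):
    """Scout-bee check sketch: any source whose improvement-trial counter
    exceeds `limit` is abandoned and replaced with a freshly generated
    random solution. `new_random_solution` is a hypothetical factory
    callable (e.g. a random 'grow'-style tree generator)."""
    replaced = []
    for i, trials in enumerate(trial_counts):
        if trials > limit:
            replaced.append((i, new_random_solution()))
            trial_counts[i] = 0  # the new source starts with a fresh counter
    return replaced
```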
When generating a candidate solution, a neighbor solution is randomly taken from a different solution in the population, and a node is randomly selected from this neighbor; the selected node determines what information is shared with the current solution. Similarly, a node is selected from the current solution: a function node with a fixed probability (set to 0.9) and a terminal node with the complementary probability. The candidate solution is generated by replacing the selected node of the current solution with the selected node of the neighbor solution. This sharing mechanism is shown in Figure 7. Figure 7a,b show the current solution and the node taken from the neighbor tree, respectively; Figure 7c shows the neighboring information, and the generated candidate solution is given in Figure 7d. After the candidate solution is generated, a greedy selection process is applied between the current solution and the candidate solution. The candidate solution is evaluated, and greedy selection is applied for each employed bee.
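Using the same kind of nested-tuple tree, the node-sharing step can be sketched as below. This is a simplified reading of the mechanism (subtree grafting, as in GP crossover); the helper names are hypothetical:

```python
import copy, random

def collect_subtrees(tree, path=()):
    """List (path, subtree) pairs; a subtree is either a terminal or a
    (function, [children]) tuple."""
    out = [(path, tree)]
    if isinstance(tree, tuple):
        _, children = tree
        for i, c in enumerate(children):
            out.extend(collect_subtrees(c, path + (i,)))
    return out

def replace_at(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return copy.deepcopy(subtree)
    op, children = tree
    i, rest = path[0], path[1:]
    new_children = children[:]
    new_children[i] = replace_at(children[i], rest, subtree)
    return (op, new_children)

def make_candidate(current, neighbor, p_func=0.9, rng=random):
    """Information-sharing sketch: graft a randomly chosen subtree of the
    neighbor into the current solution. The replacement point in the
    current tree is a function node with probability p_func (0.9 here),
    otherwise a terminal."""
    _, shared = rng.choice(collect_subtrees(neighbor))
    funcs = [(p, s) for p, s in collect_subtrees(current) if isinstance(s, tuple)]
    terms = [(p, s) for p, s in collect_subtrees(current) if not isinstance(s, tuple)]
    pool = funcs if (funcs and rng.random() < p_func) else terms
    path, _ = rng.choice(pool)
    return replace_at(current, path, shared)
```

The greedy selection that follows simply keeps whichever of the current and candidate trees has the better fitness.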
3.3. ABCP-Descriptor Program Representation
ABCP-descriptor is an ABCP-based image descriptor that generates models using raw pixel values. The descriptor reads the pixel values through a sliding window of a certain size, and these pixel values are used as the inputs of the model. Figure 8 shows the representation of the input vector. In the example in Figure 8, a 5 × 5 window is selected, so the window contains 5 × 5 = 25 pixel values. The first pixel is labelled P0, and the pixels are labelled up to P24 in total. The pixels are sequenced from left to right to form the input vector; therefore, the inputs of ABCP are (P0, P1, …, P24).
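The window-to-vector step can be sketched in plain Python; the row-major scan order follows the description above, and the function name is hypothetical:

```python
def window_to_vector(image, top, left, size=5):
    """Flatten a size x size window into the input vector
    (P0 ... P{size*size-1}), scanning left to right, top to bottom.
    `image` is assumed to be a list of pixel rows."""
    return [image[top + r][left + c] for r in range(size) for c in range(size)]
```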
The ABCP function set consists of the simple arithmetic operators {+, -, *, /} and the code root node. The divide (/) function in the function set is a protected version: when the divisor is equal to 0, it is replaced with 1; otherwise, normal division is performed. The code node creates a binary code using the inputs in the row vector. The number of children of the code node determines the length and range of the generated code. For example, if the number of children is 5, the length of the binary code is 5 bits, and the program can produce 2⁵ = 32 different values (0, 1, …, 31). The representation of the ABCP-descriptor is given in Figure 9.
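A minimal sketch of the protected division and the code node follows; the function names are hypothetical, and the bit rule (a branch value greater than 0 yields 1, otherwise 0) matches the description of Figure 9:

```python
def protected_div(a, b):
    """Protected division: a divisor of 0 is treated as 1,
    otherwise normal division is performed."""
    return a / (b if b != 0 else 1)

def code_node(branch_values):
    """The `code` root node: each child branch contributes one bit
    (1 if its value is greater than 0, else 0), and the bits are read
    left to right as a binary number."""
    code = 0
    for v in branch_values:
        code = code * 2 + (1 if v > 0 else 0)
    return code
```

With 3 children, `code_node` returns one of 2³ = 8 values; with 5 children, one of 2⁵ = 32.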
As shown in Figure 9, the sliding window is selected as 5 × 5 and therefore has 25 inputs. Since the code node has 3 children, the resulting feature vector can take 2³ = 8 different values. If the value obtained from a branch of the code node is greater than 0, the corresponding binary digit is set to 1; otherwise, it is set to 0. In the program representation shown, the binary code (011)₂ is obtained. The code is converted to decimal, and the corresponding bin of the feature vector (histogram) is increased by 1. The histogram of an instance is obtained by moving the sliding window over the whole instance. In the training process of the ABCP-descriptor, 2 instances are selected for each class; with c defined as the number of classes, 2 × c histograms are obtained. The test instances are classified by considering the distances between the histograms of the training instances and the histograms of the test instances, both generated with the best model.
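Putting the pieces together, the histogram extraction for one instance might be sketched as follows; `model` stands in for the evolved tree and is a hypothetical callable here:

```python
def descriptor_histogram(image, model, size=5, n_bits=3):
    """Build the feature histogram of one instance: slide a size x size
    window over the image, let the evolved model map each window to
    n_bits branch values, turn those into a binary code, and count code
    occurrences. `model` is a hypothetical callable:
    pixel list -> list of n_bits branch values."""
    hist = [0] * (2 ** n_bits)
    rows, cols = len(image), len(image[0])
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            window = [image[top + r][left + c]
                      for r in range(size) for c in range(size)]
            code = 0
            for v in model(window):          # each branch -> one bit
                code = code * 2 + (1 if v > 0 else 0)
            hist[code] += 1
    return hist
```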
3.4. Fitness Function
In this paper, only a few random instances are selected from the training set, and during training the classification success is evaluated only on these instances. Therefore, it is aimed to obtain as much information as possible from them. Classification accuracy is not used as the fitness function alone, in order to avoid models over-fitting the training instances; instead, a fitness function is used that evaluates a distance measure along with the classification accuracy. To make a reliable and fair comparison, the same parameter values are chosen as in [29], and the fitness function (objective function) is adapted as defined in Equation (4).
The classification accuracy is obtained as the ratio of the number of correctly classified instances to the total number of instances, as defined in Equation (5).
Ncorrect is the number of correctly classified instances and Ntotal is the total number of instances. Distance is the function that evaluates the distance between classes together with the distance within classes. Since the effective input range of the logistic function is [−5, 5], the function has been adapted to the range [−1, 1]. The representation of the logistic function is given in Figure 10. The logistic function shown in Figure 10a is defined in Equation (6), and the one in Figure 10b in Equation (7). The version of the distance function adapted according to the logistic function of Equation (7) is given in Equation (8).
Db (the between-class distance) is the average distance between instances of different classes, and Dw (the within-class distance) is the average distance between instances of the same class; Db and Dw are defined in Equations (9) and (10), respectively.
where Str is the set of instances selected from the training set, each instance being represented by its feature vector together with its class label; c and n define the total number of classes and the number of instances per class, respectively. The total number of instances in the training set is z (z = c × n), and xα denotes all instances of class α in Str. The distance between two normalized vectors with an equal number of elements is given in Equation (11).
where ui and vi are the i-th elements of the vectors u and v, respectively. If the divisor in Equation (11) is 0, the protected function returns 0. The classification accuracy (Equation (5)) and the distance (Equation (8)) both take the value 1 in the best case and 0 in the worst case; accordingly, the fitness function (Equation (4)) of the automatic programming method can be defined as a minimization problem.
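A sketch of the distance side of the fitness computation is given below. The exact form of Equation (11) is not reproduced here; a Euclidean distance over sum-normalized histograms stands in for it, with the protected rule returning 0 when a normalizing sum is 0:

```python
from itertools import combinations

def vec_distance(u, v):
    """Stand-in for Equation (11): Euclidean distance between two
    histograms after normalizing each to unit sum. The protected rule
    returns 0 when a normalizing sum is 0."""
    su, sv = sum(u), sum(v)
    if su == 0 or sv == 0:
        return 0.0
    nu = [a / su for a in u]
    nv = [b / sv for b in v]
    return sum((a - b) ** 2 for a, b in zip(nu, nv)) ** 0.5

def between_within(hists):
    """Db: average distance over pairs of instances from different
    classes; Dw: average distance over pairs from the same class.
    `hists` is a list of (class label, histogram) pairs."""
    between, within = [], []
    for (la, ha), (lb, hb) in combinations(hists, 2):
        (within if la == lb else between).append(vec_distance(ha, hb))
    db = sum(between) / len(between) if between else 0.0
    dw = sum(within) / len(within) if within else 0.0
    return db, dw
```

A good model pushes Db up and Dw down; the paper's fitness then combines this distance term with the classification accuracy (Equation (4)).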
In summary, the following steps are followed to extract the feature vector with the image descriptor.
Step 1. Instances are converted to input vectors of the previously defined window size, e.g., 5 × 5 (Figure 8).
Step 2. The leaf nodes are fed to the root code node, and the bit of each branch is determined by checking whether the branch value is negative.
Step 3. The binary code is converted to decimal, and the corresponding bin of the histogram is increased by 1.
Step 4. Using the histograms of the instances:
- the distance between vectors is calculated (Figure 11);
- the within-class distance (Dw) and the between-class distance (Db) are calculated (Figure 12);
- the fitness values of the trees (individuals) are calculated.
Pseudo-code 1, which calculates the distance between vectors (Equation (11)), is shown in Figure 11. Db, Dw, and the accuracy are calculated according to pseudo-code 2, shown in Figure 12.
The ABCP-descriptor is expected to improve the trees according to their fitness values. When the stopping criteria are met, the best individual is used to classify the test data.