Article

Improving Classification Performance with Statistically Weighted Dimensions and Dimensionality Reduction

by
Uraiwan Buatoom
1,* and
Muhammad Usman Jamil
2
1
Faculty of Science and Arts, Chanthaburi Campus, Burapha University, Chanthaburi 22170, Thailand
2
Department of Electrical and Computer Engineering, Assumption University, Samut Prakan 10570, Thailand
*
Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(3), 2005; https://doi.org/10.3390/app13032005
Submission received: 26 November 2022 / Revised: 18 January 2023 / Accepted: 2 February 2023 / Published: 3 February 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

In image classification, various techniques have been developed to enhance the performance of principal component analysis (PCA) dimensionality reduction by using guiding feature weights to remove redundant and irrelevant features. This study proposes a statistically weighted dimension technique based on three distribution-related class behaviors, collection-class, inter-class, and intra-class, to strengthen feature extraction before PCA is used for feature selection. The data in the statistics-weighted dimension space are then reduced from a large index to a smaller index with PCA. The principal components learned from the weighted training part are applied to the unlabeled test data, and the images are then classified. In addition, the weighting direction is investigated on both the promoting and demoting sides to identify the best and worst options through the exponents of the three proposed weighting schemes. The experiments are conducted on three datasets, MNIST, E-MNIST, and F-MNIST, with three image classification algorithms, logistic regression, KNN, and SVM (RBF). The results demonstrate that the statistically weighted dimension features, with an appropriate weighting combination, improve conventional classification accuracy by nearly 3% in the best case while reducing dimensionality by more than 50%.

1. Introduction

Computer vision technology uses machine learning to analyze and extract features in order to recognize, identify, and derive meaning from digital images. Most computer vision techniques include classification algorithms, which play a vital role in investigating the strengths and weaknesses of feature representation. Feature extraction and feature selection are widely used to remove redundant and unnecessary features in order to improve classification efficiency, for instance by integrating a new pixel-based classification algorithm [1]. When performing data analysis with a large number of features, it is generally advantageous to reduce the data to a lower dimension. Principal component analysis (PCA) is an effective preprocessing tool for constructive dimensionality reduction algorithms that exploit the correlations between feature variables. PCA has been applied in several fields, such as natural language processing [2], speech recognition [3], geography [4], bioinformatics [5], and computer vision [6]. PCA is widely used in image processing research, where both computation and memory consumption are problematic. However, Wolf et al. (2005) showed that variable selection algorithms fail when all the variables are correlated, but do well with informative variables [7].
According to the dimension reduction scale, many researchers have introduced different ways of combining PCA with relevant information when the set of images passes through various statistical methods. Puyati et al. (2008) combined modular PCA and wavelet PCA, which consider high and low variations of correlation, respectively, as a preprocessing method; their work weights the probability values of both methods and significantly improves face recognition [8]. Kumar et al. (2020) introduced preprocessing with gray-level co-occurrence matrix (GLCM) feature extraction combined with PCA in the selection phase for ultrasound images of kidneys, to build a classification model with an artificial neural network (ANN). GLCM is a statistical technique that describes texture features by combining distributions of intensities and values at distinct positions relative to one another [9]. Yu et al. (2010) used probabilistic principal component analysis combined with expectation maximization (PPCA-EM) to extract the main features and estimate the missing data in 3D image analysis [10]. Hu et al. (2019) demonstrated image recognition with an SVM technique based on extracting the major features with PCA for highly self-similar digital medical pictures using a fractional differential mask operator [11]. Moreover, Garg et al. (2019) obtained significant experimental results with heuristic or grid-search methods for deep learning models when a structured CNN design was used along with PCA [12].
On the other hand, detecting the informativeness of class levels is also an important idea. In the past, feature-extraction methods were investigated to improve classification together with feature selection [13]. Feature extraction uses local features to represent distinctive properties of relevant information more accurately than global features [14], because these features are selected independently based on dataset characteristics [15]. It extracts features from all the feature component sets and then uses them to transform the input data into a new set of informative features. Yumeng et al. (2020) demonstrated that the effect of conventional dimension reduction (PCA) is inferior to PCA combined with entropy [16]. Therefore, some researchers have concentrated on feature weighting to improve feature selection methods, introducing weights based on inter-class and intra-class distance [17]. Statistically weighted feature techniques that exploit class information have also been investigated to improve classification and clustering while retaining class characteristics, and some researchers have suggested a class-weighted measure based on a similarity distance that reflects the characteristics of the classes [18]. Furthermore, feature selection and feature extraction are both used to reduce the number of features in a dataset, but they differ in an important way: feature selection simply keeps or excludes given features without changing them, whereas dimensionality reduction transforms the features into a lower dimension. One challenge is to obtain a dimensionality reduction technique via PCA that can carry out feature extraction and feature selection without compromising image quality or losing significant features tied to dominant class characteristics.
It is debatable whether combining class characteristics via statistically weighted features with a dimensionality reduction methodology classifies more effectively than simple feature weighting applied directly to labeled data. In this study, we propose to improve image classification performance by exploring the use of collective class characteristics to establish a statistically weighted algorithm, and by combining this weight with PCA to enhance discrimination ability. The statistically weighted algorithm extracts data according to the distribution of class behavior, i.e., in-collection, inter-class, and intra-class. Instead of relying directly on the prior knowledge of labeled data, the suggested technique attempts to capture the class patterns using statistics for classification. The experimental results show that the method performs well on the prediction task, which is evaluated on three standard datasets, MNIST, E-MNIST, and F-MNIST, with three image classification algorithms, logistic regression, KNN, and SVM (RBF).
The paper is organized as follows: In Section 2, we review the literature that motivates the statistically analytical class approach on weighted data and explains the reason for combining statistically weighted dimensions with dimensionality reduction. In Section 3, we present an overview of the framework and how the statistically weighted algorithm is applied with dimensionality reduction for classification. In Section 4, we describe the experimental datasets, and in Section 5 the experimental setting. In Section 6, we present a detailed performance evaluation of classification with statistically weighted dimensions and dimensionality reduction. In Section 7, we discuss the related literature and compare our proposed work to other published papers in the same area, and Section 8 concludes the paper.

2. Literature Review

This section provides the background on dimension reduction based on PCA whose performance is optimized by using weighted variables, and reviews and summarizes studies on weighted class analysis.

2.1. Weighted Variable for Guidance Dimension Reduction Based on PCA

Dimension reduction is a technique to extract enough information to support prediction while reducing the noise and redundant information in the data, which causes over-fitting, thereby enhancing discrimination. Different techniques are used for dimension reduction [19,20,21]; however, principal component analysis (PCA) is the traditional linear feature-extraction technique used to reduce high-dimensional data to potentially low-dimensional data. The PCA technique converts a dataset of dimension $m \times n$ into a new dimension $m \times n'$, where $n' < n$. PCA uses the eigenvalues to find the corresponding eigenvectors of the covariance matrix. However, the features extracted by PCA may not retain very well the prior knowledge contained in the original dataset [22].
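As a brief illustration (not taken from the paper; the data shape and the 95% variance threshold below are arbitrary assumptions), this is how such an $m \times n \rightarrow m \times n'$ reduction looks with scikit-learn's PCA:

```python
# Minimal PCA sketch (illustrative only; data shape and variance threshold are assumptions):
# reduce an m x n feature matrix to m x n' components derived from the covariance eigenvectors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((1000, 784))          # m = 1000 samples, n = 784 pixel features

pca = PCA(n_components=0.95)         # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X)     # shape (1000, n') with n' < 784

print(X_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))
```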
Nowadays, many studies focus on improving the dataset before selecting the feature with the most information by using PCA [16,22,23,24,25]. Xiao et al. (2020) used a low-distortion projection fast-Johnson–Lindenstrauss transform technique for dimensional expansion and then projected the data to a higher dimensional metric space before performing PCA [23]. Tavoli et al. (2013) experimented with feature weighting based on keyword spotting with a document image retrieval system, and then weighted the features using weighted PCA [24]. However, many algorithms have been presented to obtain knowledge by relying on the embedded information in the training set to partly guide the conventional PCA. Yumeng et al. (2020) introduced the mutual information threshold for feature screening and the weighted average value was used to improve the data centralization process. Then, the traditional PCA was improved based on entropy weight to optimize the dimensionality reduction process [16]. Liu et al. (2005) assigned the weighted vector according to their class-dependent properties by defining within-class and between-class variables, then used the genetic algorithm to decide the optimal weights [22,25].
In summary, the quality of the benchmark dataset affects the classification results. For further investigation, it is worth exploring how to reconstruct the dataset with feature weighting that encodes guiding class characteristics, in order to enrich the class information of datasets for image classification.

2.2. Weighted Class Analysis

The weighted class analysis method can be extended to enhance the quality of PCA for image classification. This approach combines both methodologies, applying weighted values to guide PCA in selecting a subset of significant feature variables. In the past, weighted values were surveyed across a wide range of methods to obtain informative directions for improving machine learning methods. As a measure of weighted value, class analysis is used in many fields to embed hint information that stimulates the weighted values. A wide variety of measures have also been explored, such as feature frequency [26], entropy for measuring the diversity of features [27], and variance for measuring the distribution of features [28].
Variance is a popular statistical measure for deriving weighted feature variables. In general, inter-class (between classes) and intra-class (within a class) variance is the basic idea from the viewpoint of class analysis [29,30,31]. Hameed et al. (2021) pointed out that applying feature values followed by class analysis values seems to improve the conventional features applied to classification of the "Parkinson's disease" dataset; their work applied a reduction technique that increases inter-class and reduces intra-class variance to keep the class information structure, together with the removal of correlated features due to noise [32]. In addition to exploring intra-class and inter-class data to identify the importance of variables, entropy information has been used to optimize class-weighted values based on the ideas of maximization of intra-class and minimization of inter-class distance, which is applied to the process of reducing data dimensions [17].
However, most existing research focuses on inter-class and intra-class data, while the exploitation of collection-class characteristics is lacking. Buatoom et al. (2020) presented a class information analysis to explore known class information based on distribution statistics (the sample standard deviation, $s$ or $sd$) in text analysis; the proposed methodology describes the class characteristics, which consist of the distribution values of terms in the collection, within a class, and among classes [18]. Overall, it remains a challenge to assign class-characteristic weights of three types of term-oriented standard deviation, (1) collection-class (overview class), (2) inter-class (between classes), and (3) intra-class (within a class), in order to improve classification performance with statistically weighted dimensions based on distribution values together with PCA dimensionality reduction.

3. The Proposed Framework

The framework for dimensionality reduction and statistically weighted dimensions, which is used to enhance classification performance, is presented in this section. After describing the complete framework and algorithms, the concepts of the statistically weighted dimension and of applying statistical weighting to PCA-based dimension reduction are presented.

3.1. The Complete Framework and Algorithms of Proposed Improvement to Classification Performance with SWD-PCA Method

This section shows the complete framework and the algorithms of the proposed improved classification performance. The framework shown in Figure 1 consists of three main processes: (1) statistics-weighted dimension, (2) dimension reduction, and (3) classification, with the training part using the labeled dataset. To check the performance of the proposed scheme, the statistics weight is also applied to the testing part: the new principal components learned from the weighted training part are applied to the unlabeled dataset, and the images are then classified efficiently. Algorithm 1 contains the main proposed pseudo-code that explains how the three main processes are integrated; it uses the sub-function "StatisticsWeightedDimension" from Algorithm 2 to perform the complete task. It is important to emphasize that the statistically weighted dimension is applied to the class characteristics before the PCA-based dimensionality reduction is used.
Algorithm 1: The main proposed pseudo-code of improving classification with statistically weighted dimensions and dimensionality reduction.
1: procedure Classification(D_L, D_U, Y_L, Y_U)
2:     Input:  D_L = {dl_1, dl_2, ..., dl_{|D_L|}},   # a set of labeled documents
3:             D_U = {du_1, du_2, ..., du_{|D_U|}},   # a set of unlabeled documents
4:             Y_L = {y_1, y_2, y_3, ..., y_{|Y_L|}},  # a set of possible labels of the labeled documents
5:             Y_U = {y_1, y_2, y_3, ..., y_{|Y_U|}}   # a set of possible labels of the unlabeled documents
6:     Output: accuracy of image classification
7:
8:     begin
9:         (X(D_L), SW) = StatisticsWeightedDimension(D_L, Y_L)   # by Equations (7) and (8)
10:        WX(D_L) = PCA(X(D_L), SW)                               # by Equations (9) and (10)
11:        Model = TrainClassificationModel(WX(D_L), Y_L)
12:        X(D_U) = FeatureWeighted(D_U)    # unlabeled dataset for the testing step, using Equation (1)
13:        WX(D_U) = PCA(X(D_U), SW)        # by Equations (9) and (10)
14:        ACC = Model(WX(D_U), Y_U)
15:    end
16: end procedure
Algorithm 2: Pseudo-code of the sub-functions DistributionWeightEncoding and StatisticsWeightedDimension.
1: procedure DistributionWeightEncoding(X, Y_L)
2:     Input:  X(D_L) = [x_1, x_2, x_3, ..., x_{|D_L|}]   # a feature-weighted matrix
3:             Y_L = {y_1, y_2, y_3, ..., y_{|Y_L|}}       # a set of possible labels of the labeled documents
4:     Output: ACSW = [acsw_1, acsw_2, acsw_3, ..., acsw_n]   # an intra-class weight vector
5:             ICSW = [icsw_1, icsw_2, icsw_3, ..., icsw_n]   # an inter-class weight vector
6:             SDW  = [sdw_1, sdw_2, sdw_3, ..., sdw_n]       # a collection-class weight vector
7:
8:     begin
9:         (ACSW, ICSW, SDW) = DistributionMeasurement(X(D_L), Y_L)   # by Equations (2)–(6)
10:    end
11: end procedure

1: procedure StatisticsWeightedDimension(D_L, Y_L)
2:     Input:  D_L = {dl_1, dl_2, ..., dl_{|D_L|}},   # a set of the labeled documents
3:             Y_L = {y_1, y_2, y_3, ..., y_{|Y_L|}}  # a set of the possible labels of the labeled documents
4:     Output: X(D_L) = [x_1, x_2, x_3, ..., x_{|D_L|}]   # a feature-weighted matrix
5:             SW = [sw_1, sw_2, sw_3, ..., sw_n]          # a statistically weighted vector
6:
7:     begin
8:         X(D_L) = FeatureWeighted(D_L)   # labeled dataset for the training step, using Equation (1)
9:         (ACSW, ICSW, SDW) = DistributionWeightEncoding(X(D_L), Y_L)
10:        SW = OptimalWeighted(ACSW, ICSW, SDW)   # by Equations (7) and (8)
11:    end
12: end procedure
In the first process of Figure 1, the statistical distribution measurement is used to reflect the complete characteristics of each dataset, i.e., document, class, and collection. Let $\mathbf{X}_{M \times N}$ be the feature matrix, where $N$ is the dimension of the feature space and $M$ is the number of samples. $\mathbf{Y} = [y_{ip}]_{M \times P}$ is a label matrix over the $M$ samples, where $P$ is the number of possible labels and $y_{ip}$ indicates whether the $i$th sample belongs to class $p$. Let $D = \{d_1, d_2, \ldots, d_{|D|}\}$ be a set of $|D|$ image documents (samples), $F = \{f_1, f_2, \ldots, f_{|F|}\}$ be a set of $|F|$ possible features, and $C = \{c_1, c_2, \ldots, c_{|C|}\}$ be a set of $|C|$ classes, where $C_p = \{d \mid d \text{ is an image document that belongs to class } c_p\}$. Here, we assume the feature weight is computed from the pixel value $x_{mn} = x(d_m, f_n)$, which in turn is obtained by processing the image into a feature vector. $x_{mn}$ denotes the feature weight of row $d_m$ (sample $m$) and column $f_n$ (feature $n$), expressed as:
$$ x_{mn} = x(d_m, f_n) \qquad (1) $$
The document-weighted view (the data for calculating the statistical weights) shows the pixel values of the original images, represented by the matrix $\mathbf{X}$ that contains the feature weight information for all documents. It is noteworthy that the document weighting represents the original pixel value of the image in each class, which is used to capture the characteristics of class features through a statistically weighted distribution; therefore, the original pixel values are used when weighting the documents for computing the distribution over class features. The class-weighted view reflects the intra-class and inter-class ideas and extracts the feature-relation information based on the distribution statistics. The intra-class weight ($acsw_n$) is the average of all class standard deviations ($s_{np}$), calculated for feature $f_n$ over all labeled documents in each class ($C_p$). The inter-class weight ($icsw_n$) is the standard deviation of all class means ($\bar{x}_{np}$), calculated for feature $f_n$ over all labeled documents in each class ($C_p$). Finally, the collection-weighted view measures the distribution of the labeled dataset, encoding the whole-collection character of the dataset ($sdw_n$). These document-weighted, class-weighted, and collection-weighted data are used to construct the statistics-weighted dimension, as shown in Process 1 (see Figure 1).
In the second process, the data from the statistics-weighted dimension space are used to reduce dimensionality by reducing the large index data into smaller index data with PCA, which enhances the dataset's index quality. The important information is thus collected and passed through the covariance matrix in order to extract the eigenvectors and construct the lower-dimensional space. The covariance matrix is an $n \times n$ symmetric matrix whose entries are the covariances of all possible pairs of variables. In the third process, a classification model is built with different standard models, such as logistic regression, KNN, and SVM using polynomial and RBF kernels, to compare the accuracy of the conventional features and the statistically weighted technique.
Algorithm 1 illustrates the pseudo-code of the main procedure, whose four inputs are the set of labeled images for the training set ($D_L$), the unlabeled image dataset for the testing set ($D_U$), the possible labels of the labeled training set ($Y_L$), and the possible labels of the unlabeled testing set ($Y_U$). Both labeled and unlabeled image datasets are converted into feature weight matrices ($\mathbf{X}$) (lines 9 and 12, Algorithm 1), and the standard deviation statistics are used to extract the statistics weight dimension as a vector ($\mathbf{SW}$: $\mathbf{ACSW}$, $\mathbf{ICSW}$, $\mathbf{SDW}$) (line 9, Algorithm 1) through the sub-functions "StatisticsWeightedDimension" and "DistributionWeightEncoding" in Algorithm 2, respectively. After the statistics weighting step, PCA is used to recast the statistical features, calculated as the element-wise product between the feature weights ($\mathbf{X}$) and the statistics weights ($\mathbf{SW}$) (line 10, Algorithm 1), before the classification model is trained (line 11, Algorithm 1). For testing, the feature weight matrix ($\mathbf{X}$) of the unlabeled dataset also uses the statistics weights, and the dimensionality is reduced using the settings derived from the training step (line 13, Algorithm 1). Finally, the resulting "Model" is evaluated with the accuracy measure to determine the performance on the unlabeled dataset using statistically weighted dimensions and the dimensionality reduction technique. Algorithm 1 thus covers the classification with statistically weighted dimensions and dimensionality reduction, which is the third process in Figure 1.
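The following Python sketch is one possible reading of Algorithm 1 (it is not the authors' implementation; the helper name statistical_weights, the exponent values, the epsilon guard for zero-variance pixels, and the use of scikit-learn are assumptions). It mirrors lines 9–14: the statistics weight is learned on the labeled training part, applied by an element-wise product, reduced with PCA, and reused unchanged on the unlabeled testing part. A detailed, equation-by-equation version of the weight computation is given in Section 3.2.

```python
# Illustrative sketch of Algorithm 1 (not the authors' implementation): the helper
# names, exponent values, and epsilon guard for zero-variance pixels are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def statistical_weights(X, y, i=-0.5, j=-0.5, k=0.5, eps=1e-8):
    """Compact stand-in for Equations (2)-(7): sw = sdw^i * acsw^j * icsw^k."""
    classes = np.unique(y)
    class_means = np.stack([X[y == c].mean(axis=0) for c in classes])
    acsw = np.stack([X[y == c].std(axis=0, ddof=1) for c in classes]).mean(axis=0)  # intra-class
    icsw = class_means.std(axis=0, ddof=1)                                          # inter-class
    sdw = X.std(axis=0, ddof=1)                                                     # collection-class
    return (sdw + eps) ** i * (acsw + eps) ** j * (icsw + eps) ** k

def classify_swd_pca(X_train, y_train, X_test, y_test, exponents=(-0.5, -0.5, 0.5)):
    sw = statistical_weights(X_train, y_train, *exponents)   # Algorithm 1, line 9
    pca = PCA(n_components=0.99)                              # contribution rate 0.99
    Z_train = pca.fit_transform(X_train * sw)                 # line 10: PCA on X (element-wise) SW
    model = SVC(kernel="rbf").fit(Z_train, y_train)           # line 11: train the classifier
    Z_test = pca.transform(X_test * sw)                       # lines 12-13: reuse SW and the fitted PCA
    return accuracy_score(y_test, model.predict(Z_test))      # line 14: accuracy on the test set
```

On the datasets of Table 1, X_train and X_test would be the flattened 28 × 28 images, i.e., matrices with N = 784 columns.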

3.2. Statistically Weighted Dimensions and Optimal Weights Selection

The statistically weighted dimension is a powerful preprocessing technique used to provide variational information about the dataset characteristics by detecting class weights. This subsection gives a brief introduction to statistically weighted dimensions based on a distribution scheme using the standard deviation of three types: (1) collection-class (overview class), (2) inter-class (between classes), and (3) intra-class (within a class). Using the standard deviation concept discussed earlier, the statistically weighted dimension can be expressed by the following equations.
1.
Intra-class weight:
$$ acsw_n = \frac{1}{|C|} \sum_{p} s_{np} \qquad (2) $$
$$ s_{np} = \sqrt{\frac{1}{|C_p| - 1} \sum_{d_m \in C_p} \left( x_{mn} - \bar{x}_{np} \right)^2} \qquad (3) $$
$$ \bar{x}_{np} = \frac{\sum_{d_m \in C_p} x_{mn}}{|C_p|} \qquad (4) $$
The intra-class weight ($acsw_n$) is the average of the class standard deviations ($s_{np}$) over all classes for each feature $n$ in each possible class ($p$). The class standard deviation represents the variation of the feature weight among the sample documents in each class, and $\bar{x}_{np}$ is the mean of feature $n$ in each possible class ($p$). A low intra-class variation value indicates that the feature value is stable among documents in the class, implying that it could be a useful representative feature for classifying that class.
2.
Inter-class weight:
$$ icsw_n = \sqrt{\frac{1}{|C| - 1} \sum_{p} \left( \bar{x}_{np} - \frac{1}{|C|} \sum_{p} \bar{x}_{np} \right)^2} \qquad (5) $$
The inter-class weight ($icsw_n$) is the standard deviation, over the set of possible classes, of the class-level weight of feature $n$, where the value of each class is represented by its class mean ($\bar{x}_{np}$). A higher inter-class variation indicates that the feature value differs markedly among the classes, so the feature may be a good representative feature for classification.
3.
Collection-class weight:
$$ sdw_n = \sqrt{\frac{\sum_{p} \sum_{d_m \in C_p} \left( x_{mn} - \frac{\sum_{p} \sum_{d_m \in C_p} x_{mn}}{\sum_{p} |C_p|} \right)^2}{\sum_{p} |C_p| - 1}} \qquad (6) $$
The collection-class weight ($sdw_n$) measures the occurrence variation among documents in the whole collection for each feature ($n$). A higher value of $sdw_n$ means that feature $n$ has a higher variation of occurrence among sample documents in the whole collection, i.e., the feature may be general background and may not be a good representative of any class for classification.
The optimal statistically weighted dimension ($W^{*}$) is chosen by exploring the best classification accuracy (ACC) obtained by combining the feature weights of the $M \times N$ dimension ($\mathbf{X}$) with the statistically weighted $N$-dimensional vector ($\mathbf{SW}$), which promotes or demotes each statistic's weight through its exponent. $\mathbf{SW} = [sw_n]$ is a statistically weighted vector in which $sw_n$ is the statistical weight of feature $n$. Therefore, to determine the optimal thresholds of the three exponent parameters that maximize the classification accuracy on the training dataset, we select the exponent of each statistical weight as expressed below.
$$ sw_n = sdw_n^{\,i} \times acsw_n^{\,j} \times icsw_n^{\,k} \qquad (7) $$
$$ W^{*} = \arg\max_{w} ACC(\mathbf{X} \odot \mathbf{SW}) \qquad (8) $$
where $i$, $j$, and $k$ are the parameters setting the exponents of $sdw_n$, $acsw_n$, and $icsw_n$, respectively. It is important to highlight that a positive parameter value promotes, and a negative value demotes, the corresponding statistically weighted factor.
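A possible NumPy rendering of Equations (2)–(7) is sketched below (illustrative only, not the authors' code; the function names and the small epsilon guarding zero-variance pixels are assumptions). It corresponds to the DistributionWeightEncoding and OptimalWeighted steps of Algorithm 2; Equation (8) is then realized by evaluating the training accuracy over a grid of exponent triples, e.g., with itertools.product over {−1, −0.5, 0, 0.5, 1}, and keeping the best one.

```python
# Sketch of Equations (2)-(7): the three distribution weights per feature (pixel).
# Not the authors' code; the eps guard for zero-variance pixels is an added assumption.
import numpy as np

def distribution_weight_encoding(X, y):
    """X: (M, N) feature matrix, y: (M,) class labels. Returns ACSW, ICSW, SDW (length N each)."""
    classes = np.unique(y)
    class_stds = []    # s_np for each class p, Equation (3)
    class_means = []   # x_bar_np for each class p, Equation (4)
    for c in classes:
        Xc = X[y == c]
        class_stds.append(Xc.std(axis=0, ddof=1))
        class_means.append(Xc.mean(axis=0))
    acsw = np.mean(class_stds, axis=0)             # intra-class weight, Equation (2)
    icsw = np.std(class_means, axis=0, ddof=1)     # inter-class weight, Equation (5)
    sdw = X.std(axis=0, ddof=1)                    # collection-class weight, Equation (6)
    return acsw, icsw, sdw

def combine_weights(acsw, icsw, sdw, i, j, k, eps=1e-8):
    """Equation (7): sw_n = sdw_n^i * acsw_n^j * icsw_n^k, promoting (+) or demoting (-)."""
    return (sdw + eps) ** i * (acsw + eps) ** j * (icsw + eps) ** k
```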

3.3. Statistically Weighted Dimensions Based on PCA for Image Reconstruction

The PCA algorithm is extended with statistically weighted dimensions to improve the performance of conventional PCA. Conventional PCA uses the eigenvalues of the covariance matrix between the random variables to determine the associated eigenvectors; the random variables are raw matrix values with no specific weight included to inject class information, so conventional PCA is blind to the class information. The main idea of our proposed statistically weighted dimensions scheme based on PCA (SWD-PCA) is to use PCA to address the high dimensionality in a low dimension while also resolving this blind-class-information issue, which might otherwise mislead the extraction of significant features. To do so, the SWD-PCA algorithm reconstructs the feature matrix $\mathbf{X}_{M \times N}$ into $\mathbf{WX}_{M \times N}$ as expressed below.
$$ \mathbf{WX}_{M \times N} = (\mathbf{X} \odot \mathbf{SW})_{M \times N} = (\mathbf{X})_{M \times N} \odot (\mathbf{SW})_{M \times N} \qquad (9) $$
where $\odot$ denotes the element-wise product of matrices $\mathbf{X}$ and $\mathbf{SW}$, which have the same dimensions and produce a result of the same dimensions. Note that in the first step, (9), we obtain the statistically weighted dimension matrix ($\mathbf{WX}$) as the element-wise product of the feature weight matrix ($\mathbf{X}$) and the class-information weight matrix ($\mathbf{SW}$).
Then, PCA is used to extract low-dimensional features by transforming the weighted feature matrix into a lower-dimensional matrix $\mathbf{Z}_{M \times R}$. The matrix $\mathbf{D}_{N \times R}$ is the transformation matrix that projects the higher-dimensional statistically weighted dimension matrix $\mathbf{WX}_{M \times N}$ into the lower-dimensional matrix $\mathbf{Z}_{M \times R}$. Matrix $\mathbf{D}$ is constructed from the eigenvectors corresponding to the chosen subset of principal components ($R$), which are determined by ordering the eigenvalues of the covariance matrix of the statistically weighted dimension ($\mathbf{WX}_{M \times N}$), as shown below.
$$ \mathbf{Z}_{M \times R} = \mathbf{WX}_{M \times N} \times \mathbf{D}_{N \times R} \qquad (10) $$
To retain only significant dimensions and reduce noise, we determine the number of components ($R$) according to the explained variance ratio, measured by considering the trend between the contribution rate and the number of components, with $R \le \min(M, N)$.
The reconstructed statistically weighted vectors selected by the principal component coefficients are the eigenvectors corresponding to the first (largest) chosen eigenvalues.
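With scikit-learn, the reconstruction of Equations (9) and (10) can be sketched as follows (illustrative only; the random data and weight vector are stand-ins, and note that scikit-learn's PCA centers the data before projecting, a detail not written out in Equation (10)).

```python
# Sketch of Equations (9)-(10): element-wise weighting, then projection onto R components.
# Illustrative only; sw is a random stand-in for the SW vector of Section 3.2.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((1000, 784))          # feature weight matrix X (M x N)
sw = rng.random(784) + 0.5           # stand-in for the statistics weight vector SW (length N)

WX = X * sw                          # Equation (9): WX = X element-wise SW (SW broadcast over rows)

pca = PCA(n_components=0.95)         # R chosen from the explained variance ratio (contribution rate)
Z = pca.fit_transform(WX)            # Equation (10): Z (M x R)

D = pca.components_.T                # N x R matrix of eigenvectors (transformation matrix D)
Z_check = (WX - pca.mean_) @ D       # scikit-learn centers WX before projecting
print(Z.shape, np.allclose(Z, Z_check))
```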

3.4. Classification with Statistically Weighted Dimensions Based on PCA

This section describes the classification formalism using statistically weighted dimensions based on PCA. Classification is a supervised machine learning method that tests a model's ability to assign the correct label to given input data. The model is trained on the training data, evaluated on the test data, and then used to make predictions on new, unseen data. In this work, for the sake of simplicity and time constraints, we use three well-known classification algorithms: (1) logistic regression, (2) k-nearest neighbors (KNN), and (3) support vector machines with the RBF kernel (SVM (RBF)).
1.
Logistic Regression: Logistic regression is an effective supervised probabilistic classifier and an extension of the linear model. Despite its name, logistic regression is a classification method rather than a regression method. It is used to estimate discrete values (yes/no, true/false, 1/0) from a given set of independent variables. By calculating the probability that an event will occur with its logistic function, it essentially evaluates the relationship between the categorical dependent variable and one or more independent factors. When the dependent variable is binary, the logistic regression analysis based on binomial probability theory is referred to as binomial logistic regression. It provides probabilistic values that range from 0 to 1. The fundamental equation is as follows.
$$ prob_{z_1} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 z_1)}} \qquad (11) $$
where $prob_{z_1}$ is the output of the logistic function, $\beta_0$ is the y-intercept, $\beta_1$ is the slope, and $z_1$ is the independent variable, which is derived from the statistically weighted dimensions based on PCA.
2.
K-Nearest Neighbors (KNN): Neighbors-based classification is a kind of instance-based learning, often known as non-generalizing learning. The classification is determined by a simple majority vote of each point's nearest neighbors: a query point is assigned to the data class with the most representatives among its nearest neighbors. The basic nearest-neighbors classification employs uniform weights, meaning that the value assigned to a query point is determined by a simple majority vote of the nearest neighbors. It can be desirable to weight the neighbors so that closer neighbors contribute more to the fit; in this work, the weight of each neighbor is set to "distance". The distance metric is the Euclidean distance, given by the equation below.
$$ dis_{z_1, z_2} = \sqrt{\sum_{i=1}^{|N|} \left( z_{1i} - z_{2i} \right)^2} \qquad (12) $$
where $dis_{z_1, z_2}$ is the distance output of the Euclidean function, $z_1$ and $z_2$ are two points derived from the statistically weighted dimensions based on PCA, $z_{1i}$ and $z_{2i}$ are the components of the corresponding Euclidean vectors that start from the origin of the space (initial point), and $N$ is the $n$-dimensional data space.
3.
Support Vector Machines Utilizing Kernel RBF (SVM (RBF)): SVM is a type of supervised learning algorithm used for classification, regression, and outlier detection. The primary objective of the SVM method is to locate a hyperplane with a large margin that divides the $n$-dimensional space into discrete classes. For support vector machines, the kernel is the key to improving learning effectiveness. The mathematical function of the Gaussian or radial basis function (RBF) kernel is as below.
$$ K_{z_1, z_2} = \exp\!\left( -\frac{\lVert z_1 - z_2 \rVert^2}{2 \sigma^2} \right) \qquad (13) $$
where $K_{z_1, z_2}$ is the output of the kernel function, $\sigma^2$ is the variance and represents the hyperparameter, and $\lVert z_1 - z_2 \rVert$ is the Euclidean ($L_2$-norm) distance between the two points $z_1$ and $z_2$.
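For reference, a minimal scikit-learn configuration of the three classifiers might look as below (a sketch; the hyperparameters shown are assumptions, since the paper does not list its exact settings).

```python
# Sketch of the three classifiers used in this work (hyperparameters are assumptions).
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

classifiers = {
    "LR": LogisticRegression(max_iter=1000),          # Equation (11): logistic function
    "KNN": KNeighborsClassifier(weights="distance"),  # Equation (12): distance-weighted neighbor vote
    "SVM (RBF)": SVC(kernel="rbf"),                   # Equation (13): Gaussian (RBF) kernel
}

# Z_train/Z_test would be the reduced, statistically weighted features from Section 3.3:
# for name, clf in classifiers.items():
#     print(name, clf.fit(Z_train, y_train).score(Z_test, y_test))
```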

4. Experimental Data Evaluation

In this section, we describe the datasets, the experimental setting, and the evaluation measure for the proposed scheme of improving classification performance with statistically weighted dimensions and dimensionality reduction.

Data Sets

In this paper, the characteristics of the three experimental image datasets (EMNIST MNIST, EMNIST Letters, and Fashion-MNIST [33,34]) used for assessment are summarized in Table 1.
Table 1 shows the characteristics of the experimental datasets, including general information as well as distribution information. First, we discuss the general characteristics. The first dataset, "EMNIST MNIST", denoted MNIST [33], is a collection of handwritten digit images, with a total of 70,000 images and 10 classes. The second dataset, "EMNIST Letters", denoted E-MNIST [33], consists of handwritten letter images, with a total of 145,600 images; uppercase and lowercase letters are balanced and merged into a single 26-class task. Finally, the last dataset, "Fashion-MNIST", denoted F-MNIST [34], consists of Zalando article images, with a total of 70,000 images and 10 classes. Additionally, the class sizes are remarkably consistent within each dataset when counted as shares of the whole collection.

5. Experimental Setting

The experimental plan to validate the classification performance of the statistically weighted dimension scheme based on the PCA technique is elaborated in three main experiments; see Section 6.1, Section 6.2 and Section 6.3. The first experiment examines classification performance with statistically weighted dimensions but without PCA, in which single and multiple statistically weighted dimension variables are addressed on both sides, as a multiplier (for promoting) or a divider (for demoting). To discover the worst and best solutions over the exponents, the effectiveness of the statistically weighted method is investigated using three factors (SDW, ACSW, ICSW) and twelve distinct exponents (varying between −3.0 and 3.0, with a step size of 0.5). The second experiment compares statistically weighted PCA and conventional PCA at various percentages of explained variance (component sizes), in which the PCA-based dimension reduction performance of classification with statistically weighted dimensions is discussed; the number of factors influencing classification quality is also taken into account. The third experiment compares the statistically weighted dimension and dimensionality reduction techniques in classification to evaluate the various statistically weighted data using the PCA approach. A confusion matrix with the PCA variance controlled at 95% also shows the effectiveness of the classification task. To evaluate the performance of classification with the proposed method, the effectiveness measure is the ratio of the total number of image documents assigned to their correct classes ($T_i$) over all classes ($P$) to the total number of documents in the testing dataset ($M$), expressed as below.
$$ \text{Accuracy} = ACC = \frac{\sum_{i=1}^{|P|} T_i}{M} \qquad (14) $$
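As a small, self-contained check of Equation (14) (the labels below are made up purely for illustration), summing the per-class correct counts $T_i$ and dividing by $M$ reproduces the usual accuracy score:

```python
# Equation (14): accuracy as the sum of per-class correct counts T_i divided by M.
# The labels below are made up purely for illustration.
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # hypothetical test labels
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])   # hypothetical predictions

T = [np.sum((y_true == c) & (y_pred == c)) for c in np.unique(y_true)]   # T_i per class
print(sum(T) / len(y_true), accuracy_score(y_true, y_pred))              # both print 0.75
```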

6. Experiment Results

6.1. Classification Performance with Statistically Weighted Dimension without PCA

This section examines the direction that results from using a statistically weighted dimension scheme in both a single-weighted feature (Section 6.1.1) and a combination-weighted feature (Section 6.1.2).

6.1.1. Single Type of Statistically Weighted Dimension

In this paper, the first experiment explores the impact of the distinct approaches ($\mathbf{ACSW}$, $\mathbf{ICSW}$, and $\mathbf{SDW}$) at various exponent levels on image classification performance, as shown in Table 2. As a measure of classification quality, Table 2 displays the accuracy values obtained by incorporating the two sides of each statistically weighted dimension factor, (−) for demoting (varying between −3.0 and −0.5, with a step size of 0.5) and (+) for promoting (varying between +0.5 and +3.0, with a step size of 0.5), compared with the traditionally weighted exponent-0 BASE-LINE. Each statistically weighted dimension is investigated on three datasets, (1) MNIST, (2) E-MNIST, and (3) F-MNIST, with the three most-used standard classifiers, i.e., logistic regression (LR), k-nearest neighbors (KNN), and support vector machine with RBF kernel (SVM (RBF)). The last column in Table 2, named AVERAGE, shows the average performance values over the three classification methods for each dataset. Bold font draws attention to the winner of each method, and underlined blue text marks the best method and exponent for each classification algorithm on each dataset.
First, the experiment demonstrates how the classification methods compare in terms of performance, showing that the SVM with RBF kernel achieves a higher level of performance (accuracy) than KNN or logistic regression. SVM uses a kernel function internally and handles complicated problems by mapping the data into high dimensions, avoiding restrictive assumptions. The results show that all three datasets obtained the highest classification accuracy with the SVM (RBF) technique, with values of 97.99% for MNIST, 90.75% for E-MNIST, and 89.57% for F-MNIST. Second, in most cases of the single statistically weighted method, the SDW approach appears to enhance classification better than the other approaches, implying that SDW reflects the class characteristics well in an overview-class weighting. However, in some instances ACSW and ICSW also enhance performance beyond the baseline, and this is investigated in the next experiment by combining them. The results demonstrate that a single statistically weighted technique successfully captures the distinctive features of the three datasets: the group of highest performances listed in Panel 3, which uses collection-class weighting (SDW), demonstrates high-performance classification for MNIST and E-MNIST, whereas Panel 1 shows the highest values for F-MNIST when intra-class weighted (ACSW) data are considered. Third, the statistically weighted approach works effectively on both the promoting and demoting sides, depending on the class characteristics of each dataset. For the F-MNIST dataset's class features, ACSW performs well on the demoting side (−), whereas SDW succeeds for MNIST and E-MNIST on the promoting side (+). Image classification is also affected in a similar way to text mining [18]: the ICSW approach commonly works well on the promoting side (+).
Lastly, it can be inferred from the results across the various exponent values that the proposed method works well in the range between −1 and +1. It is important to highlight that for ACSW, when exponent magnitudes beyond 1.5 are chosen, the accuracy rapidly decreases to roughly 70–80%. From this experiment, it can be concluded that the statistically weighted algorithm should be used to increase the performance of the classification method with exponents ranging from −1 to +1 with a step size of 0.5.

6.1.2. Multiple-Type Statistically Weighted Dimension

The first experiment examined the impact of a single statistically weighted dimension characteristic on both the promoting and demoting sides. This section examines how well the three statistically weighted dimension feature approaches work together, both without PCA and with PCA. Each method factor is investigated over 125 (5 × 5 × 5) combinations in which the exponents vary between −1.0 and +1.0 with a step size of 0.5; this setting is based on the exponent survey of the first experiment, in which the exponent values varied from −3 to +3 with a step size of 0.5 (see Table 2). Table 3 displays the counts for the best 20 (or best 10) and the worst 20 (or worst 10) combinations, surveyed for the three classification algorithms, (1) logistic regression, (2) KNN, and (3) SVM (RBF). For example, in the first row of Panel 1.1, Panel A (best), the number 9(5) indicates that nine of the best 20 weightings and five of the best 10 weightings have the exponent value ACSW = −1. Panel A displays the counts for the best 20 (best 10 in parentheses), and Panel B displays the counts for the worst 20 (worst 10 in parentheses). The experiment is performed on the three standard MNIST/E-MNIST/F-MNIST datasets using the three classification algorithms, and the classification performance results are shown in Table 3. The experiment compares the performance of the statistically weighted dimension features without PCA with that of the statistically weighted dimension features based on PCA (contribution rate = 0.99).
In Table 3, Panel A, most of the best-20 counts (best 10 in parentheses) show that the ICSW method in our proposed combinations of statistically weighted dimension features has a count of 0 on the demoting side (exponents of −1 and −0.5), while most of the best counts on the promoting side occur at exponent values of +1 and +0.5. This implies that the ICSW method works well on the promoting side in our proposed combinations. In contrast, ACSW and SDW perform better on the demoting side than on the promoting side, where most of their best counts appear. Similarly, the results of the combined statistically weighted dimension feature techniques, both without and with PCA, indicate that the statistically weighted dimension features are beneficial for guiding image classification. The best and worst counts are concentrated on one dominant side (promoting or demoting) rather than at the traditional weighting (exponent equal to 0). Furthermore, the effectiveness of the three image classification algorithms with the proposed scheme on the three datasets shows that KNN, logistic regression, and SVM (RBF) come in this order of effectiveness. For example, in both Panel 2.1 and Panel 2.2 for the best 20 (best 10 in parentheses), the KNN results show that ICSW lies on the promoting side, while ACSW and SDW lie on the demoting side; these results also reflect that KNN uses a higher degree of exponents than logistic regression and SVM (RBF). One possible cause is the nature of the proposed scheme, from which a classification algorithm built on similarity benefits strongly. Furthermore, the proposed approach achieves high efficiency on the three image classification algorithms by using high exponents for the MNIST, E-MNIST, and F-MNIST datasets, respectively. However, in the case of logistic regression on E-MNIST, high exponent values have a greater impact than on the other two datasets. Finally, the statistically weighted dimension features without PCA rely on higher exponent components than the statistically weighted dimension features with PCA, with only a small gap in the exponent degrees; this is because the proposed work performs well owing to the exact shape of the image dataset.

6.2. Classification Performance with Statistically Weighted Dimension and Dimensionality Reduction (PCA)

This section provides further analysis of potential combinations of the statistically weighted dimension method with PCA. The proposed scheme is compared with the following techniques: (1) BASE-LINE, (2) the optimal statistically weighted dimension combination, obtained by selecting the best-performing combination for each dataset and image classification algorithm using the information from the multiple-factor experiment on statistically weighted dimensions, (3) BASE-LINE with PCA, and (4) the optimal statistically weighted algorithm with PCA (our proposed idea). The contribution rate for PCA in this experiment is set to 0.99. Three image classification algorithms, (1) logistic regression, (2) KNN, and (3) SVM (RBF), are investigated on the three datasets MNIST, E-MNIST, and F-MNIST. Table 4 displays (1) the accuracy of image classification, (2) the number of components, and (3) the exponents for ACSW, ICSW, and SDW, respectively. For example, in the MNIST section of Table 4, line one, the values 92.55 (784) (0,0,0) describe the result for the MNIST dataset with the logistic regression algorithm, which has an accuracy of 92.55. This experiment was not run with PCA, so the number of components remains equal to the original value of 784; it also does not apply the statistically weighted algorithm, since the weighting exponents are ACSW = 0, ICSW = 0, and SDW = 0.
Firstly, the proposed statistically weighted dimension with the PCA scheme classifies the images more effectively than BASE-LINE. Table 4 clearly shows that the weighting combinations are all superior to the baseline. In particular, for our proposed idea we count the superior weighting combinations in experiment (4) ((2) + PCA, the optimal statistically weighted work with PCA) compared with the baseline over the 125 different weighting combinations. This demonstrates the impact on the three datasets (MNIST, E-MNIST, and F-MNIST) for each classification method. The number of superior statistically weighted combinations is 35 for MNIST, 10 for E-MNIST, and 61 for F-MNIST with logistic regression; 51 for MNIST, 30 for E-MNIST, and 41 for F-MNIST with KNN; and 27 for MNIST, 2 for E-MNIST, and 21 for F-MNIST with SVM (RBF). The best combinations of statistically weighted dimension features without PCA for logistic regression, KNN, and SVM (RBF) are all superior to BASE-LINE, with average gaps of 0.60% (0.17% for MNIST, 1.03% for E-MNIST, and 0.60% for F-MNIST), 1.40% (0.41% for MNIST, 2.95% for E-MNIST, and 0.83% for F-MNIST), and 0.27% (0.10% for MNIST, 0.24% for E-MNIST, and 0.47% for F-MNIST), respectively.
Furthermore, the best combinations of statistically weighted dimension features with PCA for logistic regression, KNN, and SVM (RBF) are all superior to BASE-LINE, with average gaps of 0.48% (0.08% for MNIST, 0.92% for E-MNIST, and 0.43% for F-MNIST), 1.48% (0.44% for MNIST, 2.84% for E-MNIST, and 1.17% for F-MNIST), and 0.57% (0.05% for MNIST, 0.84% for E-MNIST, and 0.81% for F-MNIST), respectively. The results also show that PCA can reduce the number of components, or dimensionality, by more than 50%, with the average reduction from BASE-LINE being 55.74% for (3, see Table 4) PCA with BASE-LINE weighting (varying from 35.07% for SVM (RBF) on F-MNIST to 70.41% for KNN on E-MNIST) and 52.42% for (4, see Table 4) PCA with the optimal statistically weighted dimension (varying from 34.82% for SVM (RBF) on F-MNIST to 70.53% for SVM (RBF) on MNIST). These results demonstrate that using PCA is more accurate than not using PCA; it appears to choose features based on the most important characteristics, which reflect the quality of the data and enhance the efficiency of image classification. Moreover, the statistically weighted dimension features can cooperate with PCA for feature selection by enhancing the discriminative power of the features. One more important observation is that all the best PCA results are obtained with weighting values in the same direction as in the experiments of Table 4: ICSW weights well in the promoting direction, while most ACSW and SDW weights work well in the demoting direction.
Finally, the results obtained from the dataset used for feature selection with PCA clearly illustrate that the whole trend is in the same or a similar direction. However, in some cases it is necessary to increase the exponent value by a larger amount than in the case without PCA. For example, in the MNIST, KNN section of Table 4 (rows (2) and (4)), the accuracy increases by 0.031% while the number of components is reduced by 65.59%, which is guided by raising ACSW and SDW by one step (i.e., −0.5).

6.3. Comparative Analysis of Various PCA Set Sizes

This section explores the effect of PCA at various contribution rates through a comparative analysis of statistically weighted dimension features with the three classification algorithms: (1) logistic regression, (2) KNN, and (3) SVM (RBF). To verify the effectiveness of PCA with statistically weighted dimension features, an experiment is conducted on the three datasets MNIST, E-MNIST, and F-MNIST. The statistically weighted exponent values are determined in the same manner as in Table 4. To investigate the effect of the contribution rate, the rate is varied from 0.50 to 1.00 with a step size of 0.05, while 0.99 is used for PCA in the other general experiments of this paper. Figure 2a,c,e illustrates the classification accuracy, and Figure 2b,d,f depicts the number of components at the various contribution rates. The legends represent the six types of results obtained from the different algorithms, without PCA or with PCA.
The legends provide complete information about the datasets (MNIST, E-MNIST, and F-MNIST), the classification algorithms (logistic regression (LR), k-nearest neighbors (KNN), and support vector machine with RBF kernel (SVM)), and the weighting methods (PCA: BASE-LINE weighting with PCA; SWD-PCA: statistically weighted dimension with PCA). For example, the notation "MNIST KNN PCA" indicates that the experiment is performed on the test set of the MNIST dataset, classified by KNN using BASE-LINE weighting with PCA. The classification results show that, for all test sets, statistically weighted dimensions with PCA (SWD-PCA) achieve high performance at a PCA contribution rate of around 0.90 to 0.95 and can substantially reduce the number of components, by roughly 75% to 90%. The results demonstrate that the PCA methods perform similarly to the simulation studies, with the SWD-PCA test sets slightly improving the classification rate over the standard BASE-LINE. The results also indicate that SVM outperforms the other algorithms.

7. Discussion

This section describes the statistically weighted features that have been investigated and used to guide feature selection based on class distribution characteristics, as well as related studies that improve image classification using a lower-dimension technique (PCA). Most developing PCA approaches employ relative information as a guideline for feature extraction before starting the dimensionality reduction procedure. Nevertheless, no research had been carried out on feature extraction using class characteristics for feature-weighting-based dimensionality reduction in image classification. In the past, pixel values were used as feature weights to determine the feature components in the reduced-dimension procedure, and some recent work has applied relevant information directly in the form of statistics such as probability [8], entropy [16], and distribution [9]; however, these statistics are based on the entire collection and are not affected by inter- and intra-class properties. To address this limitation, some works [15,17,18] in the feature selection field focus only on class-based statistics to reflect the class characteristics during classification. Our proposed improved image classification method uses statistically weighted dimensions as distribution class information to enhance the dimension reduction, where the feature selection is PCA. It is noteworthy that this improvement in classification with statistically weighted dimensions and dimensionality reduction is quite significant. The class information affects the image classification process, as shown in Table 4. The statistically weighted dimension with PCA (SWD-PCA) is used with three image classifiers, (1) logistic regression, (2) KNN, and (3) SVM (RBF). The best combination of SWD-PCA works well with KNN, which improves by nearly 3% in the best case (0.44% for MNIST, 2.84% for E-MNIST, and 1.17% for F-MNIST). The best SWD-PCA for the remaining two classification algorithms improves by up to around 1%: for logistic regression, 0.08% for MNIST, 0.92% for E-MNIST, and 0.43% for F-MNIST; and for SVM (RBF), 0.05% for MNIST, 0.84% for E-MNIST, and 0.81% for F-MNIST. The optimal combination decreased the number of components by more than 50%, with the average reduction from BASE-LINE being 55.74% for PCA with BASE-LINE weighting (3, see Table 4) (varying from 35.07% for SVM (RBF) on F-MNIST to 70.41% for KNN on E-MNIST) and 52.42% for PCA with the optimal statistically weighted dimension (4, see Table 4) (varying from 34.82% for SVM (RBF) on F-MNIST to 70.53% for SVM (RBF) on MNIST). Moreover, the exponent direction for the SWD-PCA combinations is similar to that observed in the text mining area [18], where most results show that ICSW performs well on the promoting side (+), while ACSW and SDW perform well on the demoting side (−). Figure 2 shows that all SWD-PCA methods are superior to the corresponding PCA variants when the contribution rates are varied; in particular, all classification techniques that combine statistical weighting with PCA (SWD-PCA) significantly enhance the performance over PCA. The high performance occurs at a contribution rate of roughly 90% to 95%, with the remaining components ranging between 100 and 200 out of 784 (reducing the dimension by around 75% to 87%).

8. Conclusions

This study indicates that PCA can be used to reduce dimensionality while improving the accuracy of image classification with statistically weighted dimension features. The class weights that provide variational information about the dataset's attributes are established using the statistically weighted dimensions. This idea, based on a distribution scheme using the standard deviation of three types, is proposed in this study: (1) collection-class (overview class) (SDW), (2) inter-class (between classes) (ICSW), and (3) intra-class (within a class) (ACSW). Three image classification algorithms, (1) logistic regression, (2) KNN, and (3) SVM (RBF), are investigated on three datasets: MNIST, E-MNIST, and F-MNIST. The results indicate that our scheme (SWD-PCA) outperforms the conventional BASE-LINE algorithm, with ICSW on the promoting side (+) and ACSW and SDW on the demoting side (−). From the experiments, it can be concluded that the statistically weighted algorithm should be employed to improve the performance of the classification technique with exponents ranging from −1 to +1. Finally, compared with the conventional BASE-LINE image classification algorithms, classification with statistically weighted dimensions and dimensionality reduction improves logistic regression, KNN, and SVM (RBF) with enhanced accuracy rates of 0.48%, 1.48%, and 0.57%, respectively, while using PCA to reduce the number of components, or dimensionality, by more than 50%. It is imperative to point out that the proposed technique is applied to three different types of datasets and obtains the most accurate results. In the future, we intend to apply this methodology to diverse standard datasets, including text, signal, and medical datasets, to examine its efficacy and efficiency. It is also important to investigate the efficacy and efficiency of this strategy when other dimensionality reduction methods, such as factor analysis (FA), independent component analysis (ICA), linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE), are employed. We also intend to compare the effectiveness of our technique to that of other statistically weighted methods and examine how the distribution might enhance their performance.

Author Contributions

Conceptualization, U.B.; methodology, U.B.; software and validation, U.B.; formal analysis, U.B. and M.U.J.; investigation, resources, and data curation, U.B. and M.U.J.; writing—original draft preparation, U.B.; writing—review and editing, U.B. and M.U.J.; funding acquisition, U.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by the Faculty of Science and Arts, Burapha University, Chanthaburi Campus, Thailand Research Fund.

Institutional Review Board Statement

The study did not require ethical approval because it did not involve humans or animals, and the experimental datasets are already legally accessible to the public.

Informed Consent Statement

Not applicable.

Data Availability Statement

The research datasets used in this study are available for MNIST at https://www.kaggle.com/datasets/oddrationale/mnist-in-csv, E-MNIST at https://github.com/aurelienduarte/emnist/tree/master/gzip, and F-MNIST at https://www.kaggle.com/datasets/zalando-research/fashionmnist (accessed on 25 November 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Xing, H.; Chen, B.; Feng, Y.; Ni, Y.; Hou, D.; Wang, X.; Kong, Y. Mapping irrigated, rainfed and paddy croplands from time-series sentinel-2 images by integrating pixel-based classification and image segmentation on google earth engine. Geocarto Int. 2022, 1–20. [Google Scholar] [CrossRef]
  2. Drikvandi, R.; Lawal, O. Sparse principal component analysis for natural language processing. Ann. Data Sci. 2020, 10, 25–41. [Google Scholar] [CrossRef]
  3. Gupta, D.; Bansal, P.; Choudhary, K. The state of the art of feature extraction techniques in speech recognition. In Speech and Language Processing for Human-Machine Communications; Springer: Berlin/Heidelberg, Germany, 2018; pp. 195–207. [Google Scholar]
  4. Liu, Y.; Durlofsky, L.J. 3D CNN-PCA: A deep-learning-based parameterization for complex geomodels. Comput. Geosci. 2021, 148, 104676. [Google Scholar] [CrossRef]
  5. He, C.; Liu, Q.; Li, H.; Wang, H. Multimodal medical image fusion based on IHS and PCA. Procedia Eng. 2010, 7, 280–285. [Google Scholar] [CrossRef]
  6. Kang, X.; Xiang, X.; Li, S.; Benediktsson, J.A. PCA-based edge-preserving features for hyperspectral image classification. IEEE Trans. Geosci. Remote. Sens. 2017, 55, 7140–7151. [Google Scholar] [CrossRef]
  7. Wolf, L.; Bileschi, S. Combining variable selection with dimensionality reduction. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–26 June 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 2, pp. 801–806. [Google Scholar]
  8. Puyati, W.; Walairacht, A. Efficiency improvement for unconstrained face recognition by weightening probability values of modular pca and wavelet pca. In Proceedings of the 2008 10th International Conference on Advanced Communication Technology, Phoenix Park, Republic of Korea, 17–20 February 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1449–1453. [Google Scholar]
  9. Priyanka; Kumar, D. Feature extraction and selection of kidney ultrasound images using GLCM and PCA. Procedia Comput. Sci. 2020, 167, 1722–1731. [Google Scholar] [CrossRef]
  10. Yu, L.; Snapp, R.R.; Ruiz, T.; Radermacher, M. Probabilistic principal component analysis with expectation maximization (PPCA-EM) facilitates volume classification and estimates the missing data. J. Struct. Biol. 2010, 171, 18–30. [Google Scholar] [CrossRef]
  11. Hu, L.; Cui, J. Digital image recognition based on fractional-order-PCA-SVM coupling algorithm. Measurement 2019, 145, 150–159. [Google Scholar] [CrossRef]
  12. Garg, I.; Panda, P.; Roy, K. A low effort approach to structured CNN design using PCA. IEEE Access 2019, 8, 1347–1360. [Google Scholar] [CrossRef]
  13. Shah, F.P.; Patel, V. A review on feature selection and feature extraction for text classification. In Proceedings of the 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, 23–25 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 2264–2268. [Google Scholar]
  14. Ting, G.; Moydin, K.; Hamdulla, A. An overview of feature extraction methods for handwritten image retrieval. In Proceedings of the 2018 3rd International Conference on Smart City and Systems Engineering (ICSCSE), Xiamen, China, 29–30 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 840–843. [Google Scholar]
  15. Xing, H.; Chen, B.; Lu, M. A sub-seasonal crop information identification framework for crop rotation mapping in smallholder farming areas with time series sentinel-2 imagery. Remote. Sens. 2022, 14, 6280. [Google Scholar] [CrossRef]
  16. Yumeng, C.; Yinglan, F. Research on PCA data dimension reduction algorithm based on entropy weight method. In Proceedings of the 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Osaka, Japan, 23–25 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 392–396. [Google Scholar]
  17. Zhang, S.; Chen, X.; Li, P.; Cai, Q. Data dimensionality reduction method combining intra-class and inter-class distance. In Proceedings of the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing, Dublin, Ireland, 17–19 October 2019; pp. 1–7. [Google Scholar]
  18. Buatoom, U.; Kongprawechnon, W.; Theeramunkong, T. Document clustering using k-means with term weighting as similarity-based constraints. Symmetry 2020, 12, 967. [Google Scholar] [CrossRef]
  19. Hernandez, W.; Mendez, A.; Göksel, T. Application of principal component analysis to image compression. In Statistics-Growing Data Sets and Growing Demand for Statistics; IntechOpen: Rijeka, Croatia, 2018. [Google Scholar]
  20. Nandi, D.; Ashour, A.S.; Samanta, S.; Chakraborty, S.; Salem, M.A.; Dey, N. Principal component analysis in medical image processing: A study. Int. J. Image Min. 2015, 1, 65–86. [Google Scholar] [CrossRef]
  21. Li, X.; Zhang, L.; You, J. Locally weighted discriminant analysis for hyperspectral image classification. Remote. Sens. 2019, 11, 109. [Google Scholar] [CrossRef]
  22. Liu, N.; Wang, H. Weighted principal component extraction with genetic algorithms. Appl. Soft Comput. 2012, 12, 961–974. [Google Scholar] [CrossRef]
  23. Xiao, C.; Shao, W.; Xiao, R. Toward more efficient wmsn data search combined fjlt dimension expansion with pca dimension reduction. IEEE Access 2020, 8, 104139–104147. [Google Scholar] [CrossRef]
  24. Tavoli, R.; Kozegar, E.; Shojafar, M.; Soleimani, H.; Pooranian, Z. Weighted pca for improving document image retrieval system based on keyword spotting accuracy. In Proceedings of the 2013 36th International Conference on Telecommunications and Signal Processing (TSP), Rome, Italy, 2–4 July 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 773–777. [Google Scholar]
  25. Liu, N.; Wang, H. Feature extraction using evolutionary weighted principal component analysis. In Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics, Waikoloa, HI, USA, 10–12 October 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 346–350. [Google Scholar]
  26. Sreeram, V.; Sahlan, S. Improved results on frequency-weighted balanced truncation and error bounds. Int. J. Robust Nonlinear Control 2012, 22, 1195–1211. [Google Scholar] [CrossRef]
  27. Wu, C.; Chen, Y. Adaptive entropy weighted picture fuzzy clustering algorithm with spatial information for image segmentation. Appl. Soft Comput. 2020, 86, 105888. [Google Scholar] [CrossRef]
  28. Buatoom, U.; Kongprawechnon, W.; Theeramunkong, T. Improving seeded k-means clustering with deviation-and entropy-based term weightings. IEICE Trans. Inf. Syst. 2020, 103, 748–758. [Google Scholar] [CrossRef]
  29. Pilarczyk, R.; Skarbek, W. On intra-class variance for deep learning of classifiers. arXiv 2019, arXiv:1901.11186. [Google Scholar] [CrossRef]
  30. Chen, Y.; Hu, H. Facial expression recognition by inter-class relational learning. IEEE Access 2019, 7, 94106–94117. [Google Scholar] [CrossRef]
  31. Venkataramanan, A.; Laviale, M.; Figus, C.; Usseglio-Polatera, P.; Pradalier, C. Tackling inter-class similarity and intra-class variance for microscopic image-based classification. In Proceedings of the International Conference on Computer Vision Systems, Virtual, 22–24 September 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 93–103. [Google Scholar]
  32. Hameed, Z.; Rehman, W.U.; Khan, W.; Ullah, N.; Albogamy, F.R. Weighted hybrid feature reduction embedded with ensemble learning for speech data of Parkinson’s disease. Mathematics 2021, 9, 3172. [Google Scholar] [CrossRef]
  33. Cohen, G.; Afshar, S.; Tapson, J.; Schaik, A.V. EMNIST: Extending MNIST to handwritten letters. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2921–2926. [Google Scholar]
  34. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar]
Figure 1. The complete framework of improving classification performance with statistically weighted dimension and dimensionality reduction.
Figure 2. Class-based measurement using accuracy and the number of components. Here, the circle marks indicate the performance when the contribution rate is used. (a) Accuracy of MNIST. (b) The component of MNIST. (c) Accuracy of E-MNIST. (d) The component of E-MNIST. (e) Accuracy of F-MNIST. (f) The component of F-MNIST.
Table 1. Characteristics of experimental datasets.

Dataset name | EMNIST MNIST | EMNIST Letters | EMNIST Fashion
General characteristics
Abbreviation | MNIST | E-MNIST | F-MNIST
Type | Digits | Letters | Fashion Images
No. of images | 70,000 | 145,600 | 70,000
No. of classes | 10 | 26 | 10
No. doc./class (approximate) | 7000 each | 5600 each | 7000 each
No. of distinct features | 784 | 784 | 784
Class size (total features)
Avg. | 577.70 | 682.31 | 776.60
Min. | 551.00 | 660.00 | 729.00
Max. | 611.00 | 703.00 | 784.00
SD. | 17.30 | 10.75 | 16.24
Table 2. Feature-weighting comparison on demoted and promoted sides. Accuracy (%) in each cell is reported as MNIST/ E-MNIST/ F-MNIST.

Method (exponent) | Logistic Regression | KNN | SVM (RBF) | Average
BASE-LINE (0) | 92.55/ 71.59/ 85.44 | 96.65/ 83.82/ 85.73 | 97.92/ 90.67/ 89.21 | 95.71/ 82.03/ 86.80
  Panel 1: Statistically weighted dimension feature: intra-class weight
ACSW (−3) | 14.52/ 21.21/ 30.09 | 78.15/ 41.32/ 78.91 | 11.35/ 3.85/ 10.23 | 34.68/ 22.13/ 39.75
ACSW (−2.5) | 18.63/ 20.38/ 46.06 | 81.00/ 46.14/ 81.86 | 11.38/ 3.87/ 11.20 | 37.01/ 23.47/ 46.38
ACSW (−2) | 76.32/ 27.11/ 78.25 | 85.10/ 57.00/ 83.91 | 11.88/ 3.86/ 73.63 | 57.77/ 29.33/ 78.60
ACSW (−1.5) | 92.08/ 69.20/ 85.71 | 89.31/ 71.80/ 85.50 | 80.64/ 10.55/ 86.68 | 87.35/ 50.52/ 85.97
ACSW (−1) | 92.55/ 71.42/ 85.67 | 93.34/ 80.89/ 86.34 | 95.68/ 86.59/ 89.25 | 93.86/ 79.64/ 87.09
ACSW (−0.5) | 92.42/ 71.45/ 85.45 | 95.82/ 83.66/ 85.92 | 97.66/ 90.74/ 89.57 | 95.30/ 81.95/ 86.98
ACSW (+0.5) | 92.35/ 71.46/ 85.21 | 96.69/ 83.72/ 85.23 | 97.92/ 90.77/ 89.12 | 95.66/ 81.99/ 86.52
ACSW (+1) | 92.14/ 71.05/ 85.11 | 96.56/ 83.25/ 84.77 | 97.84/ 90.66/ 88.89 | 95.52/ 81.66/ 86.26
ACSW (+1.5) | 91.65/ 70.84/ 84.96 | 96.41/ 82.85/ 84.30 | 97.83/ 90.36/ 88.78 | 95.30/ 81.35/ 86.02
ACSW (+2) | 91.55/ 70.50/ 84.68 | 96.16/ 82.33/ 83.68 | 97.76/ 90.20/ 88.64 | 95.16/ 81.01/ 85.67
ACSW (+2.5) | 91.27/ 70.06/ 84.20 | 95.91/ 81.80/ 83.22 | 97.70/ 90.05/ 88.58 | 94.96/ 80.64/ 85.34
ACSW (+3) | 91.06/ 69.73/ 83.83 | 95.61/ 81.30/ 82.75 | 97.63/ 89.86/ 88.25 | 94.77/ 80.30/ 84.95
  Panel 2: Statistically weighted dimension feature: inter-class weight
ICSW (−3) | 28.10/ 3.85/ 36.84 | 70.22/ 32.90/ 65.87 | 11.36/ 3.85/ 10.06 | 36.56/ 13.54/ 37.59
ICSW (−2.5) | 55.16/ 20.70/ 30.99 | 74.31/ 37.02/ 69.11 | 11.42/ 3.86/ 10.19 | 46.97/ 20.53/ 36.77
ICSW (−2) | 39.05/ 17.13/ 30.37 | 78.87/ 42.52/ 74.36 | 11.37/ 3.85/ 10.13 | 43.10/ 21.17/ 38.29
ICSW (−1.5) | 62.08/ 31.26/ 59.16 | 84.23/ 54.24/ 80.10 | 11.95/ 4.00/ 11.75 | 52.76/ 29.84/ 50.34
ICSW (−1) | 91.98/ 71.34/ 84.79 | 90.27/ 73.03/ 84.08 | 90.02/ 71.60/ 85.80 | 90.76/ 71.99/ 84.89
ICSW (−0.5) | 92.49/ 71.39/ 85.23 | 94.87/ 82.60/ 85.45 | 97.19/ 90.25/ 89.16 | 94.85/ 81.42/ 86.62
ICSW (+0.5) | 92.34/ 71.42/ 84.93 | 97.02/ 83.84/ 85.79 | 97.98/ 90.60/ 89.07 | 95.78/ 81.96/ 86.60
ICSW (+1) | 91.76/ 71.03/ 84.62 | 96.86/ 83.45/ 85.52 | 97.99/ 90.25/ 88.83 | 95.54/ 81.58/ 86.33
ICSW (+1.5) | 91.28/ 70.48/ 84.40 | 96.62/ 82.85/ 85.26 | 97.71/ 89.90/ 88.48 | 95.21/ 81.08/ 86.05
ICSW (+2) | 91.21/ 70.28/ 83.97 | 96.22/ 82.15/ 84.92 | 97.35/ 89.56/ 88.12 | 94.93/ 80.67/ 85.67
ICSW (+2.5) | 91.05/ 69.80/ 83.82 | 95.79/ 81.30/ 84.58 | 97.18/ 89.05/ 87.89 | 94.68/ 80.05/ 85.43
ICSW (+3) | 90.96/ 69.41/ 83.65 | 95.34/ 80.37/ 84.28 | 97.05/ 88.50/ 87.50 | 94.45/ 79.43/ 85.15
  Panel 3: Statistically weighted dimension feature: collection-class weight
SDW (−3) | 14.43/ 19.45/ 13.29 | 80.75/ 42.76/ 78.06 | 11.37/ 3.85/ 10.12 | 35.52/ 22.02/ 33.83
SDW (−2.5) | 38.01/ 19.85/ 53.95 | 84.13/ 47.37/ 81.02 | 11.40/ 3.85/ 10.59 | 44.52/ 23.70/ 48.52
SDW (−2) | 86.89/ 36.42/ 78.38 | 87.58/ 58.36/ 83.25 | 14.06/ 3.87/ 70.70 | 62.85/ 32.89/ 77.45
SDW (−1.5) | 92.43/ 71.71/ 85.97 | 91.17/ 72.94/ 85.04 | 91.97/ 64.40/ 87.22 | 91.86/ 69.69/ 86.08
SDW (−1) | 92.71/ 71.61/ 85.55 | 94.26/ 81.40/ 85.86 | 96.65/ 88.70/ 89.17 | 94.54/ 80.57/ 86.86
SDW (−0.5) | 92.61/ 71.39/ 85.39 | 96.00/ 83.61/ 85.74 | 97.75/ 90.74/ 89.48 | 95.46/ 81.92/ 86.87
SDW (+0.5) | 92.41/ 71.50/ 85.14 | 96.80/ 83.87/ 85.52 | 97.99/ 90.75/ 89.13 | 95.74/ 82.04/ 86.60
SDW (+1) | 92.14/ 71.20/ 85.14 | 96.81/ 83.71/ 85.38 | 97.89/ 90.65/ 88.98 | 95.62/ 81.86/ 86.50
SDW (+1.5) | 91.68/ 70.82/ 84.84 | 96.72/ 83.45/ 85.27 | 97.89/ 90.45/ 88.90 | 95.43/ 81.58/ 86.34
SDW (+2) | 91.69/ 70.63/ 84.57 | 96.61/ 83.06/ 85.23 | 97.88/ 90.32/ 88.72 | 95.40/ 81.34/ 86.18
SDW (+2.5) | 91.43/ 70.37/ 84.69 | 96.54/ 82.70/ 85.22 | 97.85/ 90.09/ 88.58 | 95.28/ 81.06/ 86.17
SDW (+3) | 91.20/ 70.16/ 84.46 | 96.50/ 82.37/ 85.18 | 97.81/ 89.94/ 88.43 | 95.17/ 80.83/ 86.03
Table 3. The statistically weighted dimension feature distribution factors ( ACSW , ICSW , SDW ) analyzed over varied exponents for three image classification algorithms (Logistic Regression, KNN, and SVM (RBF)) and three datasets (MNIST/ E-MNIST/ F-MNIST): counts of feature weightings (combinations) among the Best-20 (Best-10) and Worst-20 (Worst-10). Each cell is reported as MNIST/ E-MNIST/ F-MNIST, with the Best-10 or Worst-10 count in parentheses.

Method | −1 | −0.5 | 0 | +0.5 | +1 | Total
  Panel 1: Logistic Regression algorithm
  Panel 1.1: Logistic Regression algorithm with statistically weighted dimensions feature
Panel A (Best):
ACSW | 9 (5)/ 8 (6)/ 6 (6) | 4 (3)/ 6 (3)/ 5 (2) | 5 (2)/ 6 (1)/ 3 (1) | 2 (0)/ 0 (0)/ 5 (1) | 0 (0)/ 0 (0)/ 1 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 0 (0)/ 0 (0)/ 1 (0) | 3 (0)/ 1 (0)/ 9 (4) | 8 (4)/ 4 (0)/ 6 (3) | 6 (3)/ 9 (5)/ 3 (2) | 3 (3)/ 6 (5)/ 1 (1) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 7 (5)/ 7 (5)/ 8 (3) | 8 (4)/ 5 (2)/ 4 (3) | 3 (0)/ 4 (3)/ 3 (3) | 2 (1)/ 2 (0)/ 4 (1) | 0 (0)/ 2 (0)/ 1 (0) | 20 (10)/ 20 (10)/ 20 (10)
Panel B (Worst):
ACSW | 8 (2)/ 8 (2)/ 8 (2) | 5 (2)/ 5 (2)/ 5 (2) | 3 (2)/ 3 (2)/ 4 (3) | 2 (2)/ 1 (1)/ 1 (1) | 2 (2)/ 3 (3)/ 2 (2) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 10 (3)/ 10 (4)/ 10 (3) | 6 (3)/ 6 (3)/ 6 (3) | 1 (1)/ 1 (0)/ 1 (1) | 0 (0)/ 1 (1)/ 0 (0) | 3 (3)/ 2 (2)/ 3 (3) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 8 (3)/ 8 (2)/ 8 (3) | 5 (2)/ 5 (2)/ 5 (2) | 3 (2)/ 3 (2)/ 3 (2) | 2 (1)/ 2 (2)/ 3 (2) | 2 (2)/ 2 (2)/ 1 (1) | 20 (10)/ 20 (10)/ 20 (10)
  Panel 1.2: Logistic Regression algorithm with statistically weighted dimensions feature based on PCA
Panel A (Best):
ACSW | 7 (5)/ 6 (4)/ 8 (4) | 7 (3)/ 7 (3)/ 6 (4) | 3 (2)/ 5 (2)/ 3 (2) | 3 (0)/ 2 (1)/ 2 (0) | 0 (0)/ 0 (0)/ 1 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 2 (1)/ 0 (0)/ 2 (0) | 6 (1)/ 2 (2)/ 5 (3) | 6 (3)/ 7 (4)/ 6 (2) | 3 (3)/ 6 (2)/ 5 (3) | 3 (2)/ 5 (2)/ 2 (2) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 7 (4)/ 7 (4)/ 7 (4) | 5 (4)/ 6 (3)/ 3 (2) | 3 (1)/ 4 (2)/ 4 (3) | 2 (0)/ 2 (1)/ 3 (1) | 3 (1)/ 1 (0)/ 3 (0) | 20 (10)/ 20 (10)/ 20 (10)
Panel B (Worst):
ACSW | 10 (4)/ 10 (4)/ 9 (4) | 6 (3)/ 6 (3)/ 6 (3) | 3 (2)/ 3 (2)/ 4 (3) | 1 (1)/ 1 (1)/ 1 (0) | 0 (0)/ 0 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 12 (5)/ 11 (5)/ 13 (6) | 6 (3)/ 6 (3)/ 6 (3) | 2 (2)/ 3 (2)/ 1 (1) | 0 (0)/ 0 (0)/ 0 (0) | 0 (0)/ 0 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 8 (3)/ 9 (3)/ 8 (2) | 6 (3)/ 6 (3)/ 5 (2) | 3 (2)/ 3 (2)/ 4 (3) | 2 (1)/ 1 (1)/ 2 (2) | 1 (1)/ 1 (1)/ 1 (1) | 20 (10)/ 20 (10)/ 20 (10)
  Panel 2: KNN algorithm
  Panel 2.1: KNN algorithm with statistically weighted dimensions feature
Panel A (Best):
ACSW | 4 (3)/ 4 (2)/ 11 (6) | 6 (3)/ 6 (3)/ 7 (4) | 5 (2)/ 5 (3)/ 2 (0) | 3 (1)/ 4 (2)/ 0 (0) | 2 (1)/ 1 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 0 (0)/ 0 (0)/ 0 (0) | 0 (0)/ 0 (0)/ 1 (1) | 0 (0)/ 6 (2)/ 5 (2) | 11 (7)/ 9 (7)/ 9 (4) | 9 (3)/ 5 (1)/ 5 (3) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 5 (2)/ 4 (2)/ 6 (3) | 3 (2)/ 3 (2)/ 4 (4) | 5 (3)/ 6 (2)/ 5 (2) | 4 (2)/ 4 (2)/ 3 (0) | 3 (1)/ 3 (2)/ 2 (1) | 20 (10)/ 20 (10)/ 20 (10)
Panel B (Worst):
ACSW | 10 (4)/ 10 (4)/ 7 (2) | 6 (3)/ 6 (3)/ 4 (1) | 3 (2)/ 3 (2)/ 4 (3) | 1 (1)/ 1 (1)/ 2 (1) | 0 (0)/ 0 (0)/ 3 (3) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 12 (5)/ 11 (5)/ 15 (8) | 6 (3)/ 6 (3)/ 4 (1) | 2 (2)/ 3 (2)/ 1 (1) | 0 (0)/ 0 (0)/ 0 (0) | 0 (0)/ 0 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 8 (3)/ 9 (3)/ 9 (3) | 6 (3)/ 6 (3)/ 6 (3) | 3 (2)/ 3 (2)/ 3 (2) | 2 (1)/ 1 (1)/ 1 (1) | 1 (1)/ 1 (1)/ 1 (1) | 20 (10)/ 20 (10)/ 20 (10)
  Panel 2.2: KNN algorithm with statistically weighted dimensions feature based on PCA
Panel A (Best):
ACSW | 3 (3)/ 4 (3)/ 12 (6) | 5 (4)/ 7 (3)/ 6 (4) | 5 (2)/ 5 (3)/ 2 (0) | 5 (1)/ 3 (1)/ 0 (0) | 2 (0)/ 1 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 0 (0)/ 0 (0)/ 0 (0) | 0 (0)/ 0 (0)/ 2 (0) | 0 (0)/ 5 (1)/ 5 (2) | 12 (7)/ 10 (7)/ 9 (5) | 8 (3)/ 5 (2)/ 4 (3) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 4 (1)/ 4 (2)/ 6 (3) | 6 (2)/ 4 (2)/ 5 (4) | 5 (3)/ 4 (2)/ 3 (2) | 3 (2)/ 5 (2)/ 3 (1) | 2 (2)/ 3 (2)/ 3 (0) | 20 (10)/ 20 (10)/ 20 (10)
Panel B (Worst):
ACSW | 10 (5)/ 10 (4)/ 8 (3) | 6 (3)/ 6 (3)/ 5 (2) | 3 (2)/ 3 (2)/ 3 (2) | 1 (0)/ 1 (1)/ 1 (0) | 0 (0)/ 0 (0)/ 3 (3) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 12 (5)/ 12 (6)/ 13 (6) | 6 (3)/ 6 (3)/ 6 (3) | 2 (2)/ 1 (0)/ 1 (1) | 0 (0)/ 1 (1)/ 0 (0) | 0 (0)/ 0 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 8 (2)/ 9 (3)/ 9 (3) | 6 (3)/ 5 (2)/ 6 (3) | 3 (2)/ 3 (2)/ 3 (2) | 2 (2)/ 2 (2)/ 1 (1) | 1 (1)/ 1 (1)/ 1 (1) | 20 (10)/ 20 (10)/ 20 (10)
  Panel 3: SVM (RBF) algorithm
  Panel 3.1: SVM (RBF) algorithm with statistically weighted dimensions feature
Panel A (Best):
ACSW | 1 (0)/ 2 (2)/ 7 (4) | 5 (2)/ 3 (2)/ 8 (4) | 6 (4)/ 5 (3)/ 3 (2) | 5 (4)/ 6 (2)/ 2 (0) | 3 (0)/ 4 (1)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 0 (0)/ 0 (0)/ 0 (0) | 1 (0)/ 7 (4)/ 5 (2) | 3 (2)/ 10 (6)/ 8 (4) | 9 (4)/ 3 (0)/ 6 (4) | 7 (4)/ 0 (0)/ 1 (0) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 5 (3)/ 4 (1)/ 4 (2) | 5 (3)/ 4 (1)/ 4 (3) | 5 (2)/ 4 (2)/ 5 (1) | 2 (1)/ 4 (2)/ 4 (1) | 3 (1)/ 4 (4)/ 3 (3) | 20 (10)/ 20 (10)/ 20 (10)
Panel B (Worst):
ACSW | 10 (4)/ 10 (4)/ 9 (3) | 6 (3)/ 6 (3)/ 6 (3) | 3 (2)/ 3 (2)/ 4 (3) | 1 (1)/ 1 (1)/ 1 (1) | 0 (0)/ 0 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 12 (6)/ 11 (4)/ 13 (6) | 6 (3)/ 6 (3)/ 6 (3) | 2 (1)/ 3 (3)/ 1 (1) | 0 (0)/ 0 (0)/ 0 (0) | 0 (0)/ 0 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 8 (2)/ 9 (4)/ 8 (3) | 6 (3)/ 6 (3)/ 5 (2) | 3 (2)/ 3 (2)/ 4 (3) | 2 (2)/ 1 (0)/ 2 (1) | 1 (1)/ 1 (1)/ 1 (1) | 20 (10)/ 20 (10)/ 20 (10)
  Panel 3.2: SVM (RBF) algorithm with statistically weighted dimensions feature based on PCA
Panel A (Best):
ACSW | 1 (0)/ 1 (1)/ 5 (3) | 3 (3)/ 4 (3)/ 4 (3) | 5 (1)/ 6 (3)/ 4 (3) | 6 (4)/ 6 (2)/ 4 (1) | 5 (2)/ 3 (1)/ 3 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 0 (0)/ 0 (0)/ 3 (0) | 0 (0)/ 6 (2)/ 5 (2) | 5 (3)/ 10 (6)/ 6 (4) | 11 (6)/ 4 (2)/ 5 (3) | 4 (1)/ 0 (0)/ 1 (1) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 5 (3)/ 4 (2)/ 5 (4) | 5 (2)/ 4 (2)/ 5 (2) | 4 (2)/ 4 (1)/ 3 (1) | 2 (1)/ 3 (2)/ 3 (2) | 4 (2)/ 5 (3)/ 4 (1) | 20 (10)/ 20 (10)/ 20 (10)
Panel B (Worst):
ACSW | 10 (4)/ 10 (6)/ 9 (4) | 6 (3)/ 6 (2)/ 6 (2) | 3 (2)/ 3 (2)/ 4 (3) | 1 (1)/ 1 (0)/ 1 (1) | 0 (0)/ 0 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
ICSW | 12 (6)/ 11 (8)/ 13 (6) | 6 (3)/ 6 (1)/ 6 (3) | 2 (1)/ 3 (1)/ 1 (1) | 0 (0)/ 0 (0)/ 0 (0) | 0 (0)/ 0 (0)/ 0 (0) | 20 (10)/ 20 (10)/ 20 (10)
SDW | 8 (2)/ 9 (3)/ 8 (3) | 6 (3)/ 6 (3)/ 5 (2) | 3 (2)/ 3 (2)/ 4 (2) | 2 (2)/ 1 (1)/ 2 (2) | 1 (1)/ 1 (1)/ 1 (1) | 20 (10)/ 20 (10)/ 20 (10)
Table 4. Comparison of the accuracy of the statistically weighted dimension feature based on PCA. The value before the parentheses is the classification accuracy (%), the value in the first parentheses is the number of components, and the values in the last parentheses are the exponents of ACSW , ICSW , and SDW .

Dataset, Features (No.) | Logistic Regression | KNN | SVM (RBF)
MNIST
(1) BASE-LINE | 92.55 (784) (0, 0, 0) | 96.65 (784) (0, 0, 0) | 97.92 (784) (0, 0, 0)
(2) Optimal SWD | 92.71 (784) (−1, 1, −1) | 97.05 (784) (−0.5, 0.5, 0.5) | 98.02 (784) (−0.5, 0.5, −1)
(3) (1) + PCA | 92.13 (331) (0, 0, 0) | 96.60 (331) (0, 0, 0) | 98.27 (246) (0, 0, 0)
(4) (2) + PCA | 92.63 (432) (−1, 1, −1) | 97.08 (269) (−1, 0.5, 1) | 98.32 (231) (−1, 0.5, −0.5)
E-MNIST
(1) BASE-LINE | 71.59 (784) (0, 0, 0) | 83.82 (784) (0, 0, 0) | 90.67 (784) (0, 0, 0)
(2) Optimal SWD | 72.33 (784) (−1, 0.5, −1) | 86.29 (784) (0, 0.5, −0.5) | 90.89 (784) (0.5, 0, −0.5)
(3) (1) + PCA | 71.97 (278) (0, 0, 0) | 86.00 (232) (0, 0, 0) | 91.45 (278) (0, 0, 0)
(4) (2) + PCA | 72.44 (416) (−0.5, 0.5, −1) | 86.45 (247) (0, 1, −1) | 91.45 (279) (−0.5, 0, 0.5)
F-MNIST
(1) BASE-LINE | 85.44 (784) (0, 0, 0) | 85.73 (784) (0, 0, 0) | 89.21 (784) (0, 0, 0)
(2) Optimal SWD | 85.95 (784) (−0.5, 0, −1) | 86.45 (784) (−1, 0.5, −0.5) | 89.63 (784) (−1, 0, 0.5)
(3) (1) + PCA | 85.10 (459) (0, 0, 0) | 85.83 (459) (0, 0, 0) | 89.68 (509) (0, 0, 0)
(4) (2) + PCA | 85.81 (465) (−1, 0.5, −1) | 86.73 (507) (−1, 0.5, −0.5) | 89.93 (511) (−0.5, 0, 0)
