1. Introduction
A fingerprint found at a crime scene is important evidence in a criminal investigation. Fingerprints are collected and compared against an existing database of fingerprint images in the police database. Comparisons are made by finding minutiae features [
1,
2] and comparing them to those recorded in the police database. If a match is found, the suspect’s identity can be established. Minutiae marking is usually automated, but artifacts sometimes require manual intervention, specifically the removal or modification of minutiae. Beyond the fact that manual minutiae extraction is tedious and error-prone, the main disadvantages of this technique are: (1) fingerprints that are not of sufficient value are excluded and (2) no attempt is made to extract information beyond comparison with the database, such as gender, age, and other information [
3]. Gender classification from fingerprints is an exciting research frontier in forensic science. Research on this topic, first pioneered by Acree in 1999 [
4], has now spanned over 20 years. The most recent literature review encompassing this work was published in 2021 [
5]. The review concluded that fingerprint ridge density could be a reliable parameter for gender classification. It also concluded that the number of ridges per unit area varies between individuals and differs by gender: female fingerprints tend to have a higher ridge density than male fingerprints. However, the mean density of fingerprint ridges varies between populations, as demonstrated over different datasets in various studies [
6,
7]. For instance, a study published in [
6] utilized different image processing techniques, specifically minutiae detection and feature extraction; the study demonstrated 65% accuracy in gender classification, indicating moderate success. This success was further validated by studies conducted in 2023 [
7].
Alternative approaches to gender classification based on fingerprint ridges use advanced machine learning and deep learning methods based on the whole fingerprint image. In [
8], the authors introduce an approach that leverages Fast Fourier Transform (FFT), Principal Component Analysis (PCA) features, and min–max normalization. The models are trained with a Support Vector Machine (SVM) classifier and incorporate sampling techniques such as SMOTE to address dataset imbalances. In that study, the right ring finger emerged as the most suitable for gender identification, achieving accuracy rates of 75% and 91% for males and females, respectively. The study did not employ deep learning methodologies and relied on a single dataset. Another article [
9] reported 99% accuracy on NIST-DB4; however, the article did not provide references or specifics on how the train-test split handled the two impressions available for each fingerprint or the gender balance, which leaves those results uncertain. In contrast, when the second impression of each finger was considered [
10], the gender classification score was significantly lower—65%.
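The concern about the train-test division can be illustrated with a minimal sketch. It assumes, hypothetically, that each record is a (subject_id, impression_id) pair and that the split is made at the subject level, so that two impressions of the same finger never fall on opposite sides of the split. The function name, the 20% test fraction, and the toy data are illustrative assumptions and are not taken from any of the cited studies.

```python
import random

def split_by_subject(records, test_fraction=0.2, seed=0):
    """Split (subject_id, impression_id) records so that no subject
    contributes impressions to both the training and the test set."""
    subjects = sorted({subject for subject, _ in records})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r[0] not in test_subjects]
    test = [r for r in records if r[0] in test_subjects]
    return train, test

# Hypothetical toy data: 10 subjects, two impressions each.
records = [(subject, impression) for subject in range(10) for impression in (1, 2)]
train, test = split_by_subject(records)

train_subjects = {subject for subject, _ in train}
test_subjects = {subject for subject, _ in test}
print(len(train), len(test), train_subjects & test_subjects)
```

A split made per image rather than per subject could place one impression of a finger in training and the other in testing, which would tend to inflate the reported accuracy.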
One of the most common publicly available databases for gender classification is SOCOfing. In [
11], gender classification performance was evaluated using single-finger classification, resulting in 77% average accuracy. When a weighted approach that considered multiple fingers simultaneously was employed, accuracy improved to 90%. However, gender classification based on multiple fingerprints is theoretical. To our knowledge, it has no practical application in crime scene investigation, primarily because more than one fingerprint is rarely encountered at a crime scene, as supported by prior research [
12]. In that study, the authors employed only an elementary 5-layer CNN [
13] architecture. Moreover, it is noteworthy that the network was trained “from scratch”, without the use of any beneficial fine-tuning techniques. Another study, conducted on the SOCOfing dataset [
14,
15], reported an accuracy rate of 90% on a restricted subset of 2000 of the 6000 fingerprints available in the SOCOfing database. Notably, the criteria used to select these 2000 fingerprints are unclear. A similar phenomenon was also reported in [
7]. In summary, research articles focusing on the SOCOfing database have reported gender classification accuracy of approximately 75% when employing neural network-based approaches, and as high as 90% when multiple fingerprints are used (though this is not practically applicable). A baseline accuracy of around 75% is therefore typical for gender classification on the SOCOfing dataset, as indicated by references such as [
16] and others.
In the context of studies utilizing advanced techniques on private fingerprint datasets, a recent comprehensive publication [
17] conducted an extensive comparative analysis. This analysis encompassed nine widely employed classifiers: CNN, Support Vector Machine (SVM) with three distinct kernels, k-Nearest Neighbors (kNN), Adaboost, J48, ID3, and Linear Discriminant Analysis (LDA). Their methodology involved gender classification based on various fingerprint features, among them ridge density, ridges, minutiae, and fingertip size (FTS). The results revealed that CNN achieved the highest success rate for gender classification in this context, with an accuracy of 95%. However, it is important to note that these results were based on private, high-quality fingerprint images collected in controlled laboratory settings (e.g., 500 dpi resolution with each finger scanned three times to ensure image quality), which may not be representative of real-world crime scene scenarios. Moreover, the methodology was somewhat redundant in that it used a CNN for classification, whereas a CNN is essentially a method for extracting features.
The Data-Centric Artificial Intelligence (DCAI) approach shifts the AI paradigm by placing more emphasis on data rather than algorithms or models [
18,
19]. Instead of solely concentrating on the model or algorithm itself, this approach analyzes data to identify which instances have the most significant impact on classification. It then re-trains models based on these influential instances. In recent years, this method has been shown to offer distinct advantages over traditional approaches. By implementing a set of margin-based criteria, we effectively filter out uncertain classifications and enable the model to learn from more reliable and informative examples. This approach has demonstrated the efficacy of the margin-of-confidence cleaning method in improving the performance and generalization capabilities of deep learning models.
In summary, this study’s primary contributions are as follows:
Cross-Dataset Gender Classification Evaluation: For the first time, this research compares gender classification performance across multiple datasets, including three publicly available databases and one proprietary internal database. This cross-dataset evaluation is critical for establishing the robustness and generalizability of the proposed CNN methodology, as it mitigates the biases and limitations inherent to single-dataset studies.
Enhanced Analysis of Low-Quality and Partial Fingerprints: Recognizing the practical challenges in fingerprint investigations, this study specifically targets the classification of gender from fingerprints that are partial or of low quality. By identifying and delineating the region of interest (ROI), this study enhances the ability to classify gender from fingerprints that would otherwise be deemed of little value, thus significantly improving the practical application of fingerprint analysis in forensic contexts.
Application of Data-Centric AI (DCAI) for Performance Improvement: This research pioneers the application of Data-Centric AI approaches within the context of fingerprint-based gender classification. By focusing on the data itself, rather than solely on the model or algorithm, the study leverages DCAI to identify and re-train on the most impactful instances, thereby enhancing classification accuracy. This data-centric approach represents a paradigm shift in artificial intelligence applications.
2. Datasets
In this paper we evaluate four datasets, as detailed in
Table 1. Three of these datasets are publicly accessible, namely: (1) NIST-DB4 [
20], (2) SOCOfing [
21], and (3) NIST-302 [
22]. Additionally, we incorporated a privately obtained dataset from the Israeli police, henceforth referred to as IsrPoliceDB. To the best of our knowledge, the three public datasets represent the only publicly available resources that include gender information in conjunction with fingerprint images. The NIST-DB4 dataset [
20] comprises 4000 plain fingerprints, captured at a resolution of 500 dpi, sourced from 2000 individuals (380 females and 1620 males), where each subject contributed two impressions of their index finger for this dataset. The SOCOfing database [
21], established in 2007, consists of 6000 fingerprints collected from 600 African subjects (123 females and 477 males). Each participant provided a single impression of all ten fingers. The images in this dataset measure 96 × 103 pixels and have a resolution of 500 dpi. Furthermore, a relatively recent dataset, NIST-302 [
23], published by the NIST agency in 2019, includes fingerprint impressions from 200 Americans (132 females and 68 males). Each individual contributed 20 impressions of all ten fingers. The images in this dataset are at a resolution of 500 dpi, with dimensions of 256 × 360 pixels.
3. Methods
Section 3 is divided into three key parts. The first part discusses the CNN techniques examined, specifically VGG and ResNet. The second part delineates the use cases for evaluating CNN performance on partial or low-quality fingerprint images. The third part discusses the DCAI approaches implemented to enhance performance.
The input of our method is a single fingerprint image, and the output is a gender classification. This study employed supervised learning and CNN methods for model training, in particular networks such as VGG [
24] and ResNet [
25], which have maintained their significance within the realm of deep learning for image classification. The architectures emphasize depth through the recurrent use of small receptive field convolutions and pooling layers, simplifying implementation and comprehension. ResNet [
25] introduces shortcut connections known as residual blocks; however, VGG’s [
24] adaptability remains a significant advantage. With pre-trained weights, both models can be seamlessly integrated into diverse networks and deep learning tasks, which underlines their versatility. The choice between VGG and ResNet should be considered carefully in the context of smaller datasets. The less complex architecture and shallower depth of pre-trained VGG models may help reduce overfitting, which is particularly beneficial for smaller databases. In contrast, ResNet’s depth enables it to learn intricate patterns. The choice therefore hinges on factors such as computational resources, data complexity, and project-specific requirements. In the broader context of image classification, the choice between VGG and ResNet is emblematic of the ongoing evolution of deep learning, with each framework offering distinct advantages based on dataset characteristics and computational constraints.
Our approach involves fine-tuning CNN architectures tailored to the specific image classification objectives. To train our CNN models, we employed standard training techniques [
13], including data augmentation, normalization, a method for class imbalance [
26], stochastic AdaGrad optimization, and others. The models were trained with a learning rate of 0.0001 and a batch size of 32, using the default values for the remaining parameters. The specific CNN architecture (i.e., VGG or ResNet) was selected based on experimentation and validation, with the aim of optimal performance in classifying gender from fingerprint images. Given the imbalanced datasets, evaluating the predictive model’s performance posed challenges, because traditional accuracy measures can be misleading when they merely mirror the inherent class distribution. Therefore, we opted for the F-score as our evaluation metric, which integrates precision and recall and offers a more robust performance measure for imbalanced datasets. The F-score provides a more comprehensive perspective than precision or recall alone and enables clearer understanding and communication of findings, irrespective of dataset class imbalances.
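The following minimal sketch illustrates why accuracy can mirror the class distribution while the F-score does not. It uses the subject counts reported for NIST-DB4 (380 females and 1620 males) and a deliberately degenerate, hypothetical classifier that always predicts the majority class. The F-score is computed as F = 2PR / (P + R), where P is precision and R is recall; the macro-average over the two classes is our assumption for this illustration, since the averaging scheme is not specified in the text.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Class counts as reported for NIST-DB4 subjects.
y_true = ["F"] * 380 + ["M"] * 1620
# Hypothetical degenerate classifier: always predicts the majority class.
y_pred = ["M"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
_, _, f1_female = precision_recall_f1(y_true, y_pred, "F")
_, _, f1_male = precision_recall_f1(y_true, y_pred, "M")
macro_f1 = (f1_female + f1_male) / 2  # assumed averaging scheme

print(f"accuracy={accuracy:.2f}  F(female)={f1_female:.2f}  "
      f"F(male)={f1_male:.2f}  macro-F={macro_f1:.2f}")
```

In this example the accuracy equals the share of the majority class even though no female fingerprint is ever recognized, whereas the F-score for the female class is zero.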
Due to the partial or low-quality fingerprint images commonly found at crime scenes, we explored the VGG network’s performance in classifying gender based on fingerprints’ internal and external cylindrical regions. By examining the specific contributions of the internal and external areas (see
Figure 1), we aimed to gain a deeper understanding of the nature of the fingerprint data and its relevance to gender classification tasks in cases of partial or low-quality fingerprints. We evaluated four scenarios: gender classification based on (a) 50% of the inner regions, (b) 60% of the inner regions, (c) 50% of the outer regions, and (d) 40% of the outer regions. The training and testing sets were kept the same in each case.
The methodology employed to extract both the inner and outer area of a fingerprint follows a series of steps: Initially, a binary threshold transforms the grayscale fingerprint images, which is succeeded by morphological operations. These operations are designed to reduce noise and isolate the fingerprint. Subsequently, the methodology fits an ellipse around the largest contour, assumed to be the fingerprint. Using these ellipse parameters, the inner and outer regions of the fingerprint are extracted. It is important to note that this methodology is designed to handle fingerprints that are off-center or rotated, as it does not depend on a specific alignment or central positioning within the image. This methodology also assumes that the majority of the original fingerprint is visible within the image.
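The final step of this procedure, splitting a fitted ellipse into inner and outer regions, can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: it assumes the ellipse parameters (center, semi-axes, rotation angle) are already available from the fitting step, and it assumes that the percentages refer to fractions of the ellipse area, so that the inner region is a concentric ellipse whose axes are scaled by the square root of the fraction. All names and numeric values are hypothetical; only the 96 × 103 image size is taken from the SOCOfing description.

```python
import math

def ellipse_value(x, y, cx, cy, a, b, theta):
    """Normalized ellipse equation; a value <= 1 means (x, y) lies inside."""
    dx, dy = x - cx, y - cy
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    u = dx * cos_t + dy * sin_t    # coordinate along the first semi-axis
    v = -dx * sin_t + dy * cos_t   # coordinate along the second semi-axis
    return (u / a) ** 2 + (v / b) ** 2

def region_masks(width, height, cx, cy, a, b, theta, inner_fraction):
    """Boolean masks for the inner region (a concentric ellipse covering
    `inner_fraction` of the fitted ellipse's area) and the outer ring."""
    inner = [[False] * width for _ in range(height)]
    outer = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = ellipse_value(x, y, cx, cy, a, b, theta)
            # Scaling both axes by sqrt(f) scales the area by f, which is
            # equivalent to comparing the equation value against f.
            if value <= inner_fraction:
                inner[y][x] = True
            elif value <= 1.0:
                outer[y][x] = True
    return inner, outer

# Hypothetical ellipse inside a 96 x 103 image, rotated by 80 degrees.
inner, outer = region_masks(96, 103, cx=48.0, cy=51.0, a=48.0, b=40.0,
                            theta=math.radians(80), inner_fraction=0.5)
inner_count = sum(row.count(True) for row in inner)
outer_count = sum(row.count(True) for row in outer)
print(inner_count, outer_count)
```

Because the masks are defined relative to the fitted ellipse rather than to the image frame, the same code applies to off-center or rotated fingerprints, in line with the property noted above.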
Regarding the DCAI, we employed a comprehensive set of five distinct approaches to identify and rectify issues within the most challenging 5% of data instances. Subsequently, we conducted model retraining using the refined dataset and evaluated the performance of each newly trained model. The five specific DCAI approaches we examined included [
18]: Cleanlab Out of Distribution (cleanlab-OOD), FLIP (comprising Easy and Hard variants), and Margin Of Confidence (MOC; Easy and Hard variants). The following sections will provide a detailed explanation of each of these approaches.
Out of Distribution (OOD): OOD refers to identifying outliers in test data, i.e., data samples that do not stem from the distribution of the training data. To evaluate OOD, we utilized the Cleanlab framework [
18], which leverages the “Principle of Counting”. This principle uses the model’s predicted probabilities (the confident joint) to estimate the number of examples in each class. By applying the Principle of Counting to these probabilities, we identified examples that fell outside the expected distribution, designating them as cleanlab Out of Distribution (cleanlab-OOD) instances.
Margin Of Confidence (MOC): MOC is a critical metric for assessing prediction accuracy. It quantifies the margin between two prediction scores. A larger MOC signifies more accurate predictions, while smaller margins imply potentially less reliable and inaccurate predictions. To measure MOC, we employed a classification head with two outputs for gender prediction (male and female). This setup allowed us to assess the accuracy of our model’s gender predictions based on the given datasets. We distinguished between MOC-Hard and MOC-Easy approaches. The MOC-Hard approach involved removing the top 5% of data instances where the model demonstrated elevated prediction confidence. This strategy allowed us to focus on more challenging data for model optimization. In contrast, the MOC-Easy approach targeted potential noise in the dataset by removing the 5% of data with the lowest prediction confidence scores, enhancing our focus on informative data.
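The two MOC variants can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the margin is taken to be the difference between the two largest predicted class probabilities, the function names are hypothetical, and the toy probabilities are invented for the example.

```python
def margin_of_confidence(probs):
    """Difference between the two largest predicted class probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def keep_after_moc(pred_probs, fraction=0.05, variant="easy"):
    """Indices kept after removing `fraction` of the samples by margin.

    easy: remove the samples with the lowest margins (least confident).
    hard: remove the samples with the highest margins (most confident).
    """
    margins = [margin_of_confidence(p) for p in pred_probs]
    order = sorted(range(len(margins)), key=lambda i: margins[i])  # ascending
    n_drop = int(len(margins) * fraction)
    if variant == "easy":
        dropped = set(order[:n_drop])
    elif variant == "hard":
        dropped = set(order[len(order) - n_drop:])
    else:
        raise ValueError("variant must be 'easy' or 'hard'")
    return [i for i in range(len(margins)) if i not in dropped]

# Hypothetical two-output (male, female) probabilities for 20 samples,
# ordered from least confident (index 0) to most confident (index 19).
pred_probs = [(0.5 + 0.025 * i, 0.5 - 0.025 * i) for i in range(20)]

kept_easy = keep_after_moc(pred_probs, variant="easy")
kept_hard = keep_after_moc(pred_probs, variant="hard")
print(len(kept_easy), len(kept_hard))
```

With 20 samples and a 5% fraction, each variant removes exactly one sample: MOC-Easy discards the least confident prediction and MOC-Hard discards the most confident one. The retained indices would then define the refined training set for retraining.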
FLIP (Classification Prediction Flips): FLIP measures the number of times a classification prediction changes during training. This metric assesses the robustness and generalizability of models. It provides insights into the effort required to achieve an acceptable level of accuracy with unseen data. It also helps identify the point at which further optimization no longer improves performance and should be halted because of diminishing returns in model accuracy. Similar to MOC, we distinguished between FLIP-Hard and FLIP-Easy approaches. The FLIP-Hard approach involved removing 5% of data with consistent prediction scores, retaining more complex data elements. On the other hand, the FLIP-Easy technique discarded 5% of the most inconsistent prediction data, ensuring a focus on data where the model exhibited persistent prediction clarity.
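A minimal sketch of the FLIP criterion is given below. It assumes, hypothetically, that the predicted label of every training sample is recorded once per epoch and that a flip is a change of label between two consecutive epochs; the function names and the toy histories are illustrative only.

```python
def count_flips(history):
    """Number of label changes between consecutive epochs for one sample."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)

def keep_after_flip(histories, fraction=0.05, variant="easy"):
    """Indices kept after removing `fraction` of the samples by flip count.

    easy: remove the most inconsistent samples (highest flip counts).
    hard: remove the most consistent samples (lowest flip counts).
    Ties are broken by sample index because the sort is stable.
    """
    flips = [count_flips(h) for h in histories]
    order = sorted(range(len(flips)), key=lambda i: flips[i])  # ascending
    n_drop = int(len(flips) * fraction)
    if variant == "easy":
        dropped = set(order[len(order) - n_drop:])
    elif variant == "hard":
        dropped = set(order[:n_drop])
    else:
        raise ValueError("variant must be 'easy' or 'hard'")
    return [i for i in range(len(flips)) if i not in dropped]

# Hypothetical per-epoch predictions over five epochs for 20 samples.
histories = ["MFMFM"] + ["MMFFF"] * 18 + ["FFFFF"]  # 4, 1 (x18) and 0 flips

kept_easy = keep_after_flip(histories, variant="easy")
kept_hard = keep_after_flip(histories, variant="hard")
print(len(kept_easy), len(kept_hard))
```

Here FLIP-Easy removes the sample whose prediction changed at every epoch, while FLIP-Hard removes the sample whose prediction never changed.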
5. Conclusions
This paper presents a comprehensive evaluation of gender classification from fingerprint images using CNNs. The study is divided into three main parts: selection of an optimal CNN for gender classification based on the given datasets, identification of critical fingerprint regions for classification, and exploration of DCAI strategies to enhance classification accuracy. This section discusses the implications of these findings.
The results indicate that VGG19 outperforms other architectures in terms of accuracy, precision, recall, and F-score. It achieved a test accuracy of 0.84, highlighting its ability to balance model complexity and generalization. These findings emphasize VGG19’s superiority in accurately classifying gender based on fingerprint images. VGG19’s successful application is further demonstrated when it is tested across different datasets, including the IsrPoliceDB and three public databases (
Table 2). The classification results range from 70% to 95%, showing its ability to generalize across various datasets without overfitting. The results correlate with the size and quality of the images in the dataset. This indicates that VGG19 effectively captures positive samples while minimizing false positives. In order to apply gender classification in a practical manner in a variety of real-world scenarios, this ability to generalize across diverse datasets is vital.
The second part of the study focused on identifying and assessing the critical fingerprint regions that significantly influence gender classification (
Figure 2). It became evident that the outer region holds greater importance in gender classification. This conclusion is in harmony with the existing literature and underpinned by the distinct absence of disruptive elements within the outer region [
27]. The methodology employed in this study to define the ROI in fingerprint images yields valuable insights for researchers and practitioners dealing with partial or low-quality fingerprint images.
The third part of the study explored DCAI strategies aimed at further enhancing classification accuracy (
Figure 3). Among the five DCAI approaches examined, cleanlab-OOD and FLIP-Easy consistently improved F-scores, with average increases of 2.5% and 2.75%, respectively. FLIP-Easy appeared to be the optimal approach for several reasons. First, by eliminating the 5% of the data with the most inconsistent classifications during training, it ensured that the model focused primarily on data where the classifications were stable, which enhanced the model’s generalization: overfitting was reduced, model reliability was improved, and independent data could be predicted more accurately. Second, it helped build a leaner, more resource-efficient model. Data points with fluctuating classifications can slow down the learning process by causing the model to repeatedly adjust its parameters to correct perceived errors; by removing these unstable elements, the model can train faster and use computational resources more efficiently. Lastly, the FLIP-Easy approach encourages a focus on high-quality data. In short, FLIP-Easy was the most effective of the examined approaches for constructing a generalizable and efficient model.
Additional implications for various security and forensic applications: Fingerprints searched against the database do not always yield a match with the perpetrator, especially when the fingerprints are of very low quality and may not be sufficient for direct identification. However, in such cases, gender classification can still play a role in the exclusion or inclusion of suspects and serve as an important investigative lead, focusing the investigator on males or females. Additionally, in cases with limited forensic resources, where not all evidence can be thoroughly analyzed immediately, fingerprints can aid in prioritizing investigative efforts. Therefore, focusing on males or females based on fingerprint classification can significantly streamline the investigative process [
27].
While this study presents significant findings, it also has limitations. The evaluation of VGG19 could be strengthened by comparing it with other state-of-the-art architectures for a more comprehensive understanding. Additional research can include a broader analysis of real crime scene fingerprint images, beyond the ink fingerprints used in this study. Additionally, the employed methodology can accommodate off-center and rotated fingerprints, but it does not consider scenarios where only a part of the fingerprint is visible. These cases were intentionally omitted from the study because they were generated during the augmentation process for training the network; investigating partial fingerprints, such as 50% (“half a ring”) cases, is a direction for future research. Overall, the study findings provide a robust foundation for the ongoing exploration of gender classification using fingerprint images.