Abstract
This paper develops an effective gesture-to-text translation system using state-of-the-art computer vision techniques. Existing research on sign language translation has yet to exploit skin masking, edge detection, and feature extraction to their full potential. This study therefore employs the speeded-up robust features (SURF) algorithm for feature extraction, which is robust to variations such as rotation, perspective scaling, and occlusion. The proposed system uses a bag of visual words (BoVW) model for gesture-to-text conversion. The dataset comprises 42,000 photographs of alphabets (A–Z) and numbers (1–9), divided into 35 classes with 1,200 images per class. The pre-processing phase includes skin masking, in which the RGB color space is converted to the HSV color space, followed by Canny edge detection to obtain sharp edges. The SURF descriptors are clustered into a visual vocabulary using the mini-batch K-means algorithm. The proposed system's performance is evaluated with several machine learning algorithms: naïve Bayes, logistic regression, K-nearest neighbors, support vector machine, and a convolutional neural network. All the algorithms benefited from SURF features, and the system achieves promising accuracy, ranging from 79% to 92%. Beyond presenting an effective gesture-to-text translation system, this study highlights the value of fully exploiting skin masking, edge detection, and feature extraction in sign language translation. The proposed system aims to bridge the communication gap between individuals who cannot speak and those who cannot understand Indian Sign Language (ISL).
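The pipeline summarized above (HSV skin masking, Canny edge detection, SURF descriptors, and a mini-batch K-means visual vocabulary) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the HSV skin-tone bounds, Canny thresholds, Hessian threshold, and vocabulary size are assumptions, and SURF requires an OpenCV contrib build with the nonfree modules enabled.

import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def preprocess(bgr_image):
    """Skin masking in HSV followed by Canny edge detection."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Hypothetical skin-tone range in HSV; the paper's exact bounds may differ.
    lower, upper = np.array([0, 40, 60]), np.array([25, 255, 255])
    mask = cv2.inRange(hsv, lower, upper)
    skin = cv2.bitwise_and(bgr_image, bgr_image, mask=mask)
    gray = cv2.cvtColor(skin, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, 100, 200)  # assumed Canny thresholds

def surf_descriptors(edge_image):
    """Extract SURF descriptors from a pre-processed image."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # assumed threshold
    _, descriptors = surf.detectAndCompute(edge_image, None)
    return descriptors

def build_vocabulary(descriptor_list, n_words=200):
    """Cluster pooled SURF descriptors into a visual vocabulary (assumed size 200)."""
    kmeans = MiniBatchKMeans(n_clusters=n_words, random_state=0)
    kmeans.fit(np.vstack(descriptor_list))
    return kmeans

def bovw_histogram(descriptors, kmeans, n_words=200):
    """Encode one image as a normalized histogram of visual words for a classifier."""
    words = kmeans.predict(descriptors)
    hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
    return hist / max(hist.sum(), 1)

The resulting BoVW histograms would then serve as feature vectors for the classifiers compared in the study (naïve Bayes, logistic regression, K-nearest neighbors, SVM, and a CNN).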