JOURNAL
Big Data and Cognitive Computing


ARTICLE

TITLE

The Development of a Kazakh Speech Recognition Model Using a Convolutional Neural Network with Fixed Character Level Filters

Nurgali Kadyrbek

Madina Mansurova

Adai Shomanov and Gaukhar Makharova

Abstract

This study addresses the transcription of human speech in the Kazakh language under dynamically changing conditions. It discusses key aspects of the phonetic structure of the Kazakh language, technical considerations in collecting the transcribed audio corpus, and the use of deep neural networks for speech modeling. A high-quality transcribed audio corpus containing 554 h of data was collected; it provides statistics on letter and syllable frequencies, as well as demographic parameters such as the gender, age, and region of residence of native speakers. The corpus contains a universal vocabulary and serves as a valuable resource for developing speech-related modules. Machine learning experiments were conducted using the DeepSpeech2 model, which includes a sequence-to-sequence architecture with an encoder, decoder, and attention mechanism. To increase the robustness of the model, filters initialized with character-level embeddings were introduced to reduce the dependence on precise positioning in feature maps. Training involved jointly learning convolutional filters for spectrograms and character-level (symbolic) features. The proposed approach, which combines supervised and unsupervised learning methods, reduced the model size by 66.7% while maintaining comparable accuracy. Evaluation on the test set showed a 7.6% lower character error rate (CER) than existing models, demonstrating state-of-the-art performance. The proposed architecture can be deployed on resource-constrained platforms. Overall, this study presents a high-quality audio corpus, an improved speech recognition model, and promising results applicable to speech-related applications and languages beyond Kazakh.
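The abstract reports its results as a character error rate (CER). As a minimal sketch of how CER is commonly computed, and not necessarily the authors' exact evaluation procedure, the metric can be taken as the character-level edit distance between a reference transcript and a hypothesis, divided by the reference length. The function names `levenshtein` and `cer` and the example strings are illustrative assumptions, not taken from the article.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `ref` into `hyp` (hypothetical helper for illustration)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of five reference characters.
    print(cer("сәлем", "салем"))
```

In this sketch a single wrong character in a five-character reference gives a CER of 0.2; corpus-level CER is usually obtained by summing edit distances and reference lengths over all utterances before dividing.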

Keywords

automatic speech recognition - deep learning - low-resource - Kazakh - speech corpus - character embedding

ISSUE

Volume: 7, Issue: 3 (2023)

SUBJECTS

INFRASTRUCTURE

DOI

https://doi.org/10.3390/bdcc7030132
