JOURNAL
Aerospace

ARTICLE

TITLE

Speech GAU: A Single Head Attention for Mandarin Speech Recognition for Air Traffic Control

Shiyu Zhang, Jianguo Kong, Chao Chen, Yabin Li and Haijun Liang

Abstract

The rise of end-to-end (E2E) speech recognition technology in recent years has overturned the design pattern of cascading multiple subtasks in classical speech recognition and achieved direct mapping of speech input signals to text labels. In this study, a new E2E framework, ResNet-GAU-CTC, is proposed to implement Mandarin speech recognition for air traffic control (ATC). A deep residual network (ResNet) utilizes the translation invariance and local correlation of a convolutional neural network (CNN) to extract the time-frequency domain information of speech signals. A gated attention unit (GAU) utilizes a gated single-head attention mechanism to better capture the long-range dependencies of sequences, thus attaining a larger receptive field and contextual information, as well as a faster training convergence rate. The connectionist temporal classification (CTC) criterion eliminates the need for forced frame-level alignments. To address the problems of scarce data resources and unique pronunciation norms and contexts in the ATC field, transfer learning and data augmentation techniques were applied to enhance the robustness of the network and improve the generalization ability of the model. The character error rate (CER) of our model was 11.1% on the expanded Aishell corpus, and it decreased to 8.0% on the ATC corpus.
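
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a character error rate is commonly computed: the total character-level edit distance between reference and recognized transcripts, divided by the total number of reference characters. It is not the authors' evaluation script; the function names `edit_distance` and `cer` and the two sample utterances are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference chars into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference char missing in hyp)
                curr[j - 1] + 1,     # insertion (extra char in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical transcripts, invented for illustration only.
    refs = ["国航一三五六", "上升到八千四"]
    hyps = ["国航一三五八", "上升八千四"]
    print(f"CER = {cer(refs, hyps):.3f}")  # 2 edits / 12 chars -> CER = 0.167
```

Under this definition, a CER of 8.0% corresponds to roughly 8 character-level edits per 100 reference characters.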

Keywords

end-to-end speech recognition - ResNet-GAU-CTC - air traffic control - transfer learning - data augmentation

PAGES

pp. 0 - 0

ISSUE

Volume: 9 Issue: 8 (2022)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Information
Aerospace

DOI

https://doi.org/10.3390/aerospace9080395

Similar articles

Multi-Feature Fusion Method for Chinese Shipping Companies Credit Named Entity Recognition

Lin He, Shengnan Wang and Xinran Cao

Shipping Enterprise Credit Named Entity Recognition (NER) aims to recognize shipping enterprise credit entities from unstructured shipping enterprise credit texts. Aiming at the problem of low entity recognition rate caused by complex and diverse entitie...

Journal: Applied Sciences

A Novel Bi-Dual Inference Approach for Detecting Six-Element Emotions

Xiaoping Huang, Yujian Zhou and Yajun Du

In recent years, there has been rapid development in machine learning for solving artificial intelligence tasks in various fields, including translation, speech, and image processing. These AI tasks are often interconnected rather than independent. One s...

Journal: Applied Sciences

Language Identification-Based Evaluation of Single Channel Speech Separation of Overlapped Speeches

Zuhragvl Aysa, Mijit Ablimit, Hankiz Yilahun and Askar Hamdulla

In multi-lingual, multi-speaker environments (e.g., international conference scenarios), speech, language, and background sounds can overlap. In real-world scenarios, source separation techniques are needed to separate target sounds. Downstream tasks, su...

Journal: Information

Content-Based Video Big Data Retrieval with Extensive Features and Deep Learning

Thuong-Cang Phan, Anh-Cang Phan, Hung-Phi Cao and Thanh-Ngoan Trieu

In the era of digital media, the rapidly increasing volume and complexity of multimedia data cause many problems in storing, processing, and querying information in a reasonable time. Feature extraction and processing time play an extremely important rol...

Journal: Applied Sciences

Comfort Distance - A Single-Number Quantity Describing Spatial Attenuation in Open-Plan Offices

Valtteri Hongisto and Jukka Keränen

ISO 3382-3 is globally used to determine the room acoustic conditions of open-plan offices using in situ measurements. The key outcomes of the standard are three single-number quantities: distraction distance, rD, A-weighted sound pressure level of speec...

Journal: Applied Sciences
