A Light-Weight Autoregressive CNN-Based Frame Level Transducer Decoder for End-to-End ASR

Hyeon-Kyu Noh and Hong-June Park

Abstract

A convolutional neural network (CNN) transducer decoder is proposed to reduce the decoding time of an end-to-end automatic speech recognition (ASR) system while maintaining accuracy. The CNN, with 177 k parameters and a kernel size of 6, generates the token-level probabilities of the current token at each token transition of the output token sequence. Two probabilities of the current token, one from the encoder and the other from the CNN, are added at the frame level, which reduces the number of decoding steps to the number of input frames. An encoder composed of an 18-layer conformer was combined with the proposed decoder and trained on the LibriSpeech data set using the forward-backward algorithm. Space and re-appearance tokens are added to the 300 word-piece tokens to represent the token string; a space token appears at a frame between two words. A comparison with autoregressive decoders such as transformer and RNN-T decoders shows that this work provides comparable word error rates (WERs) with much less decoding time. A comparison with non-autoregressive decoders such as CTC shows that this work improves WERs.
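The frame-level decoding loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: `greedy_frame_decode`, `decoder_fn`, `PAD`, and the greedy (argmax) search are hypothetical choices made for illustration, and the two probability vectors are simply summed because the abstract does not say whether the addition happens in the probability or the log domain. Only the kernel size of 6, the one-step-per-frame loop, and the decoder update at token transitions come from the abstract.

```python
KERNEL_SIZE = 6   # context length of the CNN decoder, as stated in the abstract
PAD = -1          # hypothetical padding id used before any token has been emitted


def greedy_frame_decode(encoder_probs, decoder_fn):
    """Greedy decoding with exactly one step per input frame.

    encoder_probs: list of per-frame probability lists (T frames x V tokens).
    decoder_fn:    hypothetical stand-in for the CNN decoder; maps a tuple of
                   the last KERNEL_SIZE emitted tokens to V probabilities.
    Returns (frame_tokens, tokens): the per-frame choices and the sequence of
    tokens recorded at token transitions.
    """
    history = [PAD] * KERNEL_SIZE
    decoder_probs = decoder_fn(tuple(history))
    frame_tokens, tokens = [], []
    previous = None
    for frame in encoder_probs:
        # Add the encoder and decoder probabilities at the frame level.
        combined = [e + d for e, d in zip(frame, decoder_probs)]
        token = max(range(len(combined)), key=combined.__getitem__)
        frame_tokens.append(token)
        if token != previous:
            # Token transition: record the token and refresh the decoder
            # output; between transitions the cached output is reused.
            tokens.append(token)
            history = history[1:] + [token]
            decoder_probs = decoder_fn(tuple(history))
            previous = token
    return frame_tokens, tokens


def uniform_decoder(history):
    """Toy decoder that ignores its history (assumption for the demo only)."""
    return [1.0 / 3.0] * 3


if __name__ == "__main__":
    frames = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    print(greedy_frame_decode(frames, uniform_decoder))
```

In this sketch the decoder is evaluated only when the chosen token changes, so the number of loop iterations equals the number of frames regardless of the output length. Mapping the space and re-appearance tokens back to text is left out, since the abstract does not fully specify their semantics.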

Keywords

speech recognition - autoregressive speech recognition - end-to-end - CNN - transducer decoder

ISSUE

Volume: 14 Issue: 3 (2024)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/app14031300
