Algorithms, Vol. 15, Issue 5 (2022)

MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition

Xing Wu, Yifan Jin, Jianjia Wang, Quan Qian and Yike Guo

Abstract

Large-scale automatic speech recognition (ASR) models have achieved impressive performance. However, training an ASR model requires huge computational resources and a massive amount of data. Knowledge distillation is a prevalent model compression method that transfers knowledge from a large model to a small model. To improve the efficiency of knowledge distillation for end-to-end speech recognition, especially in the low-resource setting, a Mixup-based Knowledge Distillation (MKD) method is proposed, which combines Mixup, a data-agnostic data augmentation method, with softmax-level knowledge distillation. A loss-level mixture is presented to address the problem caused by the non-linearity of the label in the KL divergence when Mixup is adopted in the teacher-student framework. It is shown mathematically that optimizing the mixture of loss functions is equivalent to optimizing an upper bound of the original knowledge distillation loss. The proposed MKD takes advantage of Mixup and makes the model robust even with a small amount of training data. Experiments on Aishell-1 show that MKD obtains relative improvements of 15.6% and 3.3% on two student models of different parameter scales compared with existing methods. Experiments on data efficiency demonstrate that MKD achieves similar results with only half of the original dataset.
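One way to read the upper-bound claim is sketched below. It assumes that the distillation loss is the KL divergence from the teacher's softmax output to the student's output on the mixed input, that the "original" loss uses the λ-mixed teacher distribution as its target, and that λ is drawn from Beta(α, α) as in standard Mixup. The notation (p_T, p_S, \bar{p}, q) is ours, not the paper's, and the paper's own derivation may differ.

```latex
\begin{aligned}
\tilde{x} &= \lambda x_i + (1-\lambda)\, x_j, \qquad \lambda \sim \mathrm{Beta}(\alpha,\alpha),\\
\bar{p} &= \lambda\, p_T(x_i) + (1-\lambda)\, p_T(x_j), \qquad q = p_S(\tilde{x}),\\
\mathrm{KL}(\bar{p}\,\|\,q) &\le \lambda\,\mathrm{KL}\big(p_T(x_i)\,\|\,q\big) + (1-\lambda)\,\mathrm{KL}\big(p_T(x_j)\,\|\,q\big),\\
\text{gap} &= \lambda \sum_k p_{T,k}(x_i)\log p_{T,k}(x_i) + (1-\lambda)\sum_k p_{T,k}(x_j)\log p_{T,k}(x_j) - \sum_k \bar{p}_k \log \bar{p}_k \;\ge\; 0.
\end{aligned}
```

The inequality follows from the convexity of the KL divergence in its first argument. Expanding both sides, the cross-entropy terms in \log q cancel, so the gap depends only on the teacher distributions and not on the student. Under these assumptions, minimizing the loss-level mixture therefore gives the student the same gradient as the label-level loss it bounds.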
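The loss-level mixture can be illustrated with a toy plain-Python sketch. This is not the authors' implementation, and it makes several assumptions. The teacher outputs are random stand-ins over a handful of classes rather than ASR token distributions. The student is a hypothetical linear map (`linear_logits`). The Mixup coefficient is drawn from Beta(α, α) with an assumed α = 0.2, and the temperature is an assumed 2.0. The script compares the KL divergence against the mixed teacher target (`label_level_kd`) with the λ-weighted mixture of per-example KL losses (`loss_level_kd`) for two different students.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_k == 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


def mix(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Mixup-style convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def label_level_kd(p_i: Sequence[float], p_j: Sequence[float], q: Sequence[float], lam: float) -> float:
    """KL from the lam-mixed teacher distribution to the student output q."""
    return kl_div(mix(p_i, p_j, lam), q)


def loss_level_kd(p_i: Sequence[float], p_j: Sequence[float], q: Sequence[float], lam: float) -> float:
    """lam-weighted mixture of the two per-example KL losses against the same q."""
    return lam * kl_div(p_i, q) + (1.0 - lam) * kl_div(p_j, q)


def linear_logits(weights: Sequence[Sequence[float]], features: Sequence[float]) -> Vector:
    """Hypothetical student: one logit per class, a dot product with the features."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


if __name__ == "__main__":
    random.seed(0)
    feat_dim, num_classes, temperature = 4, 5, 2.0  # assumed toy sizes
    lam = random.betavariate(0.2, 0.2)  # Mixup coefficient, assumed alpha = 0.2

    # Two hypothetical input feature vectors and their Mixup combination.
    x_i = [random.gauss(0.0, 1.0) for _ in range(feat_dim)]
    x_j = [random.gauss(0.0, 1.0) for _ in range(feat_dim)]
    x_mixed = mix(x_i, x_j, lam)

    # Random stand-ins for the teacher's softened outputs on x_i and x_j.
    p_i = softmax([random.gauss(0.0, 2.0) for _ in range(num_classes)], temperature)
    p_j = softmax([random.gauss(0.0, 2.0) for _ in range(num_classes)], temperature)

    for trial in range(2):
        # A different random linear student on each trial, evaluated on the mixed input.
        weights = [[random.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(num_classes)]
        q = softmax(linear_logits(weights, x_mixed), temperature)
        upper = loss_level_kd(p_i, p_j, q, lam)
        lower = label_level_kd(p_i, p_j, q, lam)
        print(f"student {trial}: loss-level={upper:.4f}  label-level={lower:.4f}  gap={upper - lower:.6f}")
```

Running it should print a non-negative gap that is the same for both students up to rounding. This is because the difference between the two losses depends only on the teacher distributions and λ.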

Similar articles

Yuchen Dong, Heng Zhou, Chengyang Li, Junjie Xie, Yongqiang Xie and Zhongbo Li    
Camouflaged object detection (COD) is an arduous challenge due to the striking resemblance of camouflaged objects to their surroundings. The abundance of similar background information can significantly impede the efficiency of camouflaged object detecti...
Journal: Applied Sciences

 
Hao Liu, Bo Yang and Zhiwen Yu    
Multimodal sarcasm detection is a developing research field in social Internet of Things, which is the foundation of artificial intelligence and human psychology research. Sarcastic comments issued on social media often imply people's real attitudes towa...
Journal: Applied Sciences

 
Songpu Li, Xinran Yu and Peng Chen    
Model robustness is an important index in medical cybersecurity, and hard-negative samples in electronic medical records can provide more gradient information, which can effectively improve the robustness of a model. However, hard negatives pose difficul...
Journal: Applied Sciences

 
Jia-Ling Xie, Wei-Feng Shi, Ting Xue and Yu-Hang Liu    
The fault detection and diagnosis of a ship's electric propulsion system is of great significance to the reliability and safety of large modern ships. The traditional fault diagnosis method based on mathematical models and expert knowledge is limited by ...

 
Laura Moretti, Leonardo Palozza and Antonio D'Andrea
No theoretical model effectively explains the blistering process, which provokes functional distress in asphalt pavements worldwide. This study focuses on the possible causes of blistering, the physical processes that drive blistering, the role of asphal...
Journal: Applied Sciences