Information, Vol. 13, No. 10 (2022)
ARTICLE
TITLE

Language Identification-Based Evaluation of Single Channel Speech Separation of Overlapped Speeches

Zuhragvl Aysa    
Mijit Ablimit    
Hankiz Yilahun and Askar Hamdulla    

Abstract

In multilingual, multi-speaker environments (e.g., international conferences), speech, language, and background sounds can overlap. In such real-world scenarios, source separation techniques are needed to isolate the target sounds. Downstream tasks such as automatic speech recognition (ASR), speaker recognition, and voice activity detection (VAD) can be combined with speech separation to gain a better understanding of the audio. Because most evaluation methods for monaural separation either rely on a single measure or are subjective, this paper uses a downstream recognition task as the overall evaluation criterion, so that separation performance can be evaluated directly with the metrics of the downstream task. We investigated a two-stage training scheme that combines speech separation and language identification. To analyze and optimize the separation of single-channel overlapping speech, the separated speech was fed to a language identification engine and the identification accuracy was measured. The speech separation model was a single-channel speech separation network trained on WSJ0-2mix. For the language identification system, we used an Oriental Language Dataset and a dataset synthesized by directly mixing speech groups in different proportions. The combined effect of the two models was evaluated in various overlapping-speech scenarios. When the language identification network was based on single-speaker, single-speech frequency-spectrum features, recognition results for Chinese, Japanese, Korean, Indonesian, and Vietnamese improved significantly over those obtained from the mixed-audio spectrum.
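
As a rough illustration of the evaluation idea described in the abstract, the sketch below shows one way to synthesize an overlapped utterance at a chosen mixing proportion and to score a downstream language identifier by accuracy. It is not the authors' implementation: the abstract does not say how the mixing proportions were defined, so scaling by a target-to-interferer power ratio in dB is an assumption, and the function names `mix_at_snr` and `lid_accuracy` are hypothetical.

```python
import numpy as np


def mix_at_snr(target, interferer, snr_db):
    """Overlap two mono waveforms at a given target-to-interferer power ratio.

    Assumption: the "mixing proportion" is expressed as a ratio in dB.
    Both signals are truncated to the shorter length and must be non-silent.
    """
    n = min(len(target), len(interferer))
    target = np.asarray(target[:n], dtype=np.float64)
    interferer = np.asarray(interferer[:n], dtype=np.float64)

    power_target = np.mean(target ** 2)
    power_interferer = np.mean(interferer ** 2)

    # Choose gain g so that power_target / (g**2 * power_interferer)
    # equals 10 ** (snr_db / 10).
    gain = np.sqrt(power_target / (power_interferer * 10 ** (snr_db / 10)))
    return target + gain * interferer


def lid_accuracy(predicted, reference):
    """Fraction of utterances whose predicted language label matches the reference."""
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    return float(np.mean(predicted == reference))
```

In such a setup, the same accuracy function would be applied twice, once to labels predicted from the mixed audio and once to labels predicted from the separated audio, so that the difference reflects the contribution of the separation stage.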

Similar articles

Ke Zhang, Yangjie Wei, Dan Wu and Yi Wang
Voice signals acquired by a microphone array often include considerable noise and mutual interference, seriously degrading the accuracy and speed of speech separation. Traditional beamforming is simple to implement, but its source interference suppressio...
Journal: Applied Sciences

Waqas Rafique, Jonathon Chambers and Ali Imam Sunny
The performance of the independent vector analysis (IVA) algorithm depends on the choice of the source prior to better model the speech signals as it employs a multivariate source prior to retain the dependency between frequency bins of each source. Iden...
Journal: Acoustics

Maoshen Jia, Jundai Sun and Xiguang Zheng
In this work, a multiple speech source separation method using inter-channel correlation and relaxed sparsity is proposed. A B-format microphone with four spatially located channels is adopted due to the size of the microphone array to preserve the spati...
Journal: Applied Sciences

Gustavo Fernandes Rodrigues, Thiago de Souza Siqueira, Ana Cláudia Silva de Souza, Hani Camille Yehia     pp. 80-85
In this paper we present an insight into the use of spectral masking techniques in time-frequency domain, as a preprocessing step for the speech signal recognition. Speech recognition systems have their performance negatively affected in noisy environmen...

Pedersen, M. S.; Wang, D.; Larsen, J.; Kjems, U.     pp. 475-492