ARTICLE
TITLE

Multiple Speech Source Separation Using Inter-Channel Correlation and Relaxed Sparsity

Maoshen Jia, Jundai Sun and Xiguang Zheng

Abstract

This work proposes a multiple speech source separation method that uses inter-channel correlation and relaxed sparsity. A B-format microphone with four spatially located channels is adopted because the small size of the array preserves the spatial parameter integrity of the original signal. Specifically, we first measure the proportion of overlapped components among multiple sources and find that many time-frequency (TF) components overlap as the number of sources increases. Then, considering the relaxed sparsity of speech sources, we propose a dynamic threshold-based approach for separating the sparse components, where the threshold is determined by the inter-channel correlation among the recorded signals. After a statistical analysis of the number of active sources at each TF instant, we propose a form of relaxed sparsity called the half-K assumption, under which the number of active sources in a given TF bin does not exceed half the total number of simultaneously occurring sources. Under the half-K assumption, the non-sparse components are recovered by using the extracted sparse components as a guide, combined with vector decomposition and matrix factorization. Finally, the TF coefficients of each source are obtained by synthesizing the sparse and non-sparse components. The proposed method was evaluated with up to six simultaneous speech sources under both anechoic and reverberant conditions. Both objective and subjective evaluations showed that the perceptual quality of the speech separated by the proposed approach exceeds that of existing blind source separation (BSS) approaches. Moreover, the method is robust to different speech signals and yields similar perceptual quality across all separated sources.
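
The overlap measurement and the half-K assumption described above can be illustrated with a minimal sketch. It assumes that the STFT magnitudes of each individual source are available (an oracle-style analysis). The activity criterion (a source counts as active in a bin when its magnitude lies within `rel_db` of that source's own peak) and all function names are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def active_source_counts(mags, rel_db=-30.0):
    """Count active sources in every TF bin.

    mags: array of shape (K, F, T) holding per-source STFT magnitudes.
    rel_db: hypothetical activity threshold relative to each source's peak.
    """
    mags = np.asarray(mags, dtype=float)
    peaks = mags.max(axis=(1, 2), keepdims=True)       # per-source peak magnitude
    thresh = peaks * 10.0 ** (rel_db / 20.0)           # dB offset -> linear scale
    active = (mags > thresh) & (mags > 0)              # silent sources are never active
    return active.sum(axis=0)                          # shape (F, T)

def sparsity_stats(mags, rel_db=-30.0):
    """Return the share of occupied bins that overlap and that satisfy half-K."""
    mags = np.asarray(mags, dtype=float)
    k = mags.shape[0]
    counts = active_source_counts(mags, rel_db)
    occupied = counts > 0
    n_occupied = int(occupied.sum())
    if n_occupied == 0:
        return {"overlap_ratio": 0.0, "half_k_ratio": 1.0}
    overlap = int((counts > 1).sum()) / n_occupied
    half_k = int(((counts <= k / 2) & occupied).sum()) / n_occupied
    return {"overlap_ratio": overlap, "half_k_ratio": half_k}

# Toy data: K = 4 sources, 1 frequency bin, 4 time frames.
toy = np.array([
    [[1.0, 1.0, 0.0, 0.0]],
    [[0.0, 1.0, 1.0, 0.0]],
    [[0.0, 1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0]],
])
stats = sparsity_stats(toy)
print(stats)
```

In the toy data, three bins are occupied; one of them holds three active sources, so it counts as overlapped and also exceeds the half-K limit of two active sources for K = 4.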
