Inicio  /  Applied Sciences  /  Vol: 10 Par: 20 (2020)  /  Artículo
ARTÍCULO
TITULO

A Preprocessing Strategy for Denoising of Speech Data Based on Speech Segment Detection

Seung-Jun Lee and Hyuk-Yoon Kwon    

Resumen

In this paper, we propose a preprocessing strategy for denoising of speech data based on speech segment detection. A design of computationally efficient speech denoising is necessary to develop a scalable method for large-scale data sets. Furthermore, it becomes more important as the deep learning-based methods have been developed because they require significant costs while showing high performance in general. The basic idea of the proposed method is using the speech segment detection so as to exclude non-speech segments before denoising. The speech segmentation detection can exclude non-speech segments with a negligible cost, which will be removed in denoising process with a much higher cost, while maintaining the accuracy of denoising. First, we devise a framework to choose the best preprocessing method for denoising based on the speech segment detection for a target environment. For this, we speculate the environments for denoising using different levels of signal-to-noise ratio (SNR) and multiple evaluation metrics. The framework finds the best speech segment detection method tailored to a target environment according to the performance evaluation of speech segment detection methods. Next, we investigate the accuracy of the speech segment detection methods extensively. We conduct the performance evaluation of five speech segment detection methods with different levels of SNRs and evaluation metrics. Especially, we show that we can adjust the accuracy between the precision and recall of each method by controlling a parameter. Finally, we incorporate the best speech segment detection method for a target environment into a denoising process. Through extensive experiments, we show that the accuracy of the proposed scheme is comparable to or even better than that of Wavenet-based denoising, which is one of recent advanced denoising methods based on deep neural networks, in terms of multiple evaluation metrics of denoising, i.e., SNR, STOI, and PESQ, while it can reduce the denoising time of the Wavenet-based denoising by approximately 40?50% according to the used speech segment detection method.

 Artículos similares

       
 
Ke Zhang, Yangjie Wei, Dan Wu and Yi Wang    
Voice signals acquired by a microphone array often include considerable noise and mutual interference, seriously degrading the accuracy and speed of speech separation. Traditional beamforming is simple to implement, but its source interference suppressio... ver más
Revista: Applied Sciences

 
Gustavo Fernandes Rodrigues,Thiago de Souza Siqueira,Ana Cláudia Silva de Souza,Hani Camille Yehia     Pág. 80 - 85
In this paper we present an insight into the use of spectral masking techniques in time-frequency domain, as a preprocessing step for the speech signal recognition. Speech recognition systems have their performance negatively affected in noisy environmen... ver más