Applied Sciences, Vol. 13, No. 1 (2023)
Title

Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition

Wondimu Lambamo, Ramasamy Srinivasagan and Worku Jifara

Abstract

Speaker recognition systems perform very well on datasets that are free of noise and mismatch. However, their performance degrades under environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, where they achieved better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used Mel Spectrogram and similar inputs rather than handcrafted features such as MFCC and GFCC. However, the performance of Mel Spectrogram features degrades under high noise levels and mismatch in the utterances. Like Mel Spectrogram, Cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, Cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, no study has analyzed the noise robustness of Cochleogram and Mel Spectrogram in speaker recognition. In addition, only a limited number of studies have used Cochleogram to develop speech-based models in noisy and mismatched conditions using deep learning. In this study, the noise robustness of Cochleogram and Mel Spectrogram features in deep learning-based speaker recognition is analyzed at Signal to Noise Ratio (SNR) levels from -5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2DCNN, ResNet-50, VGG-16, ECAPA-TDNN and TitaNet model architectures. The speaker identification and verification performance of both Cochleogram and Mel Spectrogram is evaluated. The results show that Cochleogram performs better than Mel Spectrogram in both speaker identification and verification under noisy and mismatched conditions.
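
As a rough illustration of two ideas the abstract relies on, the sketch below shows (1) how a noise recording can be scaled and added to a clean utterance so that the mixture has a chosen SNR, and (2) how the mel scale and the ERB measure differ as functions of frequency. It is a minimal sketch and not the authors' pipeline: the function names (`mix_at_snr`, `measured_snr_db`, `hz_to_mel`, `erb_bandwidth_hz`), the 5 dB step between SNR levels, and the synthetic test signals are all assumptions made for illustration. The mel formula is one commonly used variant, and the ERB formula is the Glasberg and Moore approximation; the abstract does not state which variants the study used.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of `noisy` relative to the clean `speech` it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


def hz_to_mel(f_hz):
    """Mel value for a frequency in Hz (a commonly used log-based variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000  # assumed sample rate for the synthetic signals
    # Synthetic stand-ins: a 440 Hz tone as "speech", Gaussian samples as "noise".
    speech = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

    # SNR range taken from the abstract; the 5 dB step is an assumption.
    for snr_db in range(-5, 25, 5):
        noisy = mix_at_snr(speech, noise, snr_db)
        print(snr_db, round(measured_snr_db(speech, noisy), 3))

    for f_hz in (100, 1000, 4000):
        print(f_hz, round(hz_to_mel(f_hz), 1), round(erb_bandwidth_hz(f_hz), 1))
```

With real data, `speech` and `noise` would be waveforms loaded from audio files at the same sample rate. The gain is derived from average power over the whole utterance, so the SNR of individual segments can differ from the target.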
