Inicio  /  Information  /  Vol: 15 Par: 4 (2024)  /  Artículo
ARTÍCULO
TITULO

Whisper40: A Multi-Person Chinese Whisper Speaker Recognition Dataset Containing Same-Text Neutral Speech

Jingwen Yang and Ruohua Zhou    

Resumen

Whisper speaker recognition (WSR) has received extensive attention from researchers in recent years, and it plays an important role in medical, judicial, and other fields. Among them, the establishment of a whisper dataset is very important for the study of WSR. However, the existing whisper dataset suffers from the problems of a small number of speakers, short speech duration, and lack of neutral speech with the same-text as the whispered speech in the same dataset. To address this issue, we present Whisper40, a multi-person Chinese WSR dataset containing same-text neutral speech spanning around 655.90 min sourced from volunteers. In addition, we use the current state-of-the-art speaker recognition model to build a WSR baseline system and combine the idea of transfer learning for pre-training the speaker recognition model using neutral speech datasets and transfer the empirical knowledge of specific network layers to the WSR system. The Whisper40 and CHAINs datasets are then used to fine-tune the model with transferred specific layers. The experimental results show that the Whisper40 dataset is practical, and the time delay neural network (TDNN) model performs well in both the same/cross-scene experiments. The equal error rate (EER) of Chinese WSR after transfer learning is reduced by 27.62% in comparison.

 Artículos similares

       
 
Shashidhar Rudregowda, Sudarshan Patil Kulkarni, Gururaj H L, Vinayakumar Ravi and Moez Krichen    
Visual speech recognition (VSR) is a method of reading speech by noticing the lip actions of the narrators. Visual speech significantly depends on the visual features derived from the image sequences. Visual speech recognition is a stimulating process th... ver más
Revista: Acoustics

 
Mikel Penagarikano, Amparo Varona, Germán Bordel and Luis Javier Rodriguez-Fuentes    
In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an... ver más
Revista: Applied Sciences

 
Ismail Shahin, Ali Bou Nassif, Rameena Thomas and Shibani Hamsa    
Modern developments in machine learning methodology have produced effective approaches to speech emotion recognition. The field of data mining is widely employed in numerous situations where it is possible to predict future outcomes by using the input se... ver más
Revista: Information

 
Driss Khalil, Amrutha Prasad, Petr Motlicek, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Srikanth Madikeri and Christof Schuepbach    
In air traffic management (ATM), voice communications are critical for ensuring the safe and efficient operation of aircraft. The pertinent voice communications?air traffic controller (ATCo) and pilot?are usually transmitted in a single channel, which po... ver más
Revista: Aerospace

 
Jiachen Zhang, Guoqing Tu, Shubo Liu and Zhaohui Cai    
The rapid development of speech synthesis technology has significantly improved the naturalness and human-likeness of synthetic speech. As the technical barriers for speech synthesis are rapidly lowering, the number of illegal activities such as fraud an... ver más
Revista: Algorithms