Whisper40: A Multi-Person Chinese Whisper Speaker Recognition Dataset Containing Same-Text Neutral Speech

Jingwen Yang and Ruohua Zhou

Resumen

Whisper speaker recognition (WSR) has received extensive attention from researchers in recent years, and it plays an important role in medical, judicial, and other fields. Among them, the establishment of a whisper dataset is very important for the study of WSR. However, the existing whisper dataset suffers from the problems of a small number of speakers, short speech duration, and lack of neutral speech with the same-text as the whispered speech in the same dataset. To address this issue, we present Whisper40, a multi-person Chinese WSR dataset containing same-text neutral speech spanning around 655.90 min sourced from volunteers. In addition, we use the current state-of-the-art speaker recognition model to build a WSR baseline system and combine the idea of transfer learning for pre-training the speaker recognition model using neutral speech datasets and transfer the empirical knowledge of specific network layers to the WSR system. The Whisper40 and CHAINs datasets are then used to fine-tune the model with transferred specific layers. The experimental results show that the Whisper40 dataset is practical, and the time delay neural network (TDNN) model performs well in both the same/cross-scene experiments. The equal error rate (EER) of Chinese WSR after transfer learning is reduced by 27.62% in comparison.

Palabras claves

speaker recognition - deep learning - whisper dataset - transfer learning - audio reverse

Acceso

P�GINAS

pp. 0 - 0

N�MERO

Volumen: 15 Parte: 4 (2024)

MATERIAS

INGENIER�A Y CONSTRUCCI�N CIVIL
TECNOLOG�A

REVISTAS SIMILARES

Algorithms
Information
Aerospace

DOI

https://doi.org/10.3390/info15040184

Art�culos similares

Visual Speech Recognition for Kannada Language Using VGG16 Convolutional Neural Network

Acceso

Shashidhar Rudregowda, Sudarshan Patil Kulkarni, Gururaj H L, Vinayakumar Ravi and Moez Krichen

Visual speech recognition (VSR) is a method of reading speech by noticing the lip actions of the narrators. Visual speech significantly depends on the visual features derived from the image sequences. Visual speech recognition is a stimulating process th... ver m�s

Revista: Acoustics

Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque?Spanish ASR

Acceso

Mikel Penagarikano, Amparo Varona, Germ�n Bordel and Luis Javier Rodriguez-Fuentes

In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an... ver m�s

Revista: Applied Sciences

Novel Task-Based Unification and Adaptation (TUA) Transfer Learning Approach for Bilingual Emotional Speech Data

Acceso

Ismail Shahin, Ali Bou Nassif, Rameena Thomas and Shibani Hamsa

Modern developments in machine learning methodology have produced effective approaches to speech emotion recognition. The field of data mining is widely employed in numerous situations where it is possible to predict future outcomes by using the input se... ver m�s

Revista: Information

An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain

Acceso

Driss Khalil, Amrutha Prasad, Petr Motlicek, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Srikanth Madikeri and Christof Schuepbach

In air traffic management (ATM), voice communications are critical for ensuring the safe and efficient operation of aircraft. The pertinent voice communications?air traffic controller (ATCo) and pilot?are usually transmitted in a single channel, which po... ver m�s

Revista: Aerospace

Audio Anti-Spoofing Based on Audio Feature Fusion

Acceso

Jiachen Zhang, Guoqing Tu, Shubo Liu and Zhaohui Cai

The rapid development of speech synthesis technology has significantly improved the naturalness and human-likeness of synthetic speech. As the technical barriers for speech synthesis are rapidly lowering, the number of illegal activities such as fraud an... ver m�s

Revista: Algorithms

Revistas destacadas

Acceso directo a los n�meros publicados en la revista Infrastructures

Infrastructures

Acceso directo a los n�meros publicados en la revista Informed Infraestructure

Informed Infraestructure

Acceso directo a los n�meros publicados en la revista BiT

Acceso directo a los n�meros publicados en la revista Revista de la Construcci�n

Revista de la Construcci�n

Ver todas las revistas disponibles