Inicio  /  Applied Sciences  /  Vol: 13 Par: 9 (2023)  /  Artículo
ARTÍCULO
TITULO

Semi-Supervised Learning for Robust Emotional Speech Synthesis with Limited Data

Jialin Zhang    
Mairidan Wushouer    
Gulanbaier Tuerhong and Hanfang Wang    

Resumen

Emotional speech synthesis is an important branch of human?computer interaction technology that aims to generate emotionally expressive and comprehensible speech based on the input text. With the rapid development of speech synthesis technology based on deep learning, the research of affective speech synthesis has gradually attracted the attention of scholars. However, due to the lack of quality emotional speech synthesis corpus, emotional speech synthesis research under low-resource conditions is prone to overfitting, exposure error, catastrophic forgetting and other problems leading to unsatisfactory generated speech results. In this paper, we proposed an emotional speech synthesis method that integrates migration learning, semi-supervised training and robust attention mechanism to achieve better adaptation to the emotional style of the speech data during fine-tuning. By adopting an appropriate fine-tuning strategy, trade-off parameter configuration and pseudo-labels in the form of loss functions, we efficiently guided the learning of the regularized synthesis of emotional speech. The proposed SMAL-ET2 method outperforms the baseline methods in both subjective and objective evaluations. It is demonstrated that our training strategy with stepwise monotonic attention and semi-supervised loss method can alleviate the overfitting phenomenon and improve the generalization ability of the text-to-speech model. Our method can also enable the model to successfully synthesize different categories of emotional speech with better naturalness and emotion similarity.

 Artículos similares

       
 
Yang Zhang, Yuan Feng, Shiqi Wang, Zhicheng Tang, Zhenduo Zhai, Reid Viegut, Lisa Webb, Andrew Raedeke and Yi Shang    
Waterfowl populations monitoring is essential for wetland conservation. Lately, deep learning techniques have shown promising advancements in detecting waterfowl in aerial images. In this paper, we present performance evaluation of several popular superv... ver más
Revista: Information

 
MohammadHossein Reshadi, Wen Li, Wenjie Xu, Precious Omashor, Albert Dinh, Scott Dick, Yuntong She and Michael Lipsett    
Anomaly detection in data streams (and particularly time series) is today a vitally important task. Machine learning algorithms are a common design for achieving this goal. In particular, deep learning has, in the last decade, proven to be substantially ... ver más
Revista: Algorithms

 
Jie Zhang, Fan Li, Xin Zhang, Yue Cheng and Xinhong Hei    
As a crucial task for disease diagnosis, existing semi-supervised segmentation approaches process labeled and unlabeled data separately, ignoring the relationships between them, thereby limiting further performance improvements. In this work, we introduc... ver más
Revista: Applied Sciences

 
Kui Zeng, Shutan Xu, Daode Shu and Ming Chen    
Medaka (Oryzias latipes), as a crucial model organism in biomedical research, holds significant importance in fields such as cardiovascular diseases. Currently, the analysis of the medaka ventricle relies primarily on visual observation under a microscop... ver más
Revista: Applied Sciences

 
Xuefeng Zhang, Youngsung Kim, Young-Chul Chung, Sangcheol Yoon, Sang-Yong Rhee and Yong Soo Kim    
Large-scale datasets, which have sufficient and identical quantities of data in each class, are the main factor in the success of deep-learning-based classification models for vision tasks. A shortage of sufficient data and interclass imbalanced data dis... ver más
Revista: Applied Sciences