Review of existing text-to-speech algorithms

Nikita Kireev    
Eugene Ilyushin    

Abstract

Researchers have long worked on algorithms for converting text written in natural language into speech. However, the quality of these algorithms left much to be desired until deep learning methods became practical. With the advent of the necessary computing resources and the accumulation of sufficient training data, these methods have become widely used in machine learning in general and in speech synthesis in particular. The resulting improvement in text-to-speech quality has led to widespread adoption in mobile devices, smart speakers, voice assistants, and similar products. Nevertheless, the algorithms of this class developed so far do not always handle the task correctly. For example, they cannot always place stress correctly or voice parts of the text with the appropriate intonation. Thus, the study of speech synthesis methods and tools has become even more relevant. There are many ways to synthesize speech from text, such as parametric synthesis, compilation synthesis, subject-oriented synthesis, and full rule-based speech synthesis. The purpose of this work is to review existing text-to-speech algorithms and compare them. The main algorithms considered are WaveNet, DeepVoice, Tacotron, DeepVoice 2, DeepVoice 3, and Tacotron 2. The comparison found that DeepVoice 3 and Tacotron 2 are currently the best, since their quality ratings are closest to those of professionally recorded speech.
