ARTICLE
TITLE

Application of sinusoidal speech modeling to the sound diarization problem

Bulat Nutfullin    
Eugene Ilyushin    

Abstract

Speech is a distinctive human trait and an evolutionary advantage over other species. Sound diarization is the process of segmenting an audio recording according to which speaker each segment belongs to. Before the advent of deep learning and the availability of the necessary computing resources, the quality of algorithms that identify a speaker by voice left much to be desired. Diarization has numerous applications, including smart speakers, mobile phones, and automatic speech translation systems. Existing diarization algorithms nevertheless have drawbacks, for example difficulty handling simultaneous speech from several speakers, or results that are not accurate enough to be applied automatically in some areas. This explains the relevance of research in this area. The sinusoidal model is an algorithm for tracking sequences of points in time-amplitude-frequency space. In existing research it has been applied to modeling echolocation and human speech and to speech synthesis. At the time of the study, no applications of the sinusoidal model to the diarization problem were found in the literature. The paper considers the diarization problem and the main quality indicators used to assess solutions to it. The main intermediate representations of sound used in existing solutions are reviewed, and a diarization algorithm based on sinusoidal speech modeling is proposed. The advantage of the proposed algorithm is that the sinusoidal representation can also serve as a voice activity detector (VAD), which made the overall diarization algorithm more efficient.
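
To make the idea of tracking points in time-amplitude-frequency space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a common simplified form of sinusoidal analysis: pick the strongest spectral peaks in each short frame, then greedily link peaks in consecutive frames whose frequencies are close. The function names (`spectral_peaks`, `track_peaks`, `sinusoidal_tracks`) and the parameters `n_peaks` and `max_jump_hz` are hypothetical and chosen only for illustration.

```python
import numpy as np


def spectral_peaks(frame, sample_rate, n_peaks=5):
    """Return (frequency_hz, amplitude) pairs for the strongest local maxima of one frame."""
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # A bin is a peak if it is strictly larger than both of its neighbours.
    interior = spectrum[1:-1]
    is_peak = (interior > spectrum[:-2]) & (interior > spectrum[2:])
    idx = np.flatnonzero(is_peak) + 1
    # Keep only the n_peaks strongest peaks, then restore frequency order.
    idx = idx[np.argsort(spectrum[idx])[::-1][:n_peaks]]
    idx.sort()
    return [(float(freqs[i]), float(spectrum[i])) for i in idx]


def track_peaks(frames_peaks, max_jump_hz=50.0):
    """Greedily link peaks of consecutive frames into tracks.

    Each track is a list of (frame_index, frequency_hz, amplitude) points.
    max_jump_hz is an assumed limit on frequency change between frames.
    """
    finished, active = [], []
    for t, peaks in enumerate(frames_peaks):
        unused = list(peaks)
        next_active = []
        for track in active:
            last_freq = track[-1][1]
            if unused:
                best = min(unused, key=lambda p: abs(p[0] - last_freq))
                if abs(best[0] - last_freq) <= max_jump_hz:
                    track.append((t, best[0], best[1]))
                    unused.remove(best)
                    next_active.append(track)
                    continue
            # No close enough peak in this frame: the track ends here.
            finished.append(track)
        # Every unmatched peak starts a new track.
        for freq, amp in unused:
            next_active.append([(t, freq, amp)])
        active = next_active
    return finished + active


def sinusoidal_tracks(signal, sample_rate, frame_len=400, hop=200, n_peaks=5):
    """Split a signal into overlapping frames and return the linked peak tracks."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    per_frame = [
        spectral_peaks(signal[s:s + frame_len], sample_rate, n_peaks) for s in starts
    ]
    return track_peaks(per_frame)
```

Under these assumptions, a crude voice activity decision could mark a frame as speech when it is covered by at least one sufficiently long, sufficiently strong track. This is only one possible reading of how a sinusoidal representation can act as a VAD; the thresholds involved would have to be tuned on real data.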

PAGES
pp. 14 - 20