Applied Sciences, Vol. 10, Issue 6 (2020)
Title

Acoustic Data-Driven Subword Units Obtained through Segment Embedding and Clustering for Spontaneous Speech Recognition

Jeong-Uk Bang, Sang-Hun Kim, and Oh-Wook Kwon

Abstract

We propose a method that extends a phoneme set by using a large amount of broadcast data to improve the performance of Korean spontaneous speech recognition. In the proposed method, we first extract variable-length phoneme-level segments from the broadcast data and then convert them into fixed-length embedding vectors using a long short-term memory (LSTM) architecture. We use decision tree-based clustering to find acoustically similar embedding vectors and build new acoustic subword units from the resulting clusters. To update the lexicon of the speech recognizer, we build a lookup table between triphone units and the units derived from the decision tree, and we obtain the proposed lexicon by updating the original phoneme-based lexicon with reference to this lookup table. To verify the performance of the proposed units, we compare them with previous units obtained either by segment-based k-means clustering or by frame-based decision-tree clustering. The proposed units outperform the previous units in both spontaneous and read Korean speech recognition tasks.
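The abstract does not describe the embedding network beyond its LSTM basis. The following is a minimal sketch of the general idea, assuming a single-layer, unidirectional LSTM whose final hidden state serves as the segment embedding (one common choice, not necessarily the authors'). It shows how segments with different numbers of frames map to vectors of the same size. The class name `LSTMEncoder`, the random untrained weights, and the toy frames are hypothetical; a real system would train the encoder on acoustic features.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMEncoder:
    """Hypothetical single-layer LSTM; its final hidden state is the embedding.

    Weights are random and untrained: this only illustrates the shapes involved.
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f),
        # cell candidate (g), output (o); each matrix is hidden_dim x cols.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_dim for g in "ifgo"}

    def _affine(self, gate: str, z: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def encode(self, frames: List[Vector]) -> Vector:
        """Run over a variable-length segment; return a fixed-length vector."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in frames:
            z = x + h  # concatenate current frame and previous hidden state
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            g = [math.tanh(a) for a in self._affine("g", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


encoder = LSTMEncoder(input_dim=3, hidden_dim=4)
short_segment = [[0.1, 0.2, 0.3]] * 5    # 5 frames
long_segment = [[0.3, -0.1, 0.0]] * 12   # 12 frames
print(len(encoder.encode(short_segment)), len(encoder.encode(long_segment)))  # 4 4
```

Both segments yield 4-dimensional vectors regardless of duration, which is what allows segment-level clustering with a fixed-dimensional distance or likelihood.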
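The clustering and lexicon-update steps can be illustrated with a small pure-Python sketch. It assumes the segment embeddings are already computed. It builds one tree per center phone in the style of conventional phonetic decision-tree clustering (greedy splits on context questions, scored by the gain in diagonal-Gaussian log-likelihood), turns each leaf into a new unit, and rewrites a phoneme-based lexicon through a triphone-to-unit lookup table. The question set, the `#` word-boundary symbol, the thresholds `min_gain` and `min_count`, the `k_1`-style unit names, and the fallback for phones without a tree are hypothetical choices, not details taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

Triphone = Tuple[str, str, str]          # (left, center, right); "#" = word boundary
Segment = Tuple[Triphone, List[float]]   # triphone label + segment embedding
Question = Tuple[str, FrozenSet[str]]    # (context side, phone class)

VOWELS = frozenset({"a", "e", "i", "o", "u"})
# Hypothetical question set for illustration only.
QUESTIONS: List[Question] = [("left", VOWELS), ("right", VOWELS),
                             ("left", frozenset({"#"})), ("right", frozenset({"#"}))]

def log_likelihood(vectors: List[List[float]], floor: float = 1e-3) -> float:
    """Log-likelihood of vectors under one diagonal Gaussian fitted to them."""
    n, total = len(vectors), 0.0
    for d in range(len(vectors[0])):
        mean = sum(v[d] for v in vectors) / n
        raw_var = sum((v[d] - mean) ** 2 for v in vectors) / n
        var = max(raw_var, floor)  # variance floor avoids log(0)
        total += -0.5 * n * (math.log(2 * math.pi * var) + raw_var / var)
    return total

@dataclass
class Node:
    question: Optional[Question] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    unit: Optional[str] = None  # set only on leaves

def answers(q: Question, tri: Triphone) -> bool:
    side, phones = q
    return (tri[0] if side == "left" else tri[2]) in phones

def grow(segs: List[Segment], center: str, counter: List[int],
         min_gain: float, min_count: int) -> Node:
    """Greedily split on the question with the largest likelihood gain."""
    parent = log_likelihood([v for _, v in segs])
    best = None
    for q in QUESTIONS:
        yes = [s for s in segs if answers(q, s[0])]
        no = [s for s in segs if not answers(q, s[0])]
        if len(yes) < min_count or len(no) < min_count:
            continue
        gain = (log_likelihood([v for _, v in yes])
                + log_likelihood([v for _, v in no]) - parent)
        if best is None or gain > best[0]:
            best = (gain, q, yes, no)
    if best is None or best[0] < min_gain:
        counter[0] += 1  # each leaf becomes one new subword unit
        return Node(unit=f"{center}_{counter[0]}")
    _, q, yes, no = best
    return Node(question=q, yes=grow(yes, center, counter, min_gain, min_count),
                no=grow(no, center, counter, min_gain, min_count))

def build_trees(segments: List[Segment], min_gain: float = 1.0,
                min_count: int = 2) -> Dict[str, Node]:
    by_center: Dict[str, List[Segment]] = {}
    for tri, vec in segments:
        by_center.setdefault(tri[1], []).append((tri, vec))
    return {c: grow(s, c, [0], min_gain, min_count) for c, s in by_center.items()}

def classify(node: Node, tri: Triphone) -> str:
    while node.unit is None:
        node = node.yes if answers(node.question, tri) else node.no
    return node.unit

def word_triphones(phones: List[str]) -> List[Triphone]:
    padded = ["#"] + phones + ["#"]
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]

def update_lexicon(lexicon: Dict[str, List[str]], trees: Dict[str, Node]):
    """Build a triphone -> unit lookup table, then rewrite every entry with it."""
    table: Dict[Triphone, str] = {}
    for phones in lexicon.values():
        for tri in word_triphones(phones):
            # Phones without a tree keep their original symbol (an assumption).
            table[tri] = classify(trees[tri[1]], tri) if tri[1] in trees else tri[1]
    return table, {w: [table[t] for t in word_triphones(p)] for w, p in lexicon.items()}

# Toy embeddings for /k/: before a vowel vs. at the end of a word.
segments: List[Segment] = [
    (("a", "k", "a"), [1.0, 0.0]), (("o", "k", "i"), [1.1, 0.1]),
    (("#", "k", "u"), [0.9, -0.1]), (("a", "k", "#"), [-1.0, 0.5]),
    (("i", "k", "#"), [-1.1, 0.6]), (("u", "k", "#"), [-0.9, 0.4])]
trees = build_trees(segments)
table, new_lexicon = update_lexicon({"pak": ["p", "a", "k"], "kio": ["k", "i", "o"]}, trees)
print(new_lexicon)  # {'pak': ['p', 'a', 'k_2'], 'kio': ['k_1', 'i', 'o']}
```

In the toy data, the word-final /k/ segments separate from the prevocalic ones, so the tree yields two units for /k/, and each lexicon entry is rewritten by looking up its triphones in the table.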
