Efficient DNN Model for Word Lip-Reading

Taiki Arakane and Takeshi Saitoh

Abstract

This paper studies deep learning models for word-level lip-reading, a supervised video classification task. Several public datasets have been released in the lip-reading field, but few studies have evaluated lip-reading techniques across multiple datasets. This paper evaluates deep learning models on four representative public datasets: Lip Reading in the Wild (LRW), OuluVS, CUAVE, and Speech Scene by Smart Device (SSSD). LRW, released in 2016, is a large-scale public dataset covering 500 English words. The initial recognition accuracy on LRW was 66.1%, and many research groups have since worked on it. The current state of the art (SOTA) is 94.1%, achieved by 3D-Conv + ResNet18 + {DC-TCN, MS-TCN, BGRU} + knowledge distillation + word boundary. With the SOTA model as a reference, we combine existing models such as ResNet, WideResNet, EfficientNet, MS-TCN, Transformer, ViT, and ViViT, and investigate which are effective for word lip-reading using six deep learning models with modified feature extractors and classifiers. Recognition experiments show that a similar model structure, 3D-Conv + ResNet18 for feature extraction and MS-TCN for inference, is effective on all four datasets despite their different scales.
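As a rough illustration of the multi-scale temporal convolution idea behind the MS-TCN back end named in the abstract, the following dependency-free sketch runs several temporal convolution branches with different kernel sizes over per-frame features and concatenates them. It is not the authors' model: the kernel sizes (3, 5, 7), the uniform averaging weights, and the function names `temporal_conv`, `multi_scale_features`, and `classify` are all assumptions made for illustration, and a real network would learn its weights and stack several such blocks.

```python
# Illustrative sketch only. Kernel sizes and uniform weights are assumptions,
# not values taken from the paper.

def temporal_conv(seq, kernel_size):
    """Same-padded 1D convolution along time with uniform weights.

    seq: list of T frames, each a list of C floats.
    Returns a list of T frames, each a list of C floats.
    """
    pad = kernel_size // 2
    num_frames = len(seq)
    num_channels = len(seq[0])
    out = []
    for t in range(num_frames):
        frame = []
        for c in range(num_channels):
            total = 0.0
            for k in range(-pad, pad + 1):
                if 0 <= t + k < num_frames:  # zero padding outside the clip
                    total += seq[t + k][c]
            frame.append(total / kernel_size)
        out.append(frame)
    return out


def multi_scale_features(seq, kernel_sizes=(3, 5, 7)):
    """Run one branch per kernel size and concatenate along channels."""
    branches = [temporal_conv(seq, k) for k in kernel_sizes]
    return [sum((b[t] for b in branches), []) for t in range(len(seq))]


def classify(seq, class_weights):
    """Average features over time, then pick the best linear class score."""
    feats = multi_scale_features(seq)
    num_frames = len(feats)
    pooled = [sum(f[i] for f in feats) / num_frames
              for i in range(len(feats[0]))]
    scores = [sum(w * x for w, x in zip(row, pooled)) for row in class_weights]
    return scores.index(max(scores))
```

In this sketch each branch sees a different temporal context (3, 5, or 7 frames), so the concatenated features mix short and long time scales before the clip-level average and the linear classifier.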

Keywords

lip-reading - word recognition - deep neural network - LRW - OuluVS - CUAVE - SSSD - 3D convolutional layer - ResNet - WideResNet - EfficientNet - transformer - ViT - ViViT - MS-TCN

ISSUE

Volume: 16, Issue: 6 (2023)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Journal of Low Power Electronics and Applications
Aerospace

DOI

https://doi.org/10.3390/a16060269

Similar articles

A Multi-Source Separation Approach Based on DOA Cue and DNN

Yu Zhang, Maoshen Jia, Xinyu Jia and Tun-Wen Pai

Multiple sound source separation in a reverberant environment has become popular in recent years. To improve the quality of the separated signal in a reverberant environment, a separation method based on a DOA cue and a deep neural network (DNN) is propo...

Journal: Applied Sciences

Adversarial Optimization-Based Knowledge Transfer of Layer-Wise Dense Flow for Image Classification

Doyeob Yeo, Min-Suk Kim and Ji-Hoon Bae

A deep-learning technology for knowledge transfer is necessary to advance and optimize efficient knowledge distillation. Here, we aim to develop a new adversarial optimization-based knowledge transfer method involved with a layer-wise dense flow that is ...

Journal: Applied Sciences

Data-Driven Approach for Rainfall-Runoff Modelling Using Equilibrium Optimizer Coupled Extreme Learning Machine and Deep Neural Network

Bishwajit Roy, Maheshwari Prasad Singh, Mosbeh R. Kaloop, Deepak Kumar, Jong-Wan Hu, Radhikesh Kumar and Won-Sup Hwang

Rainfall-runoff (R-R) modelling is used to study the runoff generation of a catchment. The quantity or rate of change measure of the hydrological variable, called runoff, is important for environmental scientists to accomplish water-related planning and ...

Journal: Applied Sciences

Remaining Useful Life Prediction Using Temporal Convolution with Attention

Wei Ming Tan and T. Hui Teo

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data which are periodically measured and recorded into a time series data set. Such multivariate data sets form compl...

Journal: AI

Augmented Latent Features of Deep Neural Network-Based Automatic Speech Recognition for Motor-Driven Robots

Moa Lee and Joon-Hyuk Chang

Speech recognition for intelligent robots seems to suffer from performance degradation due to ego-noise. The ego-noise is caused by the motors, fans, and mechanical parts inside the intelligent robots especially when the robot moves or shakes its body. T...

Journal: Applied Sciences
