Content-Based Video Big Data Retrieval with Extensive Features and Deep Learning

Thuong-Cang Phan

Anh-Cang Phan

Hung-Phi Cao and Thanh-Ngoan Trieu

Abstract

In the era of digital media, the rapidly increasing volume and complexity of multimedia data make it difficult to store, process, and query information in a reasonable time. Feature extraction and processing time play a critical role in large-scale video retrieval systems and currently receive considerable attention from researchers. We therefore propose an efficient approach to feature extraction on big video datasets using deep learning techniques. The approach focuses on three main features, namely subtitles, speech, and objects in video frames, and combines three techniques to extract them: optical character recognition (OCR), automatic speech recognition (ASR), and object identification with deep learning. We provide three network models developed from Faster R-CNN ResNet, Faster R-CNN Inception ResNet V2, and Single Shot Detector MobileNet V2. The approach is implemented in Spark, a next-generation parallel and distributed computing environment, which reduces the time and space costs of the feature extraction process. Experimental results show that our proposal achieves an accuracy of 96% and reduces processing time by 50%. This demonstrates the feasibility of the approach for content-based video retrieval systems in a big data context.
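The idea of combining subtitle, speech, and object features for retrieval can be illustrated with a minimal sketch. It is not the authors' implementation: the extractor functions below are hypothetical stubs that return hard-coded terms in place of real OCR, ASR, and object detection output, and the inverted index with AND-style search is an assumed retrieval scheme chosen only for illustration. In a Spark setting, the per-video loop would presumably become a distributed transformation, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical extractor signature: video id -> terms found in that video.
Extractor = Callable[[str], Iterable[str]]

# Hard-coded stand-ins for real OCR, ASR, and object detection results.
_FAKE_OCR = {"v1": ["Breaking", "News"], "v2": ["Weather"]}
_FAKE_ASR = {"v1": ["flood", "warning"], "v2": ["sunny", "weather"]}
_FAKE_OBJ = {"v1": ["car", "person"], "v2": ["person"]}


def ocr_stub(video_id: str) -> List[str]:
    return _FAKE_OCR.get(video_id, [])


def asr_stub(video_id: str) -> List[str]:
    return _FAKE_ASR.get(video_id, [])


def object_stub(video_id: str) -> List[str]:
    return _FAKE_OBJ.get(video_id, [])


def build_index(video_ids: Iterable[str],
                extractors: Iterable[Extractor]) -> Dict[str, Set[str]]:
    """Map each lowercased term to the set of videos it was extracted from."""
    extractors = list(extractors)
    index: Dict[str, Set[str]] = defaultdict(set)
    for video_id in video_ids:
        for extract in extractors:
            for term in extract(video_id):
                index[term.lower()].add(video_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the videos that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches)


index = build_index(["v1", "v2"], [ocr_stub, asr_stub, object_stub])
print(sorted(search(index, "person")))
print(sorted(search(index, "flood person")))
```

Merging all three feature sources into one term index lets a single text query match a video through its on-screen text, its spoken words, or the objects it shows.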

Keywords

video retrieval - deep learning - Spark - big data - video content extraction - content-based video big data

ISSUE

Volume: 12 Issue: 13 (2022)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/app12136753