JOURNAL
Applied Sciences


ARTICLE

TITLE

Bi-LS-AttM: A Bidirectional LSTM and Attention Mechanism Model for Improving Image Captioning

Tian Xie

Weiping Ding

Jinbao Zhang

Xusen Wan and Jiehua Wang

Abstract

Automatic image captioning integrates two pivotal branches of artificial intelligence: computer vision (CV) and natural language processing (NLP). Its principal function is to translate extracted visual features into higher-order semantic information. Bidirectional long short-term memory (Bi-LSTM) has gained wide acceptance for image captioning tasks. Recent research has focused on modifying suitable models to produce innovative and precise captions, although tuning a model's parameters does not always yield optimal outcomes. Given this, the current research proposes a model that combines bidirectional LSTM with an attention mechanism (Bi-LS-AttM) for image captioning. The model exploits context from both the preceding and following parts of the input data, together with the attention mechanism, thereby improving the precision of visual language interpretation. The distinctiveness of this research lies in its combination of Bi-LSTM and the attention mechanism to generate sentences that are both structurally innovative and accurately reflective of the image content. To improve speed and accuracy, this study replaces convolutional neural networks (CNNs) with fast region-based convolutional networks (Fast R-CNNs). It also refines the generation and evaluation of the common space, further improving efficiency. Our model was tested on the Flickr30k and MSCOCO (80 object categories) datasets. Comparisons of performance metrics show that our model, leveraging Bi-LS-AttM, surpasses unidirectional and Bi-LSTM models. When applied to caption generation and image-sentence retrieval tasks, our model achieves time savings of approximately 36.5% and 26.3% compared with the Bi-LSTM model and the deep Bi-LSTM model, respectively.
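As a rough illustration of the idea in the abstract, the sketch below shows soft attention over image-region features, queried by a state formed from a forward and a backward hidden vector. It is not the authors' implementation: the dot-product scoring, the names `attend`, `forward_state`, `backward_state`, and `regions`, and all numeric values are assumptions chosen only to make the example small and runnable.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, regions):
    """Soft attention: score each region against the query, then average.

    Scoring by dot product is an assumption made for this sketch; the
    abstract does not state which scoring function the paper uses.
    """
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical hidden states from the forward and backward passes.
forward_state = [1.0, 0.0]
backward_state = [0.0, 1.0]
# A bidirectional state is commonly formed by concatenating both directions.
query = forward_state + backward_state

# Hypothetical region features (for example, one vector per detected region).
regions = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]

weights, context = attend(query, regions)
print(weights)
print(context)
```

In this toy setup the first region matches the query most closely, so it receives the largest weight and dominates the context vector that a decoder would use when choosing the next word.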

Keywords

image captioning - bidirectional long short-term memory - attention mechanism - fast region-based convolutional network - common space

ISSUE

Volume: 13 Issue: 13 (2023)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Information
Applied System Innovation

DOI

https://doi.org/10.3390/app13137916

Similar articles

Application of Artificial Neural Networks for Power Load Prediction in Critical Infrastructure: A Comparative Case Study

Mostafa Aliyari and Yonas Zewdu Ayele

This article aims to assess the effectiveness of state-of-the-art artificial neural network (ANN) models in time series analysis, specifically focusing on their application in prediction tasks of critical infrastructures (CIs). To accomplish this, shallo...

Journal: Applied System Innovation

Classification of Arabic Poetry Emotions Using Deep Learning

Sakib Shahriar, Noora Al Roken and Imran Zualkernan

The automatic classification of poems into various categories, such as by author or era, is an interesting problem. However, most current work categorizing Arabic poems into eras or emotions has utilized traditional feature engineering and machine learni...

Journal: Computers

Generating Paraphrase Using Simulated Annealing for Citation Sentences

Ridwan Ilyas, Masayu Leylia Khodra, Rinaldi Munir, Rila Mandala and Dwi Hendratmo Widyantoro

The paraphrase generator for citation sentences is used to produce several sentence alternatives to avoid plagiarism. Furthermore, the generation results need to pay attention to semantic similarity and lexical divergence standards. This study proposed t...

Journal: Informatics

Enhancing CSI-Based Human Activity Recognition by Edge Detection Techniques

Hossein Shahverdi, Mohammad Nabati, Parisa Fard Moshiri, Reza Asvadi and Seyed Ali Ghorashi

Human Activity Recognition (HAR) has been a popular area of research in the Internet of Things (IoT) and Human-Computer Interaction (HCI) over the past decade. The objective of this field is to detect human activities through numeric or visual representa...

Journal: Information

A Hybrid Univariate Traffic Congestion Prediction Model for IoT-Enabled Smart City

Ayushi Chahal, Preeti Gulia, Nasib Singh Gill and Ishaani Priyadarshini

IoT devices collect time-series traffic data, which is stochastic and complex in nature. Predicting traffic flow from this kind of data is a thorny task. Sustainable and economical smart cities need a smart traffic congestion prediction system. ...

Journal: Information
