Applied Sciences, Vol. 13, Issue 13 (2023)
ARTICLE
TITLE

Bi-LS-AttM: A Bidirectional LSTM and Attention Mechanism Model for Improving Image Captioning

Tian Xie    
Weiping Ding    
Jinbao Zhang    
Xusen Wan and Jiehua Wang    

Abstract

Automatic image captioning combines two major branches of artificial intelligence: computer vision (CV) and natural language processing (NLP). Its core task is to convert extracted visual features into higher-level semantic information. Bidirectional long short-term memory (Bi-LSTM) networks are widely used for image captioning. Recent work has focused on adapting existing models to produce novel and accurate captions, although tuning a model's parameters does not always yield optimal results. This study therefore proposes a model that combines a bidirectional LSTM with an attention mechanism (Bi-LS-AttM) for image captioning. The model uses context from both the preceding and following parts of the input data, together with the attention mechanism, to improve the accuracy of visual-language interpretation. The novelty of the work lies in combining Bi-LSTM and the attention mechanism to generate sentences that are both structurally novel and faithful to the image content. To improve speed and accuracy, the study replaces convolutional neural networks (CNNs) with fast region-based convolutional networks (Fast RCNNs). It also refines how the common space is generated and evaluated, which further improves efficiency. The model was evaluated on the Flickr30k and MSCOCO (80 object categories) datasets. Comparisons of performance metrics show that the Bi-LS-AttM model outperforms unidirectional and Bi-LSTM models. On caption generation and image-sentence retrieval tasks, the model achieves time savings of approximately 36.5% and 26.3% relative to the Bi-LSTM model and the deep Bi-LSTM model, respectively.
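
The abstract does not spell out how the attention mechanism is formulated, so the following is only a minimal sketch of the general idea, under stated assumptions: the forward and backward LSTM states are concatenated into a single query, each image region is scored against that query with a dot product, and the scores are normalized with a softmax. The function names (`bidirectional_query`, `soft_attention`) and the dot-product scoring are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_query(forward_state, backward_state):
    """Concatenate forward and backward hidden states (hypothetical helper)."""
    return list(forward_state) + list(backward_state)


def soft_attention(query, regions):
    """Dot-product soft attention over region feature vectors.

    Assumption: every region vector has the same length as `query`.
    Returns (weights, context), where context is the weighted sum of regions.
    """
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(dim)
    ]
    return weights, context


# Toy inputs: 2-dimensional states in each direction, three image regions.
query = bidirectional_query([1.0, 0.0], [0.0, 1.0])
regions = [
    [1.0, 0.0, 0.0, 1.0],  # aligned with the query
    [0.0, 1.0, 1.0, 0.0],  # orthogonal to the query
    [0.0, 0.0, 0.0, 0.0],  # empty region
]
weights, context = soft_attention(query, regions)
```

In this toy example the first region receives the largest weight because it is the only one aligned with the query. A real model would use learned projections and region features from a detector rather than hand-written vectors.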

Related articles

       
 
Mostafa Aliyari and Yonas Zewdu Ayele    
This article aims to assess the effectiveness of state-of-the-art artificial neural network (ANN) models in time series analysis, specifically focusing on their application in prediction tasks of critical infrastructures (CIs). To accomplish this, shallo...

 
Sakib Shahriar, Noora Al Roken and Imran Zualkernan    
The automatic classification of poems into various categories, such as by author or era, is an interesting problem. However, most current work categorizing Arabic poems into eras or emotions has utilized traditional feature engineering and machine learni...
Journal: Computers

 
Ridwan Ilyas, Masayu Leylia Khodra, Rinaldi Munir, Rila Mandala and Dwi Hendratmo Widyantoro    
The paraphrase generator for citation sentences is used to produce several sentence alternatives to avoid plagiarism. Furthermore, the generation results need to pay attention to semantic similarity and lexical divergence standards. This study proposed t...
Journal: Informatics

 
Hossein Shahverdi, Mohammad Nabati, Parisa Fard Moshiri, Reza Asvadi and Seyed Ali Ghorashi    
Human Activity Recognition (HAR) has been a popular area of research in the Internet of Things (IoT) and Human-Computer Interaction (HCI) over the past decade. The objective of this field is to detect human activities through numeric or visual representa...
Journal: Information

 
Ayushi Chahal, Preeti Gulia, Nasib Singh Gill and Ishaani Priyadarshini    
IoT devices collect time-series traffic data, which is stochastic and complex in nature. Predicting traffic flow from this kind of data is a difficult task. A smart traffic congestion prediction system is needed for sustainable and economical smart cities. ...
Journal: Information