Applied Sciences, Vol. 14, Issue 6 (2024)

A Unified Visual and Linguistic Semantics Method for Enhanced Image Captioning

Jiajia Peng and Tianbing Tang    

Abstract

Image captioning, the task of transforming visual data into coherent natural language descriptions, remains a challenging problem. Traditional approaches often suffer from semantic gaps, in which the generated descriptions lack depth, context, or the nuanced relationships present in the images. To overcome these limitations, we introduce a novel encoder-decoder framework called A Unified Visual and Linguistic Semantics Method. Our method comprises three key components: an encoder, a mapping network, and a decoder. The encoder combines CLIP (Contrastive Language-Image Pre-training) and SegmentCLIP to extract salient image features. SegmentCLIP builds on CLIP's architecture and adds a clustering mechanism, which strengthens the semantic relationships between the textual and visual elements of the image. A mapping network then transforms the extracted features into a fixed-length prefix, from which a GPT-2-based decoder generates a Chinese-language description of the image. The framework aims to harmonize feature extraction and semantic enrichment, with the goal of producing more contextually accurate and comprehensive image descriptions. In quantitative evaluations on the challenging AIC-ICC, Flickr8k-CN, and COCO-CN datasets, our model shows notable gains, including a 2% improvement in BLEU@4 and a 10% improvement in CIDEr. It also offers acceptable efficiency in terms of simplicity, speed, and computational cost.
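The encoder, mapping network, and decoder pipeline described above can be illustrated with a small sketch of the mapping step. The abstract does not specify how the CLIP and SegmentCLIP features are fused or how the mapping network is built. The following Python sketch therefore assumes fusion by simple concatenation and a two-layer MLP with a tanh activation. The class `PrefixMapper`, the helpers `init_layer` and `linear`, and all dimensions are hypothetical. The sketch shows only how a fused feature vector can be reshaped into a fixed-length prefix of `prefix_len` vectors, each as wide as the decoder's embeddings. The weights are random, untrained placeholders.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def init_layer(in_dim: int, out_dim: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Create a dense layer with small random weights (out_dim x in_dim) and zero bias."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute weight @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class PrefixMapper:
    """Hypothetical mapping network: fused image features -> prefix_len x gpt_dim."""

    def __init__(self, clip_dim: int, seg_dim: int, hidden_dim: int,
                 prefix_len: int, gpt_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # fixed seed so the untrained weights are reproducible
        self.clip_dim = clip_dim
        self.seg_dim = seg_dim
        self.prefix_len = prefix_len
        self.gpt_dim = gpt_dim
        self.hidden = init_layer(clip_dim + seg_dim, hidden_dim, rng)
        self.out = init_layer(hidden_dim, prefix_len * gpt_dim, rng)

    def __call__(self, clip_feat: Vector, seg_feat: Vector) -> Matrix:
        if len(clip_feat) != self.clip_dim or len(seg_feat) != self.seg_dim:
            raise ValueError("feature sizes do not match the mapper's input dimensions")
        # Assumption: the two encoders' features are fused by concatenation.
        fused = list(clip_feat) + list(seg_feat)
        # Two-layer MLP with a tanh nonlinearity between the layers.
        h = [math.tanh(v) for v in linear(fused, *self.hidden)]
        flat = linear(h, *self.out)
        # Reshape the flat output into prefix_len rows, each gpt_dim wide.
        return [flat[i * self.gpt_dim:(i + 1) * self.gpt_dim] for i in range(self.prefix_len)]


if __name__ == "__main__":
    mapper = PrefixMapper(clip_dim=8, seg_dim=8, hidden_dim=16, prefix_len=4, gpt_dim=12)
    prefix = mapper([0.1] * 8, [0.2] * 8)
    print(len(prefix), len(prefix[0]))  # 4 12
```

In a real system, the inputs would come from pretrained image encoders. For example, a CLIP ViT-B/32 image embedding is 512-dimensional. The value of `gpt_dim` would match the decoder's hidden size, which is 768 for GPT-2 small. In prefix-based captioning designs, these prefix rows are typically placed before the caption's token embeddings so the language model can condition on them. A deep learning framework would replace the hand-rolled layers shown here.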

Similar articles

       
 
Luis Pastor, Sofia Bayona, Juan P. Brito, María Cuevas, Isabel Fernaud, Sergio Emilio Galindo, Juan José García-Cantero, Francisco González de Quevedo, Susana Mata, Oscar David Robles, Angel Rodríguez, Pablo Toharia and Ana Zdravkovic    
The complexity of the human brain makes its understanding one of the biggest challenges that science is currently confronting. Due to its complexity, the brain has been studied at many different levels and from many disciplines and points of view, using ...
Journal: Applied Sciences

 
Breno Lisi Romano, Adilson Marques da Cunha (Author)     p. e38349
This paper presents, as its main contribution, a Framework for Web Applications named Agile and Collaborative Model Driven Development (AC-MDD). It aims to increase productivity by generating source-codes from agile models. It tackles the waste reduction...

 
Song Zheng, Chao Bi and Yilin Song    
This paper presents a novel diagonal recurrent neural network hybrid controller based on the shared memory of real-time database structure. The controller uses Data Engine (DE) technology, through the establishment of a unified and standardized software ...
Journal: Algorithms

 
Stanislav Velykodniy     pp. 62-71
Computer Graphics is an up-to-date industry in the design and application of rapidly evolving computing systems. The subject of the work is designing a graphical user interface. The purpose of the work is to perform reengineering (evolutionary improvemen...

 
Stanislav Velykodniy, Zhanna Burlachenko, Svitlana Zaitseva-Velykodna     pp. 25-35
Subject of the research is a software tool for construction of graphic network model of reengineering the software project. Purpose of the research is the development of technical architecture of software tool for automated design of network schedules fo...