A Unified Visual and Linguistic Semantics Method for Enhanced Image Captioning

Jiajia Peng and Tianbing Tang

Abstract

Image captioning, the task of transforming visual data into coherent natural language descriptions, remains a complex problem. Traditional approaches often suffer from semantic gaps: the generated descriptions lack depth or context, or miss the nuanced relationships contained within the image. To overcome these limitations, we introduce a novel encoder-decoder framework called A Unified Visual and Linguistic Semantics Method. Our method comprises three key components: an encoder, a mapping network, and a decoder. The encoder fuses CLIP (Contrastive Language-Image Pre-training) and SegmentCLIP to extract salient image features. SegmentCLIP builds on CLIP's architecture by adding a clustering mechanism, which strengthens the semantic relationships between the textual and visual elements of the image. A mapping network then transforms the extracted features into a fixed-length prefix, from which a GPT-2-based decoder generates a Chinese description of the image. The framework is designed to combine feature extraction with semantic enrichment, producing more contextually accurate and comprehensive image descriptions. In our quantitative evaluation on the challenging AIC-ICC, Flickr8k-CN, and COCO-CN datasets, the model achieves a 2% improvement in BLEU@4 and a 10% improvement in CIDEr. It also shows acceptable efficiency in terms of simplicity, speed, and computational cost.
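The pipeline described above (clustered encoder features, a mapping network that produces a fixed-length prefix, and a GPT-2 decoder that consumes it) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: plain k-means over patch embeddings stands in for SegmentCLIP's clustering mechanism, concatenating the global CLIP embedding with the mean cluster centroid is an assumed fusion rule, and the two-layer tanh MLP used as the mapping network follows a ClipCap-style prefix design that the abstract does not confirm. The names `kmeans_pool`, `fuse`, and `MappingNetwork`, and all dimensions, are hypothetical; toy random vectors replace real CLIP features, and the GPT-2 decoding step is omitted.

```python
import math
import random
from typing import List

Vector = List[float]


def kmeans_pool(patches: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[Vector]:
    """Group patch embeddings into k centroids with plain k-means.

    Hypothetical stand-in for SegmentCLIP's clustering mechanism.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(patches, k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for p in patches:
            # Assign each patch to its nearest centroid (squared Euclidean distance).
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            buckets[dists.index(min(dists))].append(p)
        for j, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def fuse(global_feat: Vector, cluster_feats: List[Vector]) -> Vector:
    """Assumed fusion rule: concatenate the CLIP embedding with the mean centroid."""
    mean = [sum(col) / len(cluster_feats) for col in zip(*cluster_feats)]
    return global_feat + mean


class MappingNetwork:
    """Two-layer tanh MLP mapping one feature vector to prefix_len x embed_dim.

    ClipCap-style design assumed for illustration; bias terms omitted.
    """

    def __init__(self, in_dim: int, hidden_dim: int, prefix_len: int, embed_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.prefix_len, self.embed_dim = prefix_len, embed_dim
        out_dim = prefix_len * embed_dim
        self.w1 = [[rng.gauss(0.0, 1 / math.sqrt(in_dim)) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.w2 = [[rng.gauss(0.0, 1 / math.sqrt(hidden_dim)) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def __call__(self, x: Vector) -> List[Vector]:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        flat = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        # Reshape the flat output into prefix_len vectors of size embed_dim.
        return [flat[i * self.embed_dim:(i + 1) * self.embed_dim] for i in range(self.prefix_len)]


# Toy usage: random vectors stand in for real CLIP patch and image embeddings.
rng = random.Random(42)
patches = [[rng.random() for _ in range(4)] for _ in range(16)]
global_feat = [rng.random() for _ in range(4)]
clusters = kmeans_pool(patches, k=3)
fused = fuse(global_feat, clusters)  # 4 + 4 = 8 dimensions
mapper = MappingNetwork(in_dim=8, hidden_dim=16, prefix_len=10, embed_dim=6)
prefix = mapper(fused)
print(len(prefix), len(prefix[0]))  # 10 6
```

In a ClipCap-style design, each of the `prefix_len` rows would have GPT-2's embedding size and would be prepended to the caption token embeddings, so the decoder is conditioned on the image without changes to its own architecture. Whether this paper's mapping network is an MLP or a transformer is not stated in the abstract.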

Keywords

image captioning - image features - clustering mechanism - Chinese language description

ISSUE

Volume 14, Issue 6 (2024)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Acta Scientiarum: Technology
Information
Applied Sciences

DOI

https://doi.org/10.3390/app14062657

Similar articles

A Unified Framework for Neuroscience Morphological Data Visualization

Luis Pastor, Sofia Bayona, Juan P. Brito, María Cuevas, Isabel Fernaud, Sergio Emilio Galindo, Juan José García-Cantero, Francisco González de Quevedo, Susana Mata, Oscar David Robles, Angel Rodríguez, Pablo Toharia and Ana Zdravkovic

The complexity of the human brain makes its understanding one of the biggest challenges that science is currently confronting. Due to its complexity, the brain has been studied at many different levels and from many disciplines and points of view, using ...

Journal: Applied Sciences

A Framework for Web Applications using an Agile and Collaborative Model Driven Development (AC-MDD)

Breno Lisi Romano, Adilson Marques da Cunha (Author), p. e38349

This paper presents, as its main contribution, a Framework for Web Applications named Agile and Collaborative Model Driven Development (AC-MDD). It aims to increase productivity by generating source code from agile models. It tackles the waste reduction...

Journal: Acta Scientiarum: Technology

A New Method of Applying Data Engine Technology to Realize Neural Network Control

Song Zheng, Chao Bi and Yilin Song

This paper presents a novel diagonal recurrent neural network hybrid controller based on the shared memory of real-time database structure. The controller uses Data Engine (DE) technology, through the establishment of a unified and standardized software ...

Journal: Algorithms

REENGINEERING OF OPEN SOFTWARE SYSTEM OF 3D MODELING BRL-CAD

Stanislav Velykodniy, pp. 62-71

Computer Graphics is an up-to-date industry in the design and application of rapidly evolving computing systems. The subject of the work is designing a graphical user interface. The purpose of the work is to perform reengineering (evolutionary improvemen...

Journal: Innovative technologies and scientific solutions for industries

ARCHITECTURE DEVELOPMENT OF SOFTWARE FOR MANAGING NETWORK PLANNING OF SOFTWARE PROJECT REENGINEERING

Stanislav Velykodniy, Zhanna Burlachenko, Svitlana Zaitseva-Velykodna, pp. 25-35

Subject of the research is a software tool for construction of graphic network model of reengineering the software project. Purpose of the research is the development of technical architecture of software tool for automated design of network schedules fo...

Journal: Innovative technologies and scientific solutions for industries
