
Optimizing Reinforcement Learning Using a Generative Action-Translator Transformer

Jiaming Li, Ning Xie and Tingting Zhao

Abstract

In recent years, with the rapid advancement of Natural Language Processing (NLP) technologies, large models have become widespread. Traditional reinforcement learning algorithms have also begun to incorporate language models to optimize training. However, they still fundamentally rely on the Markov Decision Process (MDP) framework and do not fully exploit the advantages of language models in handling long-sequence problems. The Decision Transformer (DT), introduced in 2021, was the first attempt to recast the reinforcement learning problem entirely as a problem in the NLP domain. It uses text-generation techniques to produce reinforcement learning trajectories, addressing the problem of finding optimal trajectories. However, DT feeds the reinforcement learning trajectory data directly into a basic language model and trains it to predict the entire trajectory, including state and reward information. This deviates from the reinforcement learning objective of finding the optimal action, and it produces redundant information in the output that degrades the final training effectiveness of the agent. This paper proposes a more reasonable network structure, the Action-Translator Transformer (ATT), which predicts only the agent's next action, making the language model more interpretable for the reinforcement learning problem. We test our model in simulated gaming scenarios and compare it with current mainstream methods in offline reinforcement learning. The experimental results show that our model achieves superior performance. We hope that this model will inspire new ideas and solutions for combining language models with reinforcement learning and provide fresh perspectives for offline reinforcement learning research.
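
To make the architectural idea concrete, below is a minimal, illustrative sketch (not the authors' released code) of a trajectory transformer that, like the ATT described above, consumes interleaved return-to-go, state, and action tokens but decodes only the next action, rather than reconstructing states and rewards as well. The choice of PyTorch, the class name ActionOnlyTransformer, and all dimensions and hyperparameters are assumptions made for illustration only.

# Illustrative sketch, assuming PyTorch; predicts only the next action from a
# causally masked sequence of (return-to-go, state, action) tokens.
import torch
import torch.nn as nn


class ActionOnlyTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_heads=4, n_layers=3, max_len=64):
        super().__init__()
        # Separate linear embeddings for returns-to-go, states, and actions,
        # plus a learned timestep embedding (all hypothetical choices).
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_timestep = nn.Embedding(max_len, d_model)

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

        # Single output head: predict only the next action,
        # with no state or reward reconstruction.
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim),
        # timesteps: (B, T) integer indices.
        B, T = states.shape[0], states.shape[1]
        t_emb = self.embed_timestep(timesteps)  # (B, T, d_model)
        tokens = torch.stack(
            [
                self.embed_rtg(rtg) + t_emb,
                self.embed_state(states) + t_emb,
                self.embed_action(actions) + t_emb,
            ],
            dim=2,
        ).reshape(B, 3 * T, -1)  # interleave (R_t, s_t, a_t) per timestep

        # Causal mask so each token attends only to earlier tokens.
        mask = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
        h = self.encoder(tokens, mask=mask)

        # Read the hidden state at each *state* token and decode the action taken there.
        h = h.reshape(B, T, 3, -1)
        return self.predict_action(h[:, :, 1])  # (B, T, act_dim)


if __name__ == "__main__":
    # Dummy forward pass with random tensors, just to show the expected shapes.
    model = ActionOnlyTransformer(state_dim=17, act_dim=6)
    B, T = 2, 10
    out = model(
        torch.randn(B, T, 1),
        torch.randn(B, T, 17),
        torch.randn(B, T, 6),
        torch.arange(T).repeat(B, 1),
    )
    print(out.shape)  # torch.Size([2, 10, 6])

In an offline setting, such a model would be trained by supervised regression of the predicted action against the logged action at each state token; the single action head is what distinguishes this sketch from a model that reconstructs the full trajectory.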

Similar articles

       
 
Yi Zhang, Lanxin Qiu, Yangzhou Xu, Xinjia Wang, Shengjie Wang, Agyemang Paul and Zhefu Wu    
Software-Defined Networking (SDN) enhances network control but faces Distributed Denial of Service (DDoS) attacks due to centralized control and flow-table constraints in network devices. To overcome this limitation, we introduce a multi-path routing alg... see more
Journal: Applied Sciences

 
Xuan Liu and Daofang Chang    
In this paper, the essence and optimization objectives of the hull parts path optimization problem of CNC laser cutting are described, and the shortcomings of the existing optimization methods are pointed out. Based on the optimization problem of the hul... see more

 
Mehrdad Hadizadeh-Bazaz, Ignacio J. Navarro and Víctor Yepes    
Recently, the repair and maintenance of structures has been necessary to prevent these structures' sudden collapse and to prevent human and financial damage. A natural factor in marine environments that destroys structures and reduces their life is the p... see more

 
Ting Yao and Wei Li    
Mega land reclamation projects have been carried out on the coral reefs in the South China Sea. Coral sand was used as a backfill material through hydraulic filling, with fill heights ranging from 6 to 10 m. To enhance foundation stability, vibro-flotati... see more

 
Ruijun Hu and Yulin Zhang    
The global path planning of planetary surface rovers is crucial for optimizing exploration benefits and system safety. For the cases of long-range roving or obstacle constraints that are time-varied, there is an urgent need to improve the computational e... see more
Journal: Aerospace