Algorithms, Vol. 17, Issue 1 (2024)

Reducing Q-Value Estimation Bias via Mutual Estimation and Softmax Operation in MADRL

Zheng Li, Xinkai Chen, Jiaqing Fu, Ning Xie and Tingting Zhao

Abstract

As video game technology develops, games feature more units, richer unit attributes, more complex game mechanics, and more diverse team strategies. Multi-agent deep reinforcement learning (MADRL) performs strongly in this type of team-based game, achieving results that surpass professional human players. However, reinforcement learning algorithms based on Q-value estimation often suffer from Q-value overestimation, which can seriously degrade AI performance in multi-agent scenarios. We propose a multi-agent mutual evaluation method and a multi-agent softmax method to reduce Q-value estimation bias in multi-agent scenarios, and we test both in the multi-agent particle environment and in a multi-agent tank environment that we constructed. Our tank environment strikes a good balance between the efficiency of experimental verification and the fidelity of multi-agent game-task simulation, and it can easily be extended to different multi-agent cooperation or competition tasks. We hope it will be adopted more widely in multi-agent deep reinforcement learning research.
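Why a max-based Q-target overestimates, and why a softmax operator reduces the bias, can be seen in a small simulation. This is a minimal sketch, not the authors' multi-agent method: it assumes a single Q-table with ten actions whose true values are all 0, adds independent Gaussian noise (standard deviation 1) to each estimate, and compares the hard max used in standard Q-learning targets with a generic Boltzmann softmax operator at an assumed inverse temperature `beta = 1.0`. The function names `softmax_operator` and `compare_bias` are hypothetical.

```python
import math
import random


def softmax_operator(q_values, beta):
    """Generic Boltzmann softmax operator: sum_a w_a * Q_a with w = softmax(beta * Q).

    beta = 0 gives the plain mean; as beta grows the result approaches max(Q).
    (Illustrative only; the paper's multi-agent softmax method may differ.)
    """
    m = max(q_values)  # shift by the max so exp() cannot overflow
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def compare_bias(true_q, noise_std, beta, trials, seed=0):
    """Monte Carlo estimate of the bias of max vs. softmax on noisy Q-estimates.

    Assumption: each estimate is the true value plus independent Gaussian noise.
    Both operators see the same noisy sample in every trial.
    Returns (max_bias, softmax_bias) relative to the true maximum Q-value.
    """
    rng = random.Random(seed)
    target = max(true_q)
    max_sum = 0.0
    soft_sum = 0.0
    for _ in range(trials):
        noisy = [q + rng.gauss(0.0, noise_std) for q in true_q]
        max_sum += max(noisy)
        soft_sum += softmax_operator(noisy, beta)
    return max_sum / trials - target, soft_sum / trials - target


# Hypothetical setting: 10 actions that are all truly worth 0.
true_q = [0.0] * 10
max_bias, soft_bias = compare_bias(true_q, noise_std=1.0, beta=1.0, trials=20000)
print(f"max operator bias:     {max_bias:+.3f}")
print(f"softmax operator bias: {soft_bias:+.3f}")
```

Because a softmax-weighted average never exceeds the maximum of the same estimates, its bias is at most that of the max operator. With these settings the max bias should land close to +1.54 (the expected maximum of ten standard normal draws), while the softmax bias stays positive but noticeably smaller. `beta` trades bias against selectivity: `beta = 0` gives the plain mean, and a large `beta` recovers the max.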

Similar articles

       
 
Seung Gook Cha, Donghyun Kim and Young Joong Yoon    
In this paper, a compact direction-finding system based on a deep neural network (DNN) with a single-patch multi-beam antenna is proposed. To achieve multiple beams, the patch is divided into four sectors by metal vias, and the pattern is tilted in the t...
Journal: Applied Sciences

 
Fang-Le Peng, Yong-Kang Qiao and Chao Yang    
Safety issues are a major concern for the long-term maintenance and operation of utility tunnels, of which the focal point lies in the reliability of critical facilities. Conventional evaluation methods have failed to reflect the time-dependency and obje...
Journal: Applied Sciences

 
Fengzhong Qu, Zhengchao Li, Minhao Zhang, Xingbin Tu and Yan Wei    
Wireless communication at sea is an essential way to establish a smart ocean. In the communication system, however, signals are affected by the carrier frequency offset (CFO), which results from the Doppler effect and crystal frequency offset. The offset...

 
Wanlu Zhu, Tianwen Gu, Jie Wu and Zhengzhuo Liang    
In instances where vessels encounter impacts or other factors leading to communication impairments, the status of electrical equipment becomes inaccessible through standard communication lines for the controllers. Consequently, the shipboard power system...

 
Yanqiu Gao    
The ensemble Kalman filter is often used in parameter estimation, which plays an essential role in reducing model errors. However, filter divergence is often encountered in an estimation process, resulting in the convergence of parameters to the improper...