
Hierarchical DDPG for Manipulator Motion Planning in Dynamic Environments

Dugan Um, Prasad Nethala and Hocheol Shin

Abstract

In this paper, a hierarchical reinforcement learning (HRL) architecture, namely a "Hierarchical Deep Deterministic Policy Gradient (HDDPG)," is proposed and studied. The HDDPG utilizes a manager-worker formation similar to other HRL structures. Unlike others, however, the HDDPG enables the manager and workers to share an identical environment and state, while a unique reward system is required for each Deep Deterministic Policy Gradient (DDPG) agent. Therefore, the HDDPG allows easy structural expansion, with probabilistic selection of a worker's action by the manager. Due to this innate structural advantage, the HDDPG has merit in building a general AI that can handle complex time-horizon tasks with various conflicting sub-goals. The experimental results demonstrate its usefulness on a manipulator motion planning problem in a dynamic environment, where path planning and collision avoidance conflict with each other. The proposed HDDPG is compared with a HAM and a single DDPG for performance evaluation. The results show that the HDDPG achieved more than a 40% reward gain and more than twice the reward improvement rate. Another important feature of the proposed HDDPG is its biased manager training capability: by adding a preference factor to each worker, the manager can be trained to prefer a certain worker, achieving a better success rate for a specific objective if needed.
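
To make the manager-worker structure concrete, the following is a minimal Python sketch of the selection loop described in the abstract: the manager and all workers observe the same state, each agent is trained against its own reward signal, the manager selects a worker probabilistically, and a preference factor biases that selection toward a chosen sub-goal. The `DDPGAgent` placeholder, the softmax selection rule, and all names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class DDPGAgent:
    """Placeholder DDPG agent: act() maps a state to a continuous action.
    A real implementation would hold actor/critic networks and be trained
    against this agent's own reward function."""
    def __init__(self, state_dim, action_dim):
        self.state_dim, self.action_dim = state_dim, action_dim

    def act(self, state):
        return np.zeros(self.action_dim)  # stand-in for the actor-network output

class HDDPG:
    """Manager and workers share the same environment and state; each worker
    pursues its own sub-goal (e.g. path following vs. collision avoidance)."""
    def __init__(self, state_dim, action_dim, n_workers, preference=None):
        # The manager outputs one score per worker instead of a motor command.
        self.manager = DDPGAgent(state_dim, n_workers)
        self.workers = [DDPGAgent(state_dim, action_dim) for _ in range(n_workers)]
        # Preference factor (assumed form): scales worker scores to bias
        # manager training toward a specific objective.
        self.preference = (np.ones(n_workers) if preference is None
                           else np.asarray(preference, dtype=float))

    def select_action(self, state):
        scores = self.manager.act(state) * self.preference
        probs = np.exp(scores) / np.sum(np.exp(scores))   # softmax over workers
        k = np.random.choice(len(self.workers), p=probs)  # probabilistic selection
        return self.workers[k].act(state), k
```

Because every DDPG agent shares the same state and only the reward differs, adding a new sub-goal amounts to appending one more worker and widening the manager's output, which is the structural-expansion property the abstract highlights.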