
Hierarchical DDPG for Manipulator Motion Planning in Dynamic Environments

Dugan Um, Prasad Nethala and Hocheol Shin

Abstract

In this paper, a hierarchical reinforcement learning (HRL) architecture, namely a "Hierarchical Deep Deterministic Policy Gradient (HDDPG)," is proposed and studied. The HDDPG utilizes a manager-worker formation similar to other HRL structures. Unlike others, however, the HDDPG enables the manager and workers to share an identical environment and state, while a unique reward system is required for each Deep Deterministic Policy Gradient (DDPG) agent. Therefore, the HDDPG allows easy structural expansion, with probabilistic selection of a worker's action by the manager. Due to this innate structural advantage, the HDDPG has merit in building a general AI that can handle complex time-horizon tasks with various conflicting sub-goals. The experimental results demonstrate its usefulness on a manipulator motion planning problem in a dynamic environment, where path planning and collision avoidance conflict with each other. The proposed HDDPG is compared with a HAM and a single DDPG for performance evaluation. The results show that the HDDPG achieved more than a 40% reward gain and more than twice the reward improvement rate. Another important feature of the proposed HDDPG is its biased manager training capability: by adding a preference factor to each worker, the manager can be trained to prefer a certain worker, achieving a better success rate for a specific objective if needed.
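
To make the manager-worker structure concrete, the following is a minimal Python sketch of the selection loop described in the abstract: the manager and all workers observe the same state, each agent is trained against its own reward signal, the manager selects a worker probabilistically, and a preference factor biases that selection toward a chosen sub-goal. The `DDPGAgent` placeholder, the softmax selection rule, and all names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class DDPGAgent:
    """Placeholder DDPG agent: act() maps a state to a continuous action.
    A real implementation would hold actor/critic networks and be trained
    against this agent's own reward function."""
    def __init__(self, state_dim, action_dim):
        self.state_dim, self.action_dim = state_dim, action_dim

    def act(self, state):
        return np.zeros(self.action_dim)  # stand-in for the actor-network output

class HDDPG:
    """Manager and workers share the same environment and state; each worker
    pursues its own sub-goal (e.g. path following vs. collision avoidance)."""
    def __init__(self, state_dim, action_dim, n_workers, preference=None):
        # The manager outputs one score per worker instead of a motor command.
        self.manager = DDPGAgent(state_dim, n_workers)
        self.workers = [DDPGAgent(state_dim, action_dim) for _ in range(n_workers)]
        # Preference factor (assumed form): scales worker scores to bias
        # manager training toward a specific objective.
        self.preference = (np.ones(n_workers) if preference is None
                           else np.asarray(preference, dtype=float))

    def select_action(self, state):
        scores = self.manager.act(state) * self.preference
        probs = np.exp(scores) / np.sum(np.exp(scores))   # softmax over workers
        k = np.random.choice(len(self.workers), p=probs)  # probabilistic selection
        return self.workers[k].act(state), k
```

Because every DDPG agent shares the same state and only the reward differs, adding a new sub-goal amounts to appending one more worker and widening the manager's output, which is the structural-expansion property the abstract highlights.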