|
|
|
Ziyi Wang, Xinran Li, Luoyang Sun, Haifeng Zhang, Hualin Liu and Jun Wang
Efficient yet sufficient exploration remains a critical challenge in reinforcement learning (RL), especially for Markov Decision Processes (MDPs) with vast action spaces. Previous approaches have commonly involved projecting the original action space int...
ver más
|
|
|