Algorithms, Vol. 16, No. 7 (2023)
ARTICLE
TITLE

Risk-Sensitive Policy with Distributional Reinforcement Learning

Thibaut Théate and Damien Ernst    

Abstract

Classical reinforcement learning (RL) techniques are generally concerned with the design of decision-making policies driven by the maximisation of the expected outcome. Nevertheless, this approach does not take into consideration the potential risk associated with the actions taken, which may be critical in certain applications. To address that issue, the present research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies that are sensitive to the risk, the latter being modelled by the tail of the return probability distribution. The core idea is to replace the Q function generally standing at the core of learning schemes in RL by another function, taking into account both the expected return and the risk. Named the risk-based utility function U, it can be extracted from the random return distribution Z naturally learnt by any distributional RL algorithm. This enables the spanning of the complete potential trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Fundamentally, this research yields a truly practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, with an emphasis on the interpretability of the resulting decision-making process.
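
As a rough illustration of the idea described in the abstract, the sketch below ranks actions by a utility that mixes the expected return with a tail-based risk measure, both computed from samples of the return distribution Z. The abstract does not specify how U is computed, so everything specific here is an assumption made for the sketch: the convex combination with weight `alpha`, the tail level `rho`, the use of the lower-tail mean (a CVaR-style measure) as the risk term, and all function names.

```python
from typing import Dict, Sequence


def tail_mean(returns: Sequence[float], rho: float) -> float:
    """Mean of the worst rho-fraction of equally weighted return samples.

    Assumption: risk is summarised by a CVaR-style lower-tail mean.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    ordered = sorted(returns)
    k = max(1, int(rho * len(ordered)))  # number of samples kept in the lower tail
    return sum(ordered[:k]) / k


def utility(returns: Sequence[float], alpha: float, rho: float) -> float:
    """Hypothetical utility: alpha * expected return + (1 - alpha) * tail risk."""
    expected = sum(returns) / len(returns)
    return alpha * expected + (1.0 - alpha) * tail_mean(returns, rho)


def greedy_action(z: Dict[str, Sequence[float]], alpha: float, rho: float) -> str:
    """Pick the action whose sampled return distribution has the highest utility."""
    return max(z, key=lambda action: utility(z[action], alpha, rho))


# Toy return samples for two actions in a single state (made-up numbers).
z_samples = {
    "safe": [1.0, 1.0, 1.0, 1.0],
    "risky": [-4.0, 2.0, 3.0, 4.0],
}

# alpha = 1 ignores the tail and reduces to the usual expected-return choice.
risk_neutral_choice = greedy_action(z_samples, alpha=1.0, rho=0.25)
# alpha = 0.5 gives the worst 25% of outcomes equal weight to the mean.
risk_sensitive_choice = greedy_action(z_samples, alpha=0.5, rho=0.25)
```

In this toy setting, varying `alpha` between 0 and 1 moves the choice between a purely tail-driven policy and a purely expected-return policy, which is one way to read the trade-off the abstract refers to.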
