Risk Aware and Multi-Objective Decision Making with Distributional Monte
Carlo Tree Search
- URL: http://arxiv.org/abs/2102.00966v2
- Date: Tue, 2 Feb 2021 14:06:18 GMT
- Title: Risk Aware and Multi-Objective Decision Making with Distributional Monte
Carlo Tree Search
- Authors: Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley,
Patrick Mannion
- Abstract summary: We propose an algorithm that learns a posterior distribution over the utility of the different possible returns attainable from individual policy executions.
Our algorithm outperforms the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.
- Score: 3.487620847066216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many risk-aware and multi-objective reinforcement learning settings, the
utility of the user is derived from the single execution of a policy. In these
settings, making decisions based on the average future returns is not suitable.
For example, in a medical setting a patient may only have one opportunity to
treat their illness. When making a decision, just the expected return -- known
in reinforcement learning as the value -- cannot account for the potential
range of adverse or positive outcomes a decision may have. Our key insight is
that we should use the distribution over expected future returns differently to
represent the critical information that the agent requires at decision time. In
this paper, we propose Distributional Monte Carlo Tree Search, an algorithm
that learns a posterior distribution over the utility of the different possible
returns attainable from individual policy executions, resulting in good
policies for both risk-aware and multi-objective settings. Moreover, our
algorithm outperforms the state-of-the-art in multi-objective reinforcement
learning for the expected utility of the returns.
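The following is a minimal, hypothetical sketch (not the authors' exact algorithm) of the core idea described in the abstract: a tree-search node keeps the sampled returns observed through each action, so that action selection can be driven by the expected utility of the return distribution rather than by the mean value alone. All names and the utility function are illustrative assumptions.

```python
# Hypothetical sketch of distributional node statistics for tree search.
import math
import random
from collections import defaultdict

class DistributionalNode:
    def __init__(self):
        self.returns = defaultdict(list)   # action -> list of sampled returns
        self.visits = defaultdict(int)     # action -> visit count

    def update(self, action, ret):
        """Record one Monte Carlo return observed after taking `action`."""
        self.returns[action].append(ret)
        self.visits[action] += 1

    def expected_utility(self, action, utility):
        """Mean utility over the sampled returns (ESR-style criterion)."""
        samples = self.returns[action]
        return sum(utility(r) for r in samples) / len(samples)

    def select(self, utility, c=1.0):
        """UCB-style selection, scored by expected utility of returns, not mean value."""
        total = sum(self.visits.values())
        def score(a):
            bonus = c * math.sqrt(math.log(total + 1) / self.visits[a])
            return self.expected_utility(a, utility) + bonus
        return max(self.returns, key=score)

def risk_averse(r):
    """Illustrative risk-averse utility: losses are weighted five times more than gains."""
    return r if r >= 0.0 else 5.0 * r

node = DistributionalNode()
for _ in range(200):
    node.update("treat", random.gauss(1.0, 2.0))   # higher mean, high variance
    node.update("wait", random.gauss(0.5, 0.1))    # lower mean, low variance
print(node.select(risk_averse))                    # typically "wait" under this utility
```

A risk-neutral utility (identity) would make this reduce to ordinary mean-value selection; the point of the sketch is that the same node statistics support both cases.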
Related papers
- Beyond Expected Return: Accounting for Policy Reproducibility when
Evaluating Reinforcement Learning Algorithms [9.649114720478872]
Many applications in Reinforcement Learning (RL) have noise or stochasticity present in the environment.
These uncertainties lead the exact same policy to perform differently, from one roll-out to another.
Common evaluation procedures in RL summarise the consequent return distributions using solely the expected return, which does not account for the spread of the distribution.
Our work defines this spread as policy reproducibility: the ability of a policy to obtain similar performance when rolled out many times, a crucial property in some real-world applications.
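As an illustration of the evaluation idea in this summary, the sketch below reports dispersion statistics of the return distribution alongside the expected return; the specific statistics (standard deviation, IQR, lower-tail CVaR) are my own illustrative choices, not necessarily the paper's metric.

```python
# Illustrative only: summarising a policy's roll-outs by expected return AND spread.
import statistics

def summarise_rollouts(returns):
    """returns: list of episodic returns from repeated roll-outs of one policy."""
    returns = sorted(returns)
    n = len(returns)
    k = max(1, int(0.1 * n))                      # worst 10% of roll-outs
    return {
        "expected_return": statistics.mean(returns),
        "std": statistics.pstdev(returns),
        "iqr": returns[int(0.75 * n)] - returns[int(0.25 * n)],
        "cvar_10": statistics.mean(returns[:k]),  # average of the lower tail
    }

print(summarise_rollouts([10, 9, 11, 2, 10, 12, 1, 10]))
```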
arXiv Detail & Related papers (2023-12-12T11:22:31Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
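For context, a common way to learn such a value distribution is the quantile-regression (pinball) loss over a fixed set of quantile fractions; the sketch below shows that generic loss and is not the exact EQR objective from the paper.

```python
# A minimal sketch of the quantile-regression (pinball) loss for value distributions.
import torch

def quantile_regression_loss(pred_quantiles, target_samples, taus):
    """
    pred_quantiles: (batch, n_quantiles) predicted quantile values
    target_samples: (batch, n_targets) sampled value/return targets
    taus:           (n_quantiles,) quantile fractions in (0, 1)
    """
    # Pairwise errors between every target sample and every predicted quantile.
    diff = target_samples.unsqueeze(2) - pred_quantiles.unsqueeze(1)  # (B, T, Q)
    weight = torch.abs(taus.view(1, 1, -1) - (diff.detach() < 0).float())
    return (weight * torch.abs(diff)).mean()

taus = torch.linspace(0.05, 0.95, 10)
pred = torch.zeros(4, 10, requires_grad=True)
targets = torch.randn(4, 32)
loss = quantile_regression_loss(pred, targets, taus)
loss.backward()
```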
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Risk-Sensitive Policy with Distributional Reinforcement Learning [4.523089386111081]
This research work introduces a novel methodology based on distributional RL to derive sequential decision-making policies sensitive to the risk.
Named the risk-based utility function $U$, it can be extracted from the random return distribution $Z$ naturally learnt by any distributional RL algorithm.
This makes it possible to span the complete potential trade-off between risk minimisation and expected return maximisation.
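A hedged sketch of the general idea follows: once a distributional RL agent exposes the return distribution Z(s, a) (approximated here by samples), a utility can trade off expected return against a risk measure such as CVaR via a single coefficient. The exact form of the utility U in the paper may differ.

```python
# Illustrative utility over a sampled return distribution Z.
import numpy as np

def utility(z_samples, risk_weight=0.5, alpha=0.25):
    """Interpolate between the mean return and the alpha-CVaR of the samples."""
    z = np.sort(np.asarray(z_samples))
    k = max(1, int(alpha * len(z)))
    cvar = z[:k].mean()                    # average of the worst alpha-fraction
    return (1.0 - risk_weight) * z.mean() + risk_weight * cvar

def greedy_action(z_by_action, risk_weight):
    return max(z_by_action, key=lambda a: utility(z_by_action[a], risk_weight))

z_by_action = {"safe": np.random.normal(0.5, 0.1, 500),
               "risky": np.random.normal(1.0, 2.0, 500)}
print(greedy_action(z_by_action, risk_weight=0.0))   # risk-neutral: likely "risky"
print(greedy_action(z_by_action, risk_weight=1.0))   # risk-averse: likely "safe"
```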
arXiv Detail & Related papers (2022-12-30T14:37:28Z)
- Quantile Off-Policy Evaluation via Deep Conditional Generative Learning [21.448553360543478]
Off-Policy evaluation (OPE) is concerned with evaluating a new target policy using offline data generated by a potentially different behavior policy.
We propose a doubly-robust inference procedure for quantile OPE in sequential decision making.
We demonstrate the advantages of this proposed estimator through both simulations and a real-world dataset from a short-video platform.
arXiv Detail & Related papers (2022-12-29T22:01:43Z)
- Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning [2.3449131636069898]
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy.
We propose two novel Monte Carlo tree search algorithms.
arXiv Detail & Related papers (2022-11-23T15:33:19Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making [4.117597517886004]
In many real-world scenarios, the utility of a user is derived from the single execution of a policy.
To apply multi-objective reinforcement learning, the expected utility of the returns must be optimised.
We propose first-order stochastic dominance as a criterion to build solution sets that maximise expected utility.
We then define a new solution concept called the ESR set, which is a set of policies that are ESR dominant.
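Under the assumption that ESR dominance is grounded in first-order stochastic dominance of the scalarised-return distributions, the sketch below checks dominance between two empirical distributions: policy A dominates policy B if A's CDF never lies above B's, i.e. A gives at least as much probability to exceeding every utility threshold.

```python
# Illustrative first-order stochastic dominance check on empirical utilities.
import numpy as np

def first_order_dominates(utilities_a, utilities_b, grid_size=200):
    a, b = np.asarray(utilities_a), np.asarray(utilities_b)
    grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), grid_size)
    cdf_a = np.array([(a <= t).mean() for t in grid])
    cdf_b = np.array([(b <= t).mean() for t in grid])
    return bool(np.all(cdf_a <= cdf_b)) and bool(np.any(cdf_a < cdf_b))

a = np.random.normal(1.0, 1.0, 1000)
b = a - 0.5                               # same shape, shifted down
print(first_order_dominates(a, b))        # True: a dominates b
```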
arXiv Detail & Related papers (2021-06-02T09:42:42Z)
- Universal Off-Policy Evaluation [64.02853483874334]
We take the first steps towards a universal off-policy estimator (UnO).
We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns.
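A simplified sketch of the underlying idea (not the full UnO estimator, which also provides high-confidence bounds): weight each logged return by its trajectory's importance ratio to form an off-policy estimate of the return CDF, from which the mean, quantiles, or CVaR can then be read off. The self-normalised weighting below is my own illustrative choice.

```python
# Illustrative importance-weighted return CDF for off-policy evaluation.
import numpy as np

def off_policy_cdf(returns, importance_ratios):
    """returns, importance_ratios: per-trajectory arrays from the behaviour policy."""
    order = np.argsort(returns)
    g = np.asarray(returns)[order]
    w = np.asarray(importance_ratios)[order]
    cdf = np.cumsum(w) / np.sum(w)           # self-normalised weights
    return g, cdf

def off_policy_quantile(returns, ratios, q):
    g, cdf = off_policy_cdf(returns, ratios)
    idx = min(np.searchsorted(cdf, q), len(g) - 1)
    return g[idx]

returns = np.random.normal(1.0, 1.0, 1000)
ratios = np.random.uniform(0.5, 1.5, 1000)   # placeholder importance ratios
print(off_policy_quantile(returns, ratios, 0.5))
```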
arXiv Detail & Related papers (2021-04-26T18:54:31Z)
- Information Directed Reward Learning for Reinforcement Learning [64.33774245655401]
We learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible.
In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types.
We support our findings with extensive evaluations in multiple environments and with different types of queries.
arXiv Detail & Related papers (2021-02-24T18:46:42Z)
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning [59.62721526353915]
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities.
Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?"
arXiv Detail & Related papers (2020-06-07T18:28:41Z)