Expected Scalarised Returns Dominance: A New Solution Concept for
Multi-Objective Decision Making
- URL: http://arxiv.org/abs/2106.01048v1
- Date: Wed, 2 Jun 2021 09:42:42 GMT
- Title: Expected Scalarised Returns Dominance: A New Solution Concept for
Multi-Objective Decision Making
- Authors: Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers, Enda Howley,
Patrick Mannion
- Abstract summary: In many real-world scenarios, the utility of a user is derived from the single execution of a policy.
To apply multi-objective reinforcement learning, the expected utility of the returns must be optimised.
We propose first-order stochastic dominance as a criterion to build solution sets to maximise expected utility.
We then define a new solution concept called the ESR set, which is a set of policies that are ESR dominant.
- Score: 4.117597517886004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many real-world scenarios, the utility of a user is derived from the
single execution of a policy. In this case, to apply multi-objective
reinforcement learning, the expected utility of the returns must be optimised.
Various scenarios exist where a user's preferences over objectives (also known
as the utility function) are unknown or difficult to specify. In such
scenarios, a set of optimal policies must be learned. However, settings where
the expected utility must be maximised have been largely overlooked by the
multi-objective reinforcement learning community and, as a consequence, a set
of optimal solutions has yet to be defined. In this paper we address this
challenge by proposing first-order stochastic dominance as a criterion to build
solution sets to maximise expected utility. We also propose a new dominance
criterion, known as expected scalarised returns (ESR) dominance, that extends
first-order stochastic dominance to allow a set of optimal policies to be
learned in practice. We then define a new solution concept called the ESR set,
which is a set of policies that are ESR dominant. Finally, we define a new
multi-objective tabular distributional reinforcement learning (MOT-DRL)
algorithm to learn the ESR set in a multi-objective multi-armed bandit setting.
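As a rough illustration of the dominance idea in the abstract, the sketch below runs an empirical, survival-function style first-order dominance check between arms of a multi-objective bandit and keeps the undominated arms. It is an illustrative stand-in under assumed data, not the authors' MOT-DRL algorithm or their formal ESR dominance definition.
```python
# Illustrative sketch only: an empirical, survival-function style first-order
# dominance check between arms of a multi-objective bandit. Not the authors'
# MOT-DRL algorithm or their formal ESR dominance definition.
import numpy as np

def survival(samples: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Empirical P(Z >= v), component-wise, for each threshold vector v."""
    # samples: (n, d) return vectors; thresholds: (m, d) grid points.
    return (samples[:, None, :] >= thresholds[None, :, :]).all(axis=2).mean(axis=0)

def dominates(samples_i: np.ndarray, samples_j: np.ndarray,
              thresholds: np.ndarray, tol: float = 1e-12) -> bool:
    """True if arm i first-order dominates arm j on the reference grid."""
    s_i, s_j = survival(samples_i, thresholds), survival(samples_j, thresholds)
    return bool(np.all(s_i >= s_j - tol) and np.any(s_i > s_j + tol))

def undominated_arms(arm_samples):
    """Keep arms not dominated by any other arm (an empirical 'ESR set' sketch)."""
    thresholds = np.vstack(arm_samples)  # pooled returns used as the grid
    return [i for i, z_i in enumerate(arm_samples)
            if not any(dominates(z_j, z_i, thresholds)
                       for j, z_j in enumerate(arm_samples) if j != i)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three arms with 2-objective stochastic returns (hypothetical data).
    arms = [rng.normal([1.0, 1.0], 0.2, size=(500, 2)),
            rng.normal([1.5, 1.4], 0.2, size=(500, 2)),
            rng.normal([1.4, 1.5], 0.2, size=(500, 2))]
    print("undominated arms:", undominated_arms(arms))
```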
Related papers
- Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes [7.028778922533688]
Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty.
We study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning.
arXiv Detail & Related papers (2024-10-14T14:52:23Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
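As a rough illustration of the latent-slate idea (not this paper's architecture), the sketch below encodes a slate, represented here as a concatenation of item embeddings, into a low-dimensional latent vector with a standard variational auto-encoder; the dimensions and layer sizes are assumptions.
```python
# Minimal sketch of the latent-slate idea above, assuming a slate is represented
# as a concatenation of fixed item embeddings; not the paper's architecture.
import torch
import torch.nn as nn

class SlateVAE(nn.Module):
    def __init__(self, slate_dim: int, latent_dim: int = 16, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(slate_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_var = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, slate_dim))

    def forward(self, slate: torch.Tensor):
        h = self.encoder(slate)
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterisation
        recon = self.decoder(z)
        # ELBO terms: reconstruction error + KL divergence to a standard normal prior.
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=-1).mean()
        loss = nn.functional.mse_loss(recon, slate) + kl
        return z, loss  # z is the low-dimensional slate/action representation

if __name__ == "__main__":
    vae = SlateVAE(slate_dim=5 * 32)       # e.g. 5 items x 32-dim embeddings (assumed)
    z, loss = vae(torch.randn(8, 5 * 32))  # batch of 8 hypothetical slates
    print(z.shape, loss.item())
```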
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional Reinforcement Learning (RL).
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z)
- Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow Models [2.7648976108201815]
Key to solving real-world problems is exploiting sparse dependency structures between agents.
In wind farm control, a trade-off exists between maximising power and minimising stress on the system's components.
We model such sparse dependencies between agents as a multi-objective coordination graph (MO-CoG).
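For illustration, a small sketch of the coordination-graph factorisation such a model relies on: the global multi-objective payoff is a sum of local vector-valued payoffs, each depending only on a subset of agents. The agents, actions, and payoff numbers below are hypothetical.
```python
# Sketch of a multi-objective coordination graph's factored payoff:
# u(a) = sum_e u_e(a_e), where each local factor e depends only on a small
# subset of agents and returns a vector of objectives, here (power, stress).
import numpy as np

def global_payoff(joint_action: dict, local_factors) -> np.ndarray:
    """Sum the vector-valued local payoffs; each factor sees only its own agents."""
    return sum(payoff_fn({ag: joint_action[ag] for ag in agents})
               for agents, payoff_fn in local_factors)

# Two turbines sharing a wake form one factor; a third turbine forms its own.
factors = [
    (("t1", "t2"),
     lambda a: np.array([3.0, -1.0]) if a["t1"] == "curtail" else np.array([4.0, -2.0])),
    (("t3",),
     lambda a: np.array([2.0, -0.5])),
]
print(global_payoff({"t1": "curtail", "t2": "full", "t3": "full"}, factors))
```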
arXiv Detail & Related papers (2022-07-01T12:10:15Z)
- Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning [68.45370492516531]
We introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the Recommender Systems (RS) setting.
The SMORL agent augments standard recommendation models with additional RL layers that force it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations.
Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.
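As a generic illustration of the scalarised multi-objective idea (not the exact SMORL formulation), the sketch below combines per-objective rewards for accuracy, diversity, and novelty into a single scalar signal using an assumed weight vector.
```python
# Generic linear scalarisation sketch (not the exact SMORL formulation): combine
# per-objective rewards into the single scalar signal an RL agent optimises.
def scalarise(rewards: dict, weights: dict) -> float:
    """r = sum_k w_k * r_k over the named objectives."""
    return sum(weights[k] * rewards[k] for k in weights)

# Hypothetical per-objective rewards for one recommendation step.
step_rewards = {"accuracy": 1.0, "diversity": 0.3, "novelty": 0.1}
weights = {"accuracy": 0.6, "diversity": 0.2, "novelty": 0.2}
print(scalarise(step_rewards, weights))  # scalar reward fed to the agent
```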
arXiv Detail & Related papers (2021-10-28T13:22:45Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly, over an entire range of learning rates and in a dimension-free fashion, to the global solution.
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search [3.487620847066216]
We propose an algorithm that learns a posterior distribution over the utility of the different possible returns attainable from individual policy executions.
Our algorithm outperforms the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.
arXiv Detail & Related papers (2021-02-01T16:47:39Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective Q-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
- CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee [61.176159046544946]
In safe reinforcement learning (SRL) problems, an agent explores the environment to maximize an expected total reward and avoids violation of certain constraints.
This is the first analysis of SRL algorithms with convergence guarantees to globally optimal policies.
arXiv Detail & Related papers (2020-11-11T16:05:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.