Mean-Variance Efficient Reinforcement Learning by Expected Quadratic
Utility Maximization
- URL: http://arxiv.org/abs/2010.01404v3
- Date: Sun, 5 Sep 2021 10:28:58 GMT
- Title: Mean-Variance Efficient Reinforcement Learning by Expected Quadratic
Utility Maximization
- Authors: Masahiro Kato and Kei Nakagawa and Kenshi Abe and Tetsuro Morimura
- Abstract summary: In this paper, we consider learning MV efficient policies, i.e., policies that achieve Pareto efficiency with respect to the MV trade-off.
To this end, we train an agent to maximize the expected quadratic utility function.
- Score: 9.902494567482597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Risk management is critical in decision making, and mean-variance (MV)
trade-off is one of the most common criteria. However, in reinforcement
learning (RL) for sequential decision making under uncertainty, most of the
existing methods for MV control suffer from computational difficulties caused
by the double sampling problem. In this paper, in contrast to strict MV
control, we consider learning MV efficient policies that achieve Pareto
efficiency with respect to the MV trade-off. To this end, we train an agent
to maximize the expected quadratic utility function, a common objective of risk
management in finance and economics. We call our approach direct expected
quadratic utility maximization (EQUM). EQUM does not suffer from the double
sampling issue because it does not require gradient estimation of the variance. We
confirm that the maximizer of the objective in the EQUM directly corresponds to
an MV efficient policy under a certain condition. We conduct experiments with
benchmark settings to demonstrate the effectiveness of the EQUM.
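The mechanism in the abstract is that applying the standard quadratic utility, u(y) = y - (lambda/2) y^2, to the return turns the mean-variance objective into a single expectation, so an ordinary policy-gradient estimator suffices and no gradient of the variance is needed. A minimal sketch under that reading (not the authors' implementation; function and argument names such as `risk_aversion` are illustrative):

```python
import numpy as np

def quadratic_utility(ret, risk_aversion):
    """Quadratic utility u(y) = y - (lambda/2) * y^2 of a return y."""
    return ret - 0.5 * risk_aversion * ret ** 2

def equm_policy_gradient(episode_returns, episode_logprob_grads, risk_aversion):
    """REINFORCE-style estimate of grad E[u(R)].

    The utility is applied to each sampled return before averaging, so only a
    single expectation is estimated; no separate gradient of Var[R] (and hence
    no double sampling) is required.
    """
    utilities = quadratic_utility(np.asarray(episode_returns, dtype=float), risk_aversion)
    grads = [u * g for u, g in zip(utilities, episode_logprob_grads)]
    return np.mean(grads, axis=0)
```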
Related papers
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
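The UBE described in the abstract shares the recursive shape of earlier uncertainty Bellman equations, which propagate a local uncertainty term with a squared discount factor. A tabular sketch of that generic recursion, assumed here for illustration (the paper's specific local-uncertainty term and convergence result are not reproduced):

```python
import numpy as np

def solve_ube(P, pi, local_uncertainty, gamma, n_iters=1000):
    """Fixed-point iteration for a generic uncertainty Bellman equation,
    U(s,a) = u(s,a) + gamma^2 * E_{s'~P, a'~pi}[U(s',a')].

    P:  transition probabilities, shape (S, A, S)
    pi: policy probabilities, shape (S, A)
    local_uncertainty: u(s,a), shape (S, A)
    """
    U = np.zeros_like(local_uncertainty, dtype=float)
    for _ in range(n_iters):
        V_U = (pi * U).sum(axis=1)                    # E_{a'~pi}[U(s', a')]
        U = local_uncertainty + gamma ** 2 * P.dot(V_U)
    return U
```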
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Mimicking Better by Matching the Approximate Action Distribution [48.95048003354255]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z) - Risk-Aware Distributed Multi-Agent Reinforcement Learning [8.287693091673658]
We develop a distributed MARL approach to solve decision-making problems in unknown environments by learning risk-aware actions.
We then propose a distributed MARL algorithm called the CVaR QD-Learning algorithm, and establish that the value functions of individual agents reach consensus.
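For reference, the CVaR criterion such risk-aware methods optimize can be estimated from sampled returns as the average of the worst alpha-fraction of outcomes; the sketch below shows only that generic estimator, not the distributed QD-Learning update or its consensus analysis, and the names are illustrative:

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional value-at-risk of sampled returns: the mean of the
    worst alpha-fraction of outcomes (lower return = worse)."""
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()
```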
arXiv Detail & Related papers (2023-04-04T17:56:44Z) - Latent State Marginalization as a Low-cost Approach for Improving
Exploration [79.12247903178934]
We propose the adoption of latent variable policies within the MaxEnt framework.
We show that latent variable policies naturally emerge under the use of world models with a latent belief state.
We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training.
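A latent variable policy needs its latent marginalized out before MaxEnt (log-probability) terms can be evaluated; below is a minimal Monte Carlo sketch of that marginalization step with hypothetical callables, not the paper's estimator:

```python
import numpy as np
from scipy.special import logsumexp

def marginal_log_prob(log_policy, sample_latent, state, action, n_samples=16):
    """Monte Carlo estimate of log pi(a|s) = log E_z[pi(a|s,z)] for a latent
    variable policy. `sample_latent(state)` draws z and
    `log_policy(state, z, action)` returns log pi(a|s,z); both are
    illustrative stand-ins for the policy components."""
    log_probs = np.array([log_policy(state, sample_latent(state), action)
                          for _ in range(n_samples)])
    return logsumexp(log_probs) - np.log(n_samples)
```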
arXiv Detail & Related papers (2022-10-03T15:09:12Z) - Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement
Learning [12.022303947412917]
This paper aims to optimize the mean-semivariance (MSV) criterion in reinforcement learning with respect to steady rewards.
We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function.
We propose two on-policy algorithms based on the policy gradient theory and the trust region method.
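As a point of reference, the mean-semivariance criterion penalizes only downside deviations of the reward from its mean; a minimal estimator sketch (the `risk_weight` name and exact weighting are illustrative, and the paper's iterative policy-dependent-reward scheme is not reproduced):

```python
import numpy as np

def mean_semivariance_objective(rewards, risk_weight):
    """Empirical mean-semivariance criterion: mean reward minus a penalty on
    squared downside deviations (rewards falling below their mean)."""
    r = np.asarray(rewards, dtype=float)
    downside = np.minimum(r - r.mean(), 0.0)
    return r.mean() - risk_weight * np.mean(downside ** 2)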
arXiv Detail & Related papers (2022-06-15T08:32:53Z) - On the Equity of Nuclear Norm Maximization in Unsupervised Domain
Adaptation [53.29437277730871]
The nuclear norm has been shown to enhance the transferability of unsupervised domain adaptation models.
Two new losses are proposed to maximize both predictive discriminability and equity, at the class level and the sample level.
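For context, nuclear norm maximization in this setting is typically applied to the batch prediction (softmax) matrix; a minimal sketch of that baseline loss is given below as an assumption, it is not one of the paper's two proposed equity losses:

```python
import numpy as np

def batch_nuclear_norm_loss(probs):
    """Nuclear-norm maximization over a batch prediction matrix
    (shape: batch x classes): a larger nuclear norm encourages predictions
    that are both confident and diverse across classes."""
    probs = np.asarray(probs, dtype=float)
    singular_values = np.linalg.svd(probs, compute_uv=False)
    return -singular_values.sum() / probs.shape[0]  # negative: minimizing maximizes the norm
```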
arXiv Detail & Related papers (2022-04-12T07:55:47Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
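The abstract combines two familiar ingredients, a softmax-style value estimate and a penalty keeping joint action-values close to a baseline; the schematic sketch below illustrates both with assumed names and makes no claim to match the paper's exact formulation:

```python
import numpy as np

def softmax_value(q_values, temperature=1.0):
    """Softmax operator over action values: a smooth alternative to the hard
    max that is known to mitigate overestimation in Q-learning."""
    q = np.asarray(q_values, dtype=float)
    w = np.exp(temperature * (q - q.max()))
    w /= w.sum()
    return float(w @ q)

def regularized_td_loss(q_joint, td_target, baseline, reg_coef):
    """Squared TD error plus a penalty on joint action-values that deviate
    from a baseline (baseline and coefficient choices are illustrative)."""
    return (q_joint - td_target) ** 2 + reg_coef * (q_joint - baseline) ** 2
```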
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.