Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion
- URL: http://arxiv.org/abs/2310.16546v3
- Date: Tue, 5 Dec 2023 05:14:37 GMT
- Title: Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion
- Authors: Taehyun Cho, Seungyub Han, Heesoo Lee, Kyungjae Lee, Jungwoo Lee
- Abstract summary: We present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion to avoid a one-sided tendency toward risk.
Our theoretical results show that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return.
- Score: 9.35556128467037
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Distributional reinforcement learning algorithms have attempted to utilize
estimated uncertainty for exploration, such as optimism in the face of
uncertainty. However, using the estimated variance for optimistic exploration
may cause biased data collection and hinder convergence or performance. In this
paper, we present a novel distributional reinforcement learning algorithm that
selects actions by randomizing the risk criterion, avoiding a one-sided
tendency toward risk. We provide a perturbed distributional Bellman optimality
operator by distorting the risk measure, and we prove the convergence and
optimality of the proposed method under a weaker contraction property. Our
theoretical results show that the proposed method does not fall into biased
exploration and is
guaranteed to converge to an optimal return. Finally, we empirically show that
our method outperforms other existing distribution-based algorithms in various
environments, including 55 Atari games.
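To make the idea concrete, here is a minimal sketch of action selection under a randomized risk criterion. The power distortion tau**alpha over learned quantile estimates and the uniform sampling range for alpha are hypothetical stand-ins; the paper's actual distortion family and sampling scheme may differ.
```python
import numpy as np

def distorted_value(quantiles, alpha):
    # Distorted expectation of a return distribution represented by n
    # equally weighted quantile estimates. The power distortion tau**alpha
    # is risk-averse for alpha < 1, risk-neutral at alpha = 1, and
    # risk-seeking for alpha > 1.
    n = len(quantiles)
    taus = np.arange(n + 1) / n          # CDF grid 0, 1/n, ..., 1
    weights = np.diff(taus ** alpha)     # probability mass per sorted quantile
    return np.sort(quantiles) @ weights

def select_action(quantiles_per_action, rng):
    # Sample a fresh risk criterion each decision so exploration is not
    # persistently optimistic or pessimistic, then act greedily under it.
    alpha = rng.uniform(0.5, 2.0)        # hypothetical sampling range
    scores = [distorted_value(q, alpha) for q in quantiles_per_action]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
# toy quantile estimates for three actions
quantiles = [rng.normal(mu, 1.0, size=8) for mu in (0.0, 0.2, 0.1)]
print(select_action(quantiles, rng))
```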
Related papers
- Uncertainty Quantification via Stable Distribution Propagation [60.065272548502]
We propose a new approach for propagating stable probability distributions through neural networks.
Our method is based on local linearization, which we show to be an optimal approximation in terms of total variation distance for the ReLU non-linearity.
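A minimal sketch of the underlying idea, assuming diagonal Gaussian inputs and moment matching through a linear layer followed by a ReLU linearized at the input mean; the paper's exact scheme may differ.
```python
import numpy as np

def propagate_linear(mean, var, W, b):
    # Exact propagation of mean and (diagonal) variance through y = Wx + b,
    # assuming independent coordinates; full covariances are omitted here.
    return W @ mean + b, (W ** 2) @ var

def propagate_relu(mean, var):
    # Local linearization of ReLU at the input mean: the layer acts as the
    # identity where mean > 0 and as zero elsewhere, so the variance is
    # scaled by the squared slope (1 or 0).
    active = (mean > 0).astype(mean.dtype)
    return np.maximum(mean, 0.0), active * var

rng = np.random.default_rng(0)
mean, var = rng.normal(size=4), np.full(4, 0.1)
W, b = rng.normal(size=(3, 4)), np.zeros(3)
m, v = propagate_relu(*propagate_linear(mean, var, W, b))
print(m, v)
```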
arXiv Detail & Related papers (2024-02-13T09:40:19Z)
- Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization [29.24821214671497]
Training machine learning and statistical models often involves optimizing a data-driven risk criterion.
We propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences.
For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations.
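A rough Monte Carlo sketch of such a criterion, using Dirichlet (Bayesian-bootstrap) weight draws as a stand-in for the Dirichlet process posterior and an entropic transform for ambiguity aversion; the paper's exact representation differs in its details.
```python
import numpy as np

def smooth_ambiguity_risk(losses, n_draws=200, ambiguity=2.0, rng=None):
    # Draw Dirichlet weights over the data (a Bayesian-bootstrap stand-in
    # for the posterior), compute the weighted risk under each draw, and
    # average a convex transform exp(ambiguity * r) of those risks; the
    # log/ambiguity rescaling keeps the result on the loss scale.
    rng = rng or np.random.default_rng(0)
    n = len(losses)
    weights = rng.dirichlet(np.ones(n), size=n_draws)  # posterior draws
    risks = weights @ losses                           # risk per draw
    return np.log(np.mean(np.exp(ambiguity * risks))) / ambiguity

losses = np.random.default_rng(1).exponential(size=50)
# the robust criterion sits above the plain empirical risk
print(smooth_ambiguity_risk(losses), losses.mean())
```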
arXiv Detail & Related papers (2024-01-28T21:19:15Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC), that can be applied to either risk-seeking or risk-averse policy optimization.
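A tabular sketch of the uncertainty-Bellman-equation idea these works build on; the shapes, the per-step uncertainty term, and the fixed-point iteration are assumptions of this illustration, not the paper's exact formulation.
```python
import numpy as np

def solve_ube(P, pi, local_u, gamma=0.99, iters=500):
    # Fixed-point iteration for a tabular uncertainty Bellman equation:
    #   U(s,a) = u(s,a) + gamma**2 * E_{s'~P, a'~pi}[U(s',a')].
    # The discount enters squared because U propagates a variance-like
    # quantity rather than a value. Shapes: P [S,A,S], pi [S,A], u [S,A].
    U = np.zeros_like(local_u)
    for _ in range(iters):
        ev = (pi * U).sum(axis=1)          # E_{a'~pi}[U(s',a')] per state s'
        U = local_u + gamma**2 * (P @ ev)  # contract over next states
    return U

rng = np.random.default_rng(0)
S, A = 3, 2
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition model [S,A,S]
pi = np.full((S, A), 1.0 / A)               # uniform policy
local_u = rng.uniform(size=(S, A))          # per-step model uncertainty
print(solve_ube(P, pi, local_u))
```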
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
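As an illustration, a Bernoulli-mean confidence sequence built from a sequential likelihood ratio with a plug-in predictive numerator; validity at all times follows from Ville's inequality, though the paper's general construction is broader.
```python
import numpy as np

def lr_confidence_set(xs, grid, alpha=0.05):
    # Keep every candidate p whose running likelihood ratio against a
    # plug-in predictor stays below 1/alpha. The numerator is a predictable
    # mixture, so the ratio is a nonnegative martingale under the true p
    # and Ville's inequality bounds the crossing probability by alpha.
    log_num = 0.0
    log_den = np.zeros_like(grid)
    heads, total = 1.0, 2.0              # Laplace-smoothed plug-in counts
    for x in xs:
        q = heads / total                # prediction before observing x
        log_num += np.log(q if x else 1 - q)
        log_den += np.log(np.where(x, grid, 1 - grid))
        heads += x
        total += 1
    keep = log_num - log_den < np.log(1 / alpha)
    return grid[keep]

rng = np.random.default_rng(0)
xs = rng.random(200) < 0.3               # Bernoulli(0.3) observations
grid = np.linspace(0.01, 0.99, 99)
print(lr_confidence_set(xs, grid).round(2))
```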
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Domain Generalization without Excess Empirical Risk [83.26052467843725]
A common approach is to design a data-driven surrogate penalty that captures generalization and to minimize the empirical risk jointly with the penalty.
We argue that a significant failure mode of this recipe is excess risk due to an erroneous penalty or to the difficulty of joint optimization.
We present an approach that eliminates this problem. Instead of jointly minimizing empirical risk with the penalty, we minimize the penalty under the constraint of optimality of the empirical risk.
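A toy sketch of the constrained view: descend the empirical risk, but apply only the component of the penalty gradient that does not increase the risk. This projection scheme is one possible instantiation, not necessarily the paper's algorithm.
```python
import numpy as np

def constrained_penalty_min(risk_grad, pen_grad, theta, lr=0.1, steps=500):
    # Minimize the penalty subject to empirical-risk optimality: remove the
    # penalty gradient's component along the risk gradient, so the penalty
    # is only reduced within directions that leave the risk unharmed.
    for _ in range(steps):
        g_r = risk_grad(theta)
        g_p = pen_grad(theta)
        denom = g_r @ g_r + 1e-12
        g_p_proj = g_p - (g_p @ g_r) / denom * g_r  # orthogonal to risk ascent
        theta = theta - lr * (g_r + g_p_proj)
    return theta

# toy quadratics: the risk has a minimizer set; the penalty picks a point in it
risk_grad = lambda th: np.array([2 * th[0], 0.0])         # risk = th0**2
pen_grad = lambda th: np.array([0.0, 2 * (th[1] - 1.0)])  # penalty = (th1-1)**2
print(constrained_penalty_min(risk_grad, pen_grad, np.array([3.0, -2.0])))
```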
arXiv Detail & Related papers (2023-08-30T08:46:46Z)
- Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z)
- Stochastic Optimization for Spectral Risk Measures [5.55979411072702]
Spectral risk objectives allow learning systems to interpolate between optimizing average-case performance (as in empirical risk minimization) and worst-case performance on a task.
We develop algorithms to optimize these quantities by characterizing their subdifferential and addressing challenges such as the bias of subgradient estimates and the non-smoothness of the objective.
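A minimal batch version of a spectral-risk subgradient step for squared loss; the paper addresses the stochastic, biased-estimate setting, so this full-batch sketch is for clarity only.
```python
import numpy as np

def spectral_risk_subgrad_step(w, X, y, sigma, lr=0.01):
    # One subgradient step on the spectral risk sum_i sigma_i * L_(i),
    # where L_(i) are losses sorted ascending and sigma is nondecreasing
    # and sums to 1 (uniform sigma recovers ERM; a step function, CVaR).
    # A subgradient assigns each example the weight of its loss rank.
    residual = X @ w - y
    losses = 0.5 * residual ** 2
    order = np.argsort(losses)            # ascending by loss
    per_example = np.empty_like(sigma)
    per_example[order] = sigma            # weight examples by rank
    grad = X.T @ (per_example * residual)
    return w - lr * grad

rng = np.random.default_rng(0)
X, y = rng.normal(size=(64, 5)), rng.normal(size=64)
n = len(y)
sigma = np.linspace(0.0, 2.0 / n, n)      # tail-emphasizing spectrum
sigma /= sigma.sum()
w = np.zeros(5)
for _ in range(200):
    w = spectral_risk_subgrad_step(w, X, y, sigma)
print(w)
```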
arXiv Detail & Related papers (2022-12-10T00:03:12Z)
- A Risk-Sensitive Approach to Policy Optimization [21.684251937825234]
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy.
We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized.
We demonstrate that the use of moderately "pessimistic" risk profiles, which emphasize scenarios where the agent performs poorly, leads to enhanced exploration and a continual focus on addressing deficiencies.
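A sketch of rank-based (CDF) reweighting of episode returns; the specific weight profile w(u) = (1-u)**(k-1) is a hypothetical choice standing in for the paper's risk profiles.
```python
import numpy as np

def pessimistic_weights(returns, k=2.0):
    # Each episode's weight depends only on the empirical CDF position of
    # its return; k > 1 upweights the lower tail, giving the "moderately
    # pessimistic" profile that emphasizes scenarios where the agent did
    # poorly. Weights are normalized to sum to 1.
    n = len(returns)
    ranks = np.empty(n)
    ranks[np.argsort(returns)] = (np.arange(n) + 0.5) / n  # CDF in (0,1)
    w = (1.0 - ranks) ** (k - 1.0)
    return w / w.sum()

returns = np.random.default_rng(0).normal(size=10)
w = pessimistic_weights(returns)
# these weights would scale each episode's policy-gradient contribution
print(np.round(w, 3), (w * returns).sum(), returns.mean())
```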
arXiv Detail & Related papers (2022-08-19T00:55:05Z)
- Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average-reward setting with a variance risk criterion.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
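A schematic of the multiplier and Fenchel-dual updates for a variance constraint Var[R] <= xi, with the policy update omitted; the learning rates and the sampling model here are illustrative only.
```python
import numpy as np

def dual_updates(returns, lam, y, xi, lr=0.05):
    # Fenchel identity: (E[R])**2 = max_y (2*y*E[R] - y**2), attained at
    # y = E[R], so Var[R] = E[R**2] - 2*y*E[R] + y**2 at the optimum.
    # Ascend y toward E[R], then ascend the multiplier while the variance
    # constraint is violated (projected to stay nonnegative).
    r, r2 = returns.mean(), (returns ** 2).mean()
    y = y + lr * (2 * r - 2 * y)
    var_est = r2 - 2 * y * r + y ** 2
    lam = max(0.0, lam + lr * (var_est - xi))
    return lam, y, var_est

rng = np.random.default_rng(0)
lam, y = 1.0, 0.0
for _ in range(100):
    returns = rng.normal(1.0, 2.0, size=256)  # sampled episode returns
    lam, y, var = dual_updates(returns, lam, y, xi=1.0)
# with the policy fixed and the constraint violated, lam keeps growing
print(round(lam, 2), round(y, 2), round(var, 2))
```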
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
- A Stochastic Subgradient Method for Distributionally Robust Non-Convex Learning [2.007262412327553]
We study learning problems where robustness is with respect to uncertainty in the underlying data distribution.
We show that our technique converges to points satisfying perturbed stationarity conditions.
We also illustrate the performance of our algorithm on real datasets.
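A generic smoothed-DRO subgradient sketch: the inner sup over minibatch reweightings is approximated by exponential tilting, and the outer step descends the reweighted gradient. This is a standard construction, not necessarily the paper's exact perturbation-based method.
```python
import numpy as np

def dro_subgrad_step(w, X, y, temp=1.0, lr=0.05):
    # Adversarial weights over the minibatch are proportional to
    # exp(loss / temp) (the closed-form tilt for a KL-smoothed inner sup);
    # the outer update descends the reweighted squared-loss gradient.
    residual = X @ w - y
    losses = 0.5 * residual ** 2
    tilts = np.exp((losses - losses.max()) / temp)  # numerically stable
    q = tilts / tilts.sum()
    grad = X.T @ (q * residual)
    return w - lr * grad

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w = np.zeros(4)
for _ in range(300):
    idx = rng.integers(0, 32, size=16)              # random minibatch
    w = dro_subgrad_step(w, X[idx], y[idx])
print(w)
```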
arXiv Detail & Related papers (2020-06-08T18:52:40Z)