Related papers: Distributional Reinforcement Learning with Dual Expectile-Quantile Regression

Distributional Reinforcement Learning with Dual Expectile-Quantile Regression

URL: http://arxiv.org/abs/2305.16877v2
Date: Mon, 18 Mar 2024 14:27:21 GMT
Title: Distributional Reinforcement Learning with Dual Expectile-Quantile Regression
Authors: Sami Jullien, Romain Deffayet, Jean-Michel Renders, Paul Groth, Maarten de Rijke
Abstract summary: The quantile regression approach to distributional RL provides a flexible and effective way of learning arbitrary return distributions. We show that, when it is combined with a hybrid Huber loss, distributional guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns.
Score: 51.87411935256015
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and makes better use of environment samples. The commonly used quantile regression approach to distributional RL -- based on asymmetric $L_1$ losses -- provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, hybrid asymmetric $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our approach approximately learns the correct return distribution, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.
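The three losses named in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function names, the `kappa` threshold, the division by `kappa` in the Huber variant (a convention assumed here from common quantile-regression RL code), and the grid-search helper are all assumptions introduced for this example.

```python
import numpy as np

def quantile_loss(u, tau):
    """Asymmetric L1 (pinball) loss on residuals u = target - prediction."""
    return np.abs(tau - (u < 0)) * np.abs(u)

def expectile_loss(u, tau):
    """Asymmetric L2 loss on residuals u = target - prediction."""
    return np.abs(tau - (u < 0)) * u ** 2

def quantile_huber_loss(u, tau, kappa=1.0):
    """Hybrid loss: quadratic for |u| <= kappa, linear beyond (assumed scaling)."""
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    return np.abs(tau - (u < 0)) * huber / kappa

def minimiser(loss, samples, tau, grid):
    """Hypothetical helper: grid point with the lowest average loss."""
    totals = [loss(samples - c, tau).mean() for c in grid]
    return grid[int(np.argmin(totals))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.exponential(size=2000)   # skewed toy "return" samples
    grid = np.linspace(0.0, 5.0, 501)
    # At tau = 0.5 the pinball loss targets the median, the L2 loss the mean.
    print("quantile minimiser :", minimiser(quantile_loss, samples, 0.5, grid))
    print("expectile minimiser:", minimiser(expectile_loss, samples, 0.5, grid))
    print("sample median, mean:", np.median(samples), samples.mean())
```

On skewed samples the two minimisers differ, which illustrates why a quantile and an expectile at the same level are distinct statistics of the return distribution.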

Related papers

Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning [50.93804891554481]
We introduce a novel estimator based on the log-sum-exponential (LSE) operator, which outperforms traditional inverse propensity score estimators. Our LSE estimator demonstrates variance reduction and robustness under heavy-tailed conditions. In the off-policy learning scenario, we establish bounds on the regret -- the performance gap between our LSE estimator and the optimal policy.
arXiv Detail & Related papers (2025-06-07T17:37:10Z)
Regression Discontinuity Design with Distribution-Valued Outcomes [0.0]
This article introduces Regression Discontinuity Design (RDD) with Distribution-Valued Outcomes (R3D). It extends the standard RDD framework to settings where the outcome is a distribution rather than a scalar. I then apply the proposed methods to study the effects of gubernatorial party control on within-state income distributions in the US.
arXiv Detail & Related papers (2025-04-04T23:12:35Z)
Statistical-Computational Trade-offs for Recursive Adaptive Partitioning Estimators [23.056208049082134]
We show that greedy algorithms for high-dimensional regression get stuck at local optima. We show that greedy training requires $\exp(\Omega(d))$ to achieve low estimation error. This dichotomy mirrors that of two-layer neural networks trained with stochastic gradient descent (SGD) in the mean-field regime.
arXiv Detail & Related papers (2024-11-07T03:11:53Z)
Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining prediction intervals via the empirical estimation of quantiles in the distribution of outputs. We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile regression based interval construction that removes this arbitrary constraint. We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z)
Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL). We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$. The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
Uncertainty Voting Ensemble for Imbalanced Deep Regression [20.176217123752465]
In this paper, we introduce UVOTE, a method for learning from imbalanced data. We replace traditional regression losses with negative log-likelihood, which also predicts sample-wise aleatoric uncertainty. We show that UVOTE consistently outperforms the prior art, while at the same time producing better-calibrated uncertainty estimates.
arXiv Detail & Related papers (2023-05-24T14:12:21Z)
SIMPLE: A Gradient Estimator for $k$-Subset Sampling [42.38652558807518]
In this work, we fall back to discrete $k$-subset sampling on the forward pass. We show that our gradient estimator, SIMPLE, exhibits lower bias and variance compared to state-of-the-art estimators. Empirical results show improved performance on learning to explain and sparse linear regression.
arXiv Detail & Related papers (2022-10-04T22:33:16Z)
Learnable Distribution Calibration for Few-Shot Class-Incremental Learning [122.2241120474278]
Few-shot class-incremental learning (FSCIL) faces challenges of memorizing old class distributions and estimating new class distributions given few training samples. We propose a learnable distribution calibration (LDC) approach, with the aim to systematically solve these two challenges using a unified framework.
arXiv Detail & Related papers (2022-10-01T09:40:26Z)
How Does Return Distribution in Distributional Reinforcement Learning Help Optimization? [10.149055921090572]
We investigate the optimization advantages of distributional RL within the Neural Fitted Z-Iteration (Neural FZI) framework. We show that distributional RL has desirable smoothness characteristics and hence enjoys stable gradients. Our research findings illuminate how the return distribution in distributional RL algorithms helps the optimization.
arXiv Detail & Related papers (2022-09-29T02:18:31Z)
Normality-Guided Distributional Reinforcement Learning for Continuous Control [16.324313304691426]
Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirically quite close to normal. We propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function.
arXiv Detail & Related papers (2022-08-28T02:52:10Z)
Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources. As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward. In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
Bayesian Distributional Policy Gradients [2.28438857884398]
Distributional Reinforcement Learning maintains the entire probability distribution of the reward-to-go, i.e. the return. Bayesian Distributional Policy Gradients (BDPG) uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns.
arXiv Detail & Related papers (2021-03-20T23:42:50Z)
Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.