Distributional Reinforcement Learning via Moment Matching
- URL: http://arxiv.org/abs/2007.12354v3
- Date: Wed, 9 Dec 2020 00:38:36 GMT
- Title: Distributional Reinforcement Learning via Moment Matching
- Authors: Thanh Tang Nguyen, Sunil Gupta, Svetha Venkatesh
- Abstract summary: We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
- Score: 54.16108052278444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning a set of probability distributions from
the empirical Bellman dynamics in distributional reinforcement learning (RL), a
class of state-of-the-art methods that estimate the distribution, as opposed to
only the expectation, of the total return. We formulate a method that learns a
finite set of statistics from each return distribution via neural networks, as
in (Bellemare, Dabney, and Munos 2017; Dabney et al. 2018b). Existing
distributional RL methods, however, constrain the learned statistics to
\emph{predefined} functional forms of the return distribution, which both
restricts the representation and makes the predefined statistics difficult to
maintain. Instead, we learn \emph{unrestricted} statistics, i.e.,
deterministic (pseudo-)samples, of the return distribution by leveraging a
technique from hypothesis testing known as maximum mean discrepancy (MMD),
which leads to a simpler objective amenable to backpropagation. Our method can
be interpreted as implicitly matching all orders of moments between a return
distribution and its Bellman target. We establish sufficient conditions for the
contraction of the distributional Bellman operator and provide finite-sample
analysis for the deterministic samples in distribution approximation.
Experiments on the suite of Atari games show that our method outperforms the
standard distributional RL baselines and sets a new record in the Atari games
for non-distributed agents.
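As a reading aid (not the authors' code), the following is a minimal sketch of the kind of MMD objective the abstract describes: a squared-MMD estimate between predicted pseudo-samples of Z(s, a) and Bellman target particles r + gamma * Z(s', a'), computed here with a single Gaussian kernel. The paper itself uses richer kernel choices (e.g. a mixture of bandwidths); all names, shapes, and constants below are illustrative assumptions.

```python
import torch

def gaussian_kernel(x, y, bandwidth=1.0):
    # x: (N,) particles, y: (M,) particles -> (N, M) kernel matrix
    diff = x.unsqueeze(1) - y.unsqueeze(0)
    return torch.exp(-diff.pow(2) / (2.0 * bandwidth ** 2))

def mmd2(particles, targets, bandwidth=1.0):
    """Biased estimator of squared MMD between two 1-D particle sets."""
    k_pp = gaussian_kernel(particles, particles, bandwidth).mean()
    k_tt = gaussian_kernel(targets, targets, bandwidth).mean()
    k_pt = gaussian_kernel(particles, targets, bandwidth).mean()
    return k_pp + k_tt - 2.0 * k_pt

# Hypothetical usage: `pred` stands for the N pseudo-samples of Z(s, a)
# emitted by the value network; `target` stands for the Bellman target
# particles r + gamma * Z(s', a'), with gradients blocked through the target.
pred = torch.randn(32, requires_grad=True)
with torch.no_grad():
    target = 1.0 + 0.99 * torch.randn(32)
loss = mmd2(pred, target)   # scalar objective, amenable to backpropagation
loss.backward()
```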
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantees with explicit dimensional dependencies for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting [14.390842560217743]
We propose a novel approach called DistPred for regression and forecasting tasks.
We transform proper scoring rules that measure the discrepancy between the predicted distribution and the target distribution into a differentiable discrete form.
This allows the model to draw many samples in a single forward pass to estimate the distribution of the response variable (a sample-based scoring-rule sketch follows this entry).
arXiv Detail & Related papers (2024-06-17T10:33:00Z)
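DistPred's exact objective is not spelled out in the summary above; as one plausible instance of a proper scoring rule in "differentiable discrete form", the sketch below computes a sample-based CRPS (energy-score-style) loss over the K samples a model emits in one forward pass. Function names and shapes are illustrative assumptions, not the paper's API.

```python
import torch

def sample_crps(samples, y):
    """Sample-based CRPS estimate: E|Z - y| - 0.5 * E|Z - Z'|.

    samples: (batch, K) samples produced in a single forward pass.
    y:       (batch,)   observed targets.
    """
    term1 = (samples - y.unsqueeze(1)).abs().mean(dim=1)
    pairwise = (samples.unsqueeze(2) - samples.unsqueeze(1)).abs()
    term2 = 0.5 * pairwise.mean(dim=(1, 2))
    return (term1 - term2).mean()

# Hypothetical usage: a network emitting K = 64 samples per input.
pred_samples = torch.randn(8, 64, requires_grad=True)
targets = torch.randn(8)
loss = sample_crps(pred_samples, targets)
loss.backward()
```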
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- A Distributional Analogue to the Successor Representation [54.99439648059807]
This paper contributes a new approach for distributional reinforcement learning.
It elucidates a clean separation of transition structure and reward in the learning process.
As an illustration, we show that it enables zero-shot risk-sensitive policy evaluation.
arXiv Detail & Related papers (2024-02-13T15:35:24Z)
- Distributional Off-policy Evaluation with Bellman Residual Minimization [12.343981093497332]
We study distributional off-policy evaluation (OPE).
The goal is to learn the distribution of the return for a target policy using offline data generated by a different policy.
We propose a new method called Energy Bellman Residual Minimizer (EBRM).
arXiv Detail & Related papers (2024-02-02T20:59:29Z)
- Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework [12.734559823650887]
In the presence of distribution shifts, fair machine learning models may behave unfairly on test data.
Existing algorithms require full access to the data and cannot be applied when only small batches are available.
This paper proposes the first distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph.
arXiv Detail & Related papers (2023-09-20T23:25:28Z)
- Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z)
- Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR (a generic zeroth-order gradient sketch follows this entry).
arXiv Detail & Related papers (2023-03-23T20:27:40Z)
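The LQR entry above mentions a zeroth-order policy gradient without further detail; the snippet below is only a generic two-point zeroth-order gradient estimator over a feedback gain K, the kind of estimator such methods typically build on, with a toy objective standing in for the paper's risk-averse cost.

```python
import numpy as np

def zeroth_order_grad(J, K, delta=1e-2, num_dirs=16):
    """Two-point zeroth-order estimate of the gradient of J at gain K.

    J: callable mapping a feedback-gain matrix to a scalar objective,
       e.g. a risk-averse cost built from an approximate return distribution.
    """
    grad = np.zeros_like(K)
    d = K.size
    for _ in range(num_dirs):
        u = np.random.randn(*K.shape)
        u /= np.linalg.norm(u)                              # unit direction
        grad += d * (J(K + delta * u) - J(K - delta * u)) / (2.0 * delta) * u
    return grad / num_dirs

# Hypothetical usage with a toy quadratic objective standing in for the
# risk-averse LQR cost.
J = lambda K: float(np.sum(K ** 2))
K = np.ones((2, 3))
K -= 0.1 * zeroth_order_grad(J, K)   # one zeroth-order policy-gradient step
```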
- Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks [7.907645828535088]
The paper introduces a methodology for learning different representations of the random return distribution.
A novel distributional RL algorithm named unconstrained monotonic deep Q-network (UMDQN) is presented.
arXiv Detail & Related papers (2021-06-06T20:03:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.