Implicit Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2007.06159v2
- Date: Mon, 19 Oct 2020 20:23:32 GMT
- Title: Implicit Distributional Reinforcement Learning
- Authors: Yuguang Yue, Zhendong Wang, Mingyuan Zhou
- Abstract summary: Implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs)
and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
- Score: 61.166030238490634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve the sample efficiency of policy-gradient based reinforcement
learning algorithms, we propose implicit distributional actor-critic (IDAC)
that consists of a distributional critic, built on two deep generator networks
(DGNs), and a semi-implicit actor (SIA), powered by a flexible policy
distribution. We adopt a distributional perspective on the discounted
cumulative return and model it with a state-action-dependent implicit
distribution, which is approximated by the DGNs that take state-action pairs
and random noises as their input. Moreover, we use the SIA to provide a
semi-implicit policy distribution, which mixes the policy parameters with a
reparameterizable distribution that is not constrained by an analytic density
function. In this way, the policy's marginal distribution is implicit,
providing the potential to model complex properties such as covariance
structure and skewness, but its parameters and entropy can still be estimated.
We incorporate these features with an off-policy algorithm framework to solve
problems with continuous action space and compare IDAC with state-of-the-art
algorithms on representative OpenAI Gym environments. We observe that IDAC
outperforms these baselines in most tasks. Python code is provided.
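The two components described above can be sketched in a few lines of PyTorch. The block below is a minimal, hypothetical illustration, not the authors' released code: the network sizes, noise dimensions, and tanh action squashing are all assumptions.

```python
# Minimal sketch of IDAC's two components; hypothetical, not the authors' code.
import torch
import torch.nn as nn


class DGN(nn.Module):
    """Deep generator network: (state, action, noise) -> one return sample.
    Repeated noise draws yield samples from an implicit, state-action-dependent
    distribution over the discounted cumulative return. IDAC uses two such
    networks, in the spirit of clipped double estimation."""

    def __init__(self, state_dim, action_dim, noise_dim, hidden=256):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        eps = torch.randn(state.shape[0], self.noise_dim)  # input noise
        return self.net(torch.cat([state, action, eps], dim=-1))


class SemiImplicitActor(nn.Module):
    """Semi-implicit actor: auxiliary noise feeds the network that produces
    the Gaussian parameters, so the noise-marginalized policy has no analytic
    density (it can exhibit skewness and covariance structure), yet each
    action is still a reparameterizable draw."""

    def __init__(self, state_dim, action_dim, noise_dim, hidden=256):
        super().__init__()
        self.noise_dim = noise_dim
        self.body = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        xi = torch.randn(state.shape[0], self.noise_dim)  # mixing noise
        h = self.body(torch.cat([state, xi], dim=-1))
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        u = self.mean(h) + std * torch.randn_like(std)    # reparameterized draw
        return torch.tanh(u)                               # bounded action
```

In a full training loop one would push the DGN's return samples toward distributional Bellman targets (for example with a quantile-style loss), take the elementwise minimum over the two DGNs' samples, and estimate the semi-implicit policy's entropy rather than compute it in closed form; those details are omitted here.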
Related papers
- Diffusion Actor-Critic with Entropy Regulator [32.79341490514616]
We propose an online RL algorithm termed diffusion actor-critic with entropy regulator (DACER).
This algorithm conceptualizes the reverse process of the diffusion model as a novel policy function.
Experiments on MuJoCo benchmarks and a multimodal task demonstrate that the DACER algorithm achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-05-24T03:23:27Z)
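As a rough illustration of the "reverse process as policy" idea in the entry above, here is a hypothetical sketch: a state-conditioned denoiser is iterated from pure Gaussian noise down to an action. The Euler-style update is a simplified stand-in for an exact diffusion posterior step, and the `denoiser` interface is an assumption, not DACER's implementation.

```python
# Hypothetical sketch of a diffusion-style policy: iterate a state-conditioned
# denoiser from Gaussian noise down to an action (simplified reverse step,
# not DACER's exact update rule).
import torch


@torch.no_grad()
def sample_action(denoiser, state, action_dim, steps=20):
    a = torch.randn(state.shape[0], action_dim)             # start from noise
    for t in reversed(range(steps)):
        t_embed = torch.full((state.shape[0], 1), t / steps)
        noise_hat = denoiser(a, state, t_embed)             # predicted noise
        a = a - noise_hat / steps                           # crude Euler step
        if t > 0:                                           # keep stochasticity
            a = a + (1.0 / steps) ** 0.5 * torch.randn_like(a)
    return torch.tanh(a)
```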
- A Distributional Analogue to the Successor Representation [54.99439648059807]
This paper contributes a new approach for distributional reinforcement learning.
It elucidates a clean separation of transition structure and reward in the learning process.
As an illustration, we show that it enables zero-shot risk-sensitive policy evaluation.
arXiv Detail & Related papers (2024-02-13T15:35:24Z)
- Score-Aware Policy-Gradient Methods and Performance Guarantees using Local Lyapunov Conditions: Applications to Product-Form Stochastic Networks and Queueing Systems [1.747623282473278]
We introduce a policy-gradient method for model-based reinforcement learning (RL) that exploits a type of stationary distribution commonly obtained from Markov decision processes (MDPs) in stochastic networks.
Specifically, when the stationary distribution of the MDP is parametrized by the policy parameters, we can improve existing policy-gradient methods for average-reward estimation.
arXiv Detail & Related papers (2023-12-05T14:44:58Z)
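The "score-aware" idea in the entry above can be illustrated with the plain score-function identity, $\nabla_\theta \mathbb{E}_{x \sim \pi_\theta}[r(x)] = \mathbb{E}[r(x)\, \nabla_\theta \log \pi_\theta(x)]$, which becomes usable when the stationary distribution $\pi_\theta$ is known in closed form (for example, product form). The sketch below assumes a differentiable `log_pi(theta, x)` and is illustrative only; the paper's product-form machinery is not reproduced.

```python
# Hedged sketch of a score-function gradient of the average reward when the
# stationary distribution pi_theta has a known, differentiable log-density
# (illustrative; not the paper's construction).
import torch


def average_reward_gradient(log_pi, reward, samples, theta):
    """Monte Carlo estimate of grad_theta E_{x ~ pi_theta}[r(x)]
    = E[r(x) * grad_theta log pi_theta(x)] over stationary samples x."""
    grad = torch.zeros_like(theta)
    for x in samples:
        (g,) = torch.autograd.grad(log_pi(theta, x), theta)
        grad += reward(x) * g
    return grad / len(samples)
```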
- Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs).
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z) - Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
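Quantile-based value-distribution learning, as in the EQR entry above, typically rests on the pinball loss; a minimal version follows. EQR's model-based targets are not reproduced here, so the target construction is an assumption left to the reader.

```python
# Minimal pinball (quantile-regression) loss, the standard building block
# behind quantile-based value distributions (targets simplified).
import torch


def pinball_loss(pred_quantiles, target, taus):
    """pred_quantiles: (batch, n) quantile estimates of the value distribution
    target:          (batch, 1) sampled return targets
    taus:            (n,) quantile levels in (0, 1)"""
    diff = target - pred_quantiles                 # broadcasts to (batch, n)
    return (torch.where(diff >= 0, taus, taus - 1.0) * diff).mean()


# Typical quantile levels: midpoints of n equal-probability bins,
# e.g. taus = (torch.arange(n) + 0.5) / n
```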
- Strategic Distribution Shift of Interacting Agents via Coupled Gradient Flows [6.064702468344376]
We propose a novel framework for analyzing the dynamics of distribution shift in real-world systems.
We show that our approach captures well-documented forms of distribution shifts like polarization and disparate impacts that simpler models cannot capture.
arXiv Detail & Related papers (2023-07-03T17:18:50Z)
- PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm [28.48626438603237]
PACER consists of a distributional critic, an actor and a sample-based encourager.
The push-forward operator is leveraged in both the critic and the actor to model the return distributions and policies, respectively.
A sample-based utility value policy gradient is established for the push-forward policy update.
arXiv Detail & Related papers (2023-06-11T09:45:31Z)
- Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z)
- Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning [39.07307690074323]
We consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time environment.
Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, state representations, multiagent coordination, and more.
We propose a tractable algorithm for approximately solving the distributional HJB based on a JKO scheme, which can be implemented in an online control algorithm.
arXiv Detail & Related papers (2022-05-24T16:33:54Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms with constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
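To make the "random operator on value functions" viewpoint in the entry above concrete, here is a toy tabular TD(0) step with a constant step size; iterating it over random transitions maps a distribution over value functions to a new one. This sketch is ours, and the paper's analysis covers TD($\lambda$) and Q-learning in far greater generality.

```python
# Toy illustration of a constant step-size TD(0) update as a random operator
# on tabular value functions (our sketch, not the paper's construction).
import numpy as np


def td0_step(V, transition, alpha=0.1, gamma=0.99):
    """Apply one sampled (s, r, s') update to a copy of V. Iterating this with
    random transitions induces a Markov chain over value functions, whose
    stationary distribution is the kind of object the paper analyzes."""
    s, r, s_next = transition
    V = V.copy()
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```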
This list is automatically generated from the titles and abstracts of the papers on this site.