GMAC: A Distributional Perspective on Actor-Critic Framework
- URL: http://arxiv.org/abs/2105.11366v1
- Date: Mon, 24 May 2021 15:50:26 GMT
- Title: GMAC: A Distributional Perspective on Actor-Critic Framework
- Authors: Daniel Wontae Nam, Younghoon Kim, Chan Y. Park
- Abstract summary: We propose a new method that minimizes the Cramér distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm SR($\lambda$).
We empirically show that GMAC captures the correct representation of value distributions and improves the performance of a conventional actor-critic method with low computational cost.
- Score: 6.243642831536256
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we devise a distributional framework on actor-critic as a
solution to distributional instability, action type restriction, and conflation
between samples and statistics. We propose a new method that minimizes the
Cramér distance with the multi-step Bellman target distribution generated
from a novel Sample-Replacement algorithm denoted SR($\lambda$), which learns
the correct value distribution under multiple Bellman operations.
Parameterizing the value distribution with a Gaussian Mixture Model further
improves the efficiency and performance of the method, which we name GMAC.
We empirically show that GMAC captures the correct representation of value
distributions and improves the performance of a conventional actor-critic
method with low computational cost, in both discrete and continuous action
spaces using Arcade Learning Environment (ALE) and PyBullet environment.
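As a rough, self-contained illustration of the objective above (not the authors' implementation), the sketch below approximates the squared Cramér distance between two one-dimensional empirical distributions by integrating the squared difference of their empirical CDFs over a grid; the function name, grid size, and toy data are assumptions.

```python
# Minimal sketch (assumed names/data): squared Cramer distance between two 1-D
# empirical distributions, the kind of statistical distance minimized between a
# predicted value distribution and a multi-step Bellman target distribution.
import numpy as np

def cramer_distance_sq(samples_p, samples_q, grid_size=512):
    """Approximate the integral of (F_P(x) - F_Q(x))^2 dx on a finite grid."""
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    xs = np.linspace(lo, hi, grid_size)
    # Empirical CDFs of both sample sets evaluated on the common grid.
    F_p = np.searchsorted(np.sort(samples_p), xs, side="right") / len(samples_p)
    F_q = np.searchsorted(np.sort(samples_q), xs, side="right") / len(samples_q)
    dx = (hi - lo) / (grid_size - 1)
    return float(np.sum((F_p - F_q) ** 2) * dx)

# Toy usage: samples from a predicted return distribution vs. a shifted target.
rng = np.random.default_rng(0)
pred = rng.normal(0.0, 1.0, size=1000)
target = rng.normal(0.5, 1.0, size=1000)
print(cramer_distance_sq(pred, target))
```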
Related papers
- Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning [55.75959755058356]
In deep reinforcement learning, estimating the value function is essential to evaluate the quality of states and actions.
A recent study suggested that the error distribution for training the value function is often skewed because of the properties of the Bellman operator.
We propose a method called Symmetric Q-learning, in which synthetic noise drawn from a zero-mean distribution is added to the target values to produce a Gaussian error distribution.
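A minimal sketch of that idea, assuming a one-step TD target and a Gaussian noise source chosen purely for illustration (the paper's actual noise construction may differ):

```python
# Hedged sketch: perturb the Bellman target with zero-mean synthetic noise so the
# distribution of TD errors becomes (approximately) symmetric. `sigma` and the
# Gaussian choice are illustrative assumptions, not the paper's exact recipe.
import numpy as np

def noisy_targets(rewards, next_q, gamma=0.99, sigma=0.1, rng=None):
    """One-step targets r + gamma * Q(s', a') plus zero-mean noise."""
    rng = np.random.default_rng() if rng is None else rng
    targets = rewards + gamma * next_q
    noise = rng.normal(loc=0.0, scale=sigma, size=targets.shape)  # zero mean by construction
    return targets + noise
```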
arXiv Detail & Related papers (2024-03-12T14:49:19Z)
- Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs)
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
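Quantile-based value-distribution learning of this kind typically builds on the pinball (quantile-regression) loss; the generic sketch below shows that loss only, not EQR's model-based procedure, and all names are assumptions.

```python
# Generic pinball (quantile-regression) loss for a set of predicted return quantiles.
import numpy as np

def pinball_loss(pred_quantiles, target, taus):
    """pred_quantiles: (K,) predicted quantile values; target: scalar return sample;
    taus: (K,) quantile levels in (0, 1)."""
    diff = target - pred_quantiles  # positive where the prediction undershoots
    return float(np.mean(np.maximum(taus * diff, (taus - 1.0) * diff)))

# Toy usage: three quantiles of the return distribution against one observed return.
print(pinball_loss(np.array([-1.0, 0.0, 1.0]), target=0.3, taus=np.array([0.25, 0.5, 0.75])))
```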
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Learning Distributions via Monte-Carlo Marginalization [9.131712404284876]
We propose a novel method to learn intractable distributions from their samples.
Monte-Carlo Marginalization (MCMarg) is proposed to address this problem.
The proposed approach is a powerful tool to learn complex distributions and the entire process is differentiable.
arXiv Detail & Related papers (2023-08-11T19:08:06Z)
- Learning Sampling Distributions for Model Predictive Control [36.82905770866734]
Sampling-based approaches have become a cornerstone of contemporary Model Predictive Control (MPC).
We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
arXiv Detail & Related papers (2022-12-05T20:35:36Z)
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediate distributions and optimize the bridging distributions to use fewer sampling steps.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
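For context, here is a compact sketch of vanilla AIS with a geometric path, a fixed linear schedule, and random-walk Metropolis transitions on a toy target; the paper's parametric, optimized bridging distributions are not reproduced here.

```python
# Vanilla AIS sketch (toy target and schedule are assumptions): estimate the
# normalizing constant of an unnormalized density by annealing from a N(0, 1) prior.
import numpy as np

rng = np.random.default_rng(0)

def log_prior(x):            # standard normal prior
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def log_target_unnorm(x):    # unnormalized N(3, 0.5^2) scaled by 7, so the true Z is 7
    return np.log(7.0) - 0.5 * ((x - 3.0) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))

def ais_log_weight(n_steps=200, mh_step=0.5):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.normal()                     # draw from the prior (beta = 0)
    log_w = 0.0
    for k in range(1, n_steps + 1):
        # incremental importance weight along the geometric path
        log_w += (betas[k] - betas[k - 1]) * (log_target_unnorm(x) - log_prior(x))
        # one Metropolis move that leaves the k-th intermediate density invariant
        log_pi = lambda z: (1 - betas[k]) * log_prior(z) + betas[k] * log_target_unnorm(z)
        prop = x + mh_step * rng.normal()
        if np.log(rng.uniform()) < log_pi(prop) - log_pi(x):
            x = prop
    return log_w

log_ws = np.array([ais_log_weight() for _ in range(100)])
print("estimated Z:", np.exp(log_ws).mean())  # should land near 7
```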
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
- Conjugated Discrete Distributions for Distributional Reinforcement Learning [0.0]
We show that one of the most successful methods may not yield an optimal policy if the underlying process is non-deterministic.
We argue that distributional reinforcement learning is well suited to remedy this situation completely.
arXiv Detail & Related papers (2021-12-14T14:14:49Z)
- LSB: Local Self-Balancing MCMC in Discrete Spaces [2.385916960125935]
This work considers using machine learning to adapt the proposal distribution to the target, in order to improve the sampling efficiency in the purely discrete domain.
We call the resulting sampler the Locally Self-Balancing Sampler (LSB).
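As background, a generic locally balanced proposal (the family such samplers adapt) weights each neighbor y of the current state x by g(pi(y)/pi(x)); the sketch below uses g(t) = sqrt(t) and a toy chain target, both illustrative assumptions rather than the paper's learned sampler.

```python
# Generic locally balanced Metropolis-Hastings step on a discrete space (toy assumptions).
import numpy as np

def locally_balanced_step(x, log_pi, neighbors, rng, g=np.sqrt):
    """One MH step whose proposal weights neighbor y by g(pi(y) / pi(x))."""
    def proposal(state):
        nbrs = neighbors(state)  # list of neighboring states (symmetric neighborhoods assumed)
        w = np.array([g(np.exp(log_pi(y) - log_pi(state))) for y in nbrs])
        return nbrs, w / w.sum()
    nbrs, probs = proposal(x)
    idx = rng.choice(len(nbrs), p=probs)
    y = nbrs[idx]
    back_nbrs, back_probs = proposal(y)  # reverse proposal for the MH correction
    log_acc = log_pi(y) - log_pi(x) + np.log(back_probs[back_nbrs.index(x)]) - np.log(probs[idx])
    return y if np.log(rng.uniform()) < log_acc else x

# Toy usage: walk on {0, ..., 9} targeting pi(x) proportional to exp(-0.5 * (x - 6)^2).
log_pi = lambda x: -0.5 * (x - 6) ** 2
neighbors = lambda x: [y for y in (x - 1, x + 1) if 0 <= y <= 9]
rng = np.random.default_rng(0)
state = 0
for _ in range(500):
    state = locally_balanced_step(state, log_pi, neighbors, rng)
print(state)
```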
arXiv Detail & Related papers (2021-09-08T18:31:26Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
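A hedged sketch of such a minibatch Monte Carlo KL estimate, assuming a one-dimensional Gaussian representation per example and uniform mixtures over the minibatch (names, shapes, and the 1-D latent are simplifications for illustration):

```python
# Monte Carlo estimate of KL(source representation || target representation) from
# minibatch Gaussians q(z|x); 1-D latent and uniform mixture weights are assumptions.
import numpy as np

def log_mixture(z, means, log_stds):
    """Log-density of a uniform mixture of Gaussians evaluated at scalar z."""
    stds = np.exp(log_stds)
    comp = -0.5 * ((z - means) / stds) ** 2 - log_stds - 0.5 * np.log(2 * np.pi)
    return np.logaddexp.reduce(comp) - np.log(len(means))

def minibatch_kl(src_means, src_log_stds, tgt_means, tgt_log_stds, n_samples=256, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    kl = 0.0
    for _ in range(n_samples):
        i = rng.integers(len(src_means))                           # pick a source example
        z = src_means[i] + np.exp(src_log_stds[i]) * rng.normal()  # sample z ~ q(z | x_i)
        kl += (log_mixture(z, src_means, src_log_stds)
               - log_mixture(z, tgt_means, tgt_log_stds))
    return kl / n_samples

# Toy usage with made-up encoder outputs for a source and a target minibatch.
rng = np.random.default_rng(0)
print(minibatch_kl(np.array([0.0, 0.2]), np.array([-1.0, -1.0]),
                   np.array([1.0, 1.2]), np.array([-1.0, -1.0]), rng=rng))
```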
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- Implicit Distributional Reinforcement Learning [61.166030238490634]
IDAC is an implicit distributional actor-critic built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)