Variational Intrinsic Control Revisited
- URL: http://arxiv.org/abs/2010.03281v2
- Date: Wed, 17 Mar 2021 14:49:17 GMT
- Title: Variational Intrinsic Control Revisited
- Authors: Taehwan Kwon
- Abstract summary: In the original work by Gregor et al., two VIC algorithms were proposed: one that represents the options explicitly, and the other that does it implicitly.
We show that the intrinsic reward used in the latter is subject to bias in stochastic environments, causing convergence to suboptimal solutions.
We propose two methods to correct this behavior and achieve the maximal empowerment.
- Score: 7.6146285961466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we revisit variational intrinsic control (VIC), an
unsupervised reinforcement learning method for finding the largest set of
intrinsic options available to an agent. In the original work by Gregor et al.
(2016), two VIC algorithms were proposed: one that represents the options
explicitly, and the other that does it implicitly. We show that the intrinsic
reward used in the latter is subject to bias in stochastic environments,
causing convergence to suboptimal solutions. To correct this behavior and
achieve the maximal empowerment, we propose two methods respectively based on
the transitional probability model and Gaussian mixture model. We substantiate
our claims through rigorous mathematical derivations and experimental analyses.
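As a rough illustration of the quantity at stake, the sketch below computes empowerment and the per-sample implicit-option intrinsic reward of Gregor et al. (2016) in a toy tabular setting. The distributions and variable names are illustrative assumptions, not the paper's code, and the proposed corrections based on a transitional probability model or a Gaussian mixture model are not implemented here.

```python
import numpy as np

# Toy tabular sketch of implicit-option VIC (Gregor et al., 2016).
# Assumption: a small set of discrete options Omega and final states s_f,
# with made-up distributions; this is illustrative only.
rng = np.random.default_rng(0)
n_options, n_final_states = 4, 6

# p(s_f | s_0, Omega): induced jointly by the option policy and the (possibly
# stochastic) environment dynamics; each row sums to 1.
p_sf_given_omega = rng.dirichlet(np.ones(n_final_states), size=n_options)
p_omega = np.full(n_options, 1.0 / n_options)  # option prior p(Omega | s_0)

# Exact posterior q(Omega | s_0, s_f) via Bayes' rule, and marginal p(s_f | s_0).
p_sf = p_omega @ p_sf_given_omega
q_omega_given_sf = p_sf_given_omega * p_omega[:, None] / p_sf[None, :]

# Empowerment = I(Omega; s_f | s_0).  The implicit-option intrinsic reward is the
# per-sample term  log q(Omega | s_0, s_f) - log p(Omega | s_0).  In stochastic
# environments s_f also depends on transition noise, which is where the bias
# discussed in the abstract enters.
intrinsic_reward = np.log(q_omega_given_sf) - np.log(p_omega)[:, None]
empowerment = np.sum(p_omega[:, None] * p_sf_given_omega * intrinsic_reward)
print(f"empowerment (nats): {empowerment:.4f}")
```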
Related papers
- HJ-sampler: A Bayesian sampler for inverse problems of a stochastic process by leveraging Hamilton-Jacobi PDEs and score-based generative models [1.949927790632678]
This paper builds on the log transform known as the Cole-Hopf transform in Brownian motion contexts.
We develop a new algorithm, named the HJ-sampler, for Bayesian inference in the inverse problem of a stochastic differential equation with given terminal observations.
arXiv Detail & Related papers (2024-09-15T05:30:54Z)
- Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z)
- Comparing two samples through stochastic dominance: a graphical approach [2.867517731896504]
Non-deterministic measurements are common in real-world scenarios.
We propose an alternative framework to visually compare two samples according to their estimated cumulative distribution functions.
arXiv Detail & Related papers (2022-03-15T13:37:03Z)
- Accelerating Stochastic Probabilistic Inference [1.599072005190786]
Stochastic Variational Inference (SVI) has become increasingly attractive thanks to its ability to find good posterior approximations of probabilistic models.
Almost all state-of-the-art SVI algorithms are based on first-order optimization and often suffer from poor convergence rates.
We bridge the gap between second-order methods and variational inference by proposing a second-order based variational inference approach.
arXiv Detail & Related papers (2022-03-15T01:19:12Z)
- Loss function based second-order Jensen inequality and its application to particle variational inference [112.58907653042317]
Particle variational inference (PVI) uses an ensemble of models as an empirical approximation for the posterior distribution.
PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models.
We derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models.
arXiv Detail & Related papers (2021-06-09T12:13:51Z)
- GroupifyVAE: from Group-based Definition to VAE-based Unsupervised Representation Disentanglement [91.9003001845855]
VAE-based unsupervised disentanglement cannot be achieved without introducing other inductive biases.
We address VAE-based unsupervised disentanglement by leveraging the constraints derived from the Group Theory based definition as the non-probabilistic inductive bias.
We train 1800 models covering the most prominent VAE-based models on five datasets to verify the effectiveness of our method.
arXiv Detail & Related papers (2021-02-20T09:49:51Z)
- Simple and optimal methods for stochastic variational inequalities, I: operator extrapolation [9.359939442911127]
We first present a novel operator extrapolation (OE) method for solving deterministic variational inequality (VI) problems.
We then introduce the stochastic operator extrapolation (SOE) method and establish its optimal convergence behavior for solving different stochastic VI problems.
arXiv Detail & Related papers (2020-11-05T17:20:19Z)
- The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization (IRM) is an objective based on the idea of learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective, as well as its recently proposed alternatives, under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution, which is precisely the issue it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z)
- Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
arXiv Detail & Related papers (2020-07-08T11:35:47Z)
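For the covariate-shift entry above, a minimal sketch of the conventional two-step baseline (importance-weight estimation followed by weighted empirical risk minimization) may help situate the one-step idea. This is not the cited paper's method; the logistic-regression weight estimator and all names below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

# Conventional two-step covariate-shift baseline (not the cited one-step method):
# 1) estimate importance weights w(x) ~ p_test(x) / p_train(x),
# 2) fit the predictor with a weighted loss on the training data.
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(500, 1))                   # training inputs
x_test = rng.normal(1.0, 1.0, size=(500, 1))                    # shifted test inputs
y_train = np.sin(x_train).ravel() + 0.1 * rng.normal(size=500)  # training targets

# Step 1: a probabilistic classifier separating train from test gives
# w(x) = p_test(x) / p_train(x) up to a constant, via P(test | x) / P(train | x).
domain_x = np.vstack([x_train, x_test])
domain_y = np.concatenate([np.zeros(len(x_train)), np.ones(len(x_test))])
clf = LogisticRegression().fit(domain_x, domain_y)
p_test_given_x = clf.predict_proba(x_train)[:, 1]
weights = p_test_given_x / (1.0 - p_test_given_x)

# Step 2: importance-weighted empirical risk minimization.
model = Ridge(alpha=1.0).fit(x_train, y_train, sample_weight=weights)
```

The one-step approach summarized above instead learns the predictive model and these weights jointly in a single optimization.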
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.