Efficient Empowerment Estimation for Unsupervised Stabilization
- URL: http://arxiv.org/abs/2007.07356v2
- Date: Sun, 9 May 2021 06:16:25 GMT
- Title: Efficient Empowerment Estimation for Unsupervised Stabilization
- Authors: Ruihan Zhao, Kevin Lu, Pieter Abbeel, Stas Tiomkin
- Abstract summary: The empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
- Score: 75.32013242448151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intrinsically motivated artificial agents learn advantageous behavior without
externally-provided rewards. Previously, it was shown that maximizing mutual
information between agent actuators and future states, known as the empowerment
principle, enables unsupervised stabilization of dynamical systems at upright
positions, which is a prototypical intrinsically motivated behavior for upright
standing and walking. This follows from the coincidence between the objective
of stabilization and the objective of empowerment. Unfortunately, sample-based
estimation of this kind of mutual information is challenging. Recently, various
variational lower bounds (VLBs) on empowerment have been proposed as solutions;
however, they are often biased, unstable in training, and have high sample
complexity. In this work, we propose an alternative solution based on a
trainable representation of a dynamical system as a Gaussian channel, which
allows us to efficiently calculate an unbiased estimator of empowerment by
convex optimization. We demonstrate our solution for sample-based unsupervised
stabilization on different dynamical control systems and show the advantages of
our method by comparing it to the existing VLB approaches. Specifically, we
show that our method has a lower sample complexity, is more stable in training,
possesses the essential properties of the empowerment function, and allows
estimation of empowerment from images. Consequently, our method opens a path to
wider and easier adoption of empowerment for various applications.
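To make the Gaussian-channel idea concrete: empowerment at a state s is the channel capacity from the actuation to the resulting future state, E(s) = max over p(a) of I(a; s'). For a locally linear-Gaussian channel s' ~ N(As + Ba, sigma^2 I) under an actuation power budget, this capacity has the classical water-filling solution over the singular values of B, which is a small convex problem. The sketch below is illustrative only and is not the authors' code; the control matrix B, the power budget, and the helper names are assumptions made for the example.

```python
# Illustrative sketch (not the authors' implementation): empowerment of a
# locally-linear Gaussian channel s' ~ N(A s + B a, noise_var * I) under a
# power budget equals the capacity of the channel B, computed by water-filling
# over its singular values.
import numpy as np

def waterfilling_capacity(B, power=1.0, noise_var=1.0):
    """Capacity (in nats) of y = B a + n, n ~ N(0, noise_var * I),
    subject to E[||a||^2] <= power, via water-filling."""
    gains = np.linalg.svd(B, compute_uv=False) ** 2     # per-channel gains g_i
    floors = noise_var / np.maximum(gains, 1e-12)       # noise floors sigma^2 / g_i

    # Bisection on the water level mu so that sum_i max(0, mu - floor_i) = power.
    lo, hi = floors.min(), floors.max() + power
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - floors, 0.0).sum() > power:
            hi = mu
        else:
            lo = mu
    p = np.maximum(mu - floors, 0.0)                     # optimal per-channel powers
    return 0.5 * np.sum(np.log1p(p * gains / noise_var))

# Hypothetical 2-state, 2-action control matrix from a local linearization.
B = np.array([[1.5, 0.2],
              [0.1, 0.8]])
print(waterfilling_capacity(B, power=1.0))
```

Bisection on the water level is used here only for simplicity; any convex solver would serve. In the paper the Gaussian-channel representation of the dynamics is itself trainable rather than assumed known, which is what allows the estimator to work from samples and from images.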
Related papers
- Actively Learning Reinforcement Learning: A Stochastic Optimal Control Approach [3.453622106101339]
We propose a framework for achieving two intertwined objectives: (i) equipping reinforcement learning with active exploration and deliberate information gathering, and (ii) overcoming the computational intractability of the optimal control law.
We approach both objectives by using reinforcement learning to compute the optimal control law.
Unlike fixed exploration and exploitation balance, caution and probing are employed automatically by the controller in real-time, even after the learning process is terminated.
arXiv Detail & Related papers (2023-09-18T18:05:35Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Mimicking Better by Matching the Approximate Action Distribution [48.95048003354255]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z) - Dynamic Memory for Interpretable Sequential Optimisation [0.0]
We present a solution to handling non-stationarity that is suitable for deployment at scale.
We develop an adaptive Bayesian learning agent that employs a novel form of dynamic memory.
We describe the architecture of a large-scale deployment of automatic optimisation-as-a-service.
arXiv Detail & Related papers (2022-06-28T12:29:13Z) - Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation Learning from observation describes policy learning in a similar way to human learning.
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z) - Training Generative Adversarial Networks by Solving Ordinary
Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)