Revisiting Energy Based Models as Policies: Ranking Noise Contrastive
Estimation and Interpolating Energy Models
- URL: http://arxiv.org/abs/2309.05803v1
- Date: Mon, 11 Sep 2023 20:13:47 GMT
- Title: Revisiting Energy Based Models as Policies: Ranking Noise Contrastive
Estimation and Interpolating Energy Models
- Authors: Sumeet Singh, Stephen Tu, Vikas Sindhwani
- Abstract summary: In this work, we revisit the choice of energy-based models (EBM) as a policy class.
We develop a training objective and algorithm for energy models which combines several key ingredients.
We show that the Implicit Behavior Cloning (IBC) objective is actually biased even at the population level.
- Score: 18.949193683555237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A crucial design decision for any robot learning pipeline is the choice of
policy representation: what type of model should be used to generate the next
set of robot actions? Owing to the inherent multi-modal nature of many robotic
tasks, combined with the recent successes in generative modeling, researchers
have turned to state-of-the-art probabilistic models such as diffusion models
for policy representation. In this work, we revisit the choice of energy-based
models (EBM) as a policy class. We show that the prevailing folklore -- that
energy models in high dimensional continuous spaces are impractical to train --
is false. We develop a practical training objective and algorithm for energy
models which combines several key ingredients: (i) ranking noise contrastive
estimation (R-NCE), (ii) learnable negative samplers, and (iii) non-adversarial
joint training. We prove that our proposed objective function is asymptotically
consistent and quantify its limiting variance. On the other hand, we show that
the Implicit Behavior Cloning (IBC) objective is actually biased even at the
population level, providing a mathematical explanation for the poor performance
of IBC-trained energy policies in several independent follow-up works. We
further extend our algorithm to learn a continuous stochastic process that
bridges noise and data, modeling this process with a family of EBMs indexed by
a scale variable. In doing so, we demonstrate that the core idea behind recent
progress in generative modeling is actually compatible with EBMs. Altogether,
our proposed training algorithms enable us to train energy-based models as
policies which compete with -- and even outperform -- diffusion models and
other state-of-the-art approaches in several challenging multi-modal
benchmarks: obstacle avoidance path planning and contact-rich block pushing.
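
To make ingredient (i) concrete, here is a minimal PyTorch sketch of a ranking-NCE training step for a conditional EBM policy. The interfaces (`energy_net`, `sampler`) and shapes are illustrative assumptions, not the authors' code; the sampler stands in for the learnable negative sampler of ingredient (ii).

```python
import torch
import torch.nn.functional as F

def rnce_loss(energy_net, sampler, x, y_pos, num_neg=64):
    """Ranking-NCE loss for one batch of (context x, expert action y_pos).

    Assumed interfaces (illustrative only):
      energy_net(x, y)        -> energies,   shape (N,)
      sampler.sample(x, k)    -> negatives,  shape (B, k, act_dim)
      sampler.log_prob(x, y)  -> log q(y|x), shape (B, k)
    """
    B, K = x.shape[0], num_neg
    y_neg = sampler.sample(x, K)                           # (B, K, A)
    y_all = torch.cat([y_pos.unsqueeze(1), y_neg], dim=1)  # (B, K+1, A)

    # Score every candidate with the energy network.
    x_rep = x.unsqueeze(1).expand(-1, K + 1, -1).reshape(B * (K + 1), -1)
    energies = energy_net(x_rep, y_all.reshape(B * (K + 1), -1)).view(B, K + 1)

    # Importance correction by the sampler's log-density. Dropping this
    # term recovers the InfoNCE-style IBC objective, which the paper
    # shows is biased whenever the negatives are not uniform.
    logits = -energies - sampler.log_prob(x, y_all)        # (B, K+1)

    # The expert action occupies slot 0 of every candidate set.
    targets = torch.zeros(B, dtype=torch.long, device=x.device)
    return F.cross_entropy(logits, targets)
```

Ingredient (iii), non-adversarial joint training, would couple this loss with a separate fitting objective for the sampler; that coupling is not reproduced here.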
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3$\times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Exploring Model Transferability through the Lens of Potential Energy [78.60851825944212]
Transfer learning has become crucial in computer vision tasks due to the vast availability of pre-trained deep learning models.
Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels.
We present an insightful physics-inspired approach named PED to address these challenges.
arXiv Detail & Related papers (2023-08-29T07:15:57Z)
- Maximum entropy exploration in contextual bandits with neural networks and energy based models [63.872634680339644]
We present two classes of models, one with neural networks as reward estimators and the other with energy-based models.
We show that both techniques outperform well-known standard algorithms, with energy-based models giving the best overall performance.
This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces; a minimal sketch of energy-based action selection follows this list.
arXiv Detail & Related papers (2022-10-12T15:09:45Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- Model-Based Imitation Learning Using Entropy Regularization of Model and Policy [0.456877715768796]
We propose model-based Entropy-Regularized Imitation Learning (MB-ERIL) under the entropy-regularized Markov decision process.
A policy discriminator distinguishes the actions generated by a robot from expert ones, and a model discriminator distinguishes the counterfactual state transitions generated by the model from the actual ones.
Computer simulations and real robot experiments show that MB-ERIL achieves a competitive performance and significantly improves the sample efficiency compared to baseline methods.
arXiv Detail & Related papers (2022-06-21T04:15:12Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high-DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance; a rough sketch of such a planner follows this list.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
- Model Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning [42.525696463089794]
Model Predictive Actor-Critic (MoPAC) is a hybrid model-based/model-free method that combines model predictive rollouts with policy optimization to mitigate model bias.
MoPAC guarantees optimal skill learning up to an approximation error and reduces necessary physical interaction with the environment.
arXiv Detail & Related papers (2021-03-25T13:50:24Z)
- A Spectral Energy Distance for Parallel Speech Synthesis [29.14723501889278]
Speech synthesis is an important practical generative modeling problem.
We propose a new learning method that allows us to train highly parallel models of speech.
arXiv Detail & Related papers (2020-08-03T19:56:04Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
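
For the maximum-entropy contextual-bandit entry above, a minimal sketch of energy-based action selection with continuous actions: draw candidate actions and sample one from the Boltzmann distribution induced by the learned energies. The uniform candidate box, `energy_net` interface, and temperature are assumptions for illustration, not that paper's implementation.

```python
import torch

def boltzmann_action(energy_net, context, act_dim, num_candidates=256, temperature=1.0):
    # Uniform candidate proposals over a [-1, 1]^act_dim action box (an assumption).
    actions = torch.rand(num_candidates, act_dim) * 2.0 - 1.0
    ctx = context.unsqueeze(0).expand(num_candidates, -1)
    energies = energy_net(ctx, actions)                    # (num_candidates,)
    # Maximum-entropy exploration: sample from the Gibbs distribution
    # p(a|x) proportional to exp(-E(x, a) / temperature) over the candidates.
    probs = torch.softmax(-energies / temperature, dim=0)
    idx = torch.multinomial(probs, num_samples=1).item()
    return actions[idx]
```

The temperature trades exploration against exploitation; annealing it toward zero recovers greedy selection.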
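
For the model-based planning entry, a rough random-shooting sketch of the MPC side under similar assumptions (learned `dynamics` and `reward` callables, a [-1, 1] action box); the distillation step described in that paper would then regress a reactive policy onto the planner's chosen actions.

```python
import torch

def mpc_action(dynamics, reward, state, act_dim, horizon=10, num_samples=512):
    # Random-shooting MPC: sample action sequences, roll them out through
    # the learned dynamics model, and execute the first action of the best.
    acts = torch.rand(num_samples, horizon, act_dim) * 2.0 - 1.0  # [-1, 1] box (assumption)
    s = state.unsqueeze(0).expand(num_samples, -1)
    returns = torch.zeros(num_samples)
    for t in range(horizon):
        returns = returns + reward(s, acts[:, t])  # (num_samples,)
        s = dynamics(s, acts[:, t])                # learned forward model
    return acts[returns.argmax(), 0]               # re-plan at every step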