Value function estimation using conditional diffusion models for control
- URL: http://arxiv.org/abs/2306.07290v1
- Date: Fri, 9 Jun 2023 18:40:55 GMT
- Title: Value function estimation using conditional diffusion models for control
- Authors: Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm,
Alexander Toshev, Josh Susskind
- Abstract summary: We propose a simple algorithm called Diffused Value Function (DVF)
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
- Score: 62.27184818047923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A fairly reliable trend in deep reinforcement learning is that the
performance scales with the number of parameters, provided a complimentary
scaling in amount of training data. As the appetite for large models increases,
it is imperative to address, sooner than later, the potential problem of
running out of high-quality demonstrations. In this case, instead of collecting
only new data via costly human demonstrations or risking a simulation-to-real
transfer with uncertain effects, it would be beneficial to leverage vast
amounts of readily-available low-quality data. Since classical control
algorithms such as behavior cloning or temporal difference learning cannot be
used on reward-free or action-free data out-of-the-box, this solution warrants
novel training paradigms for continuous control. We propose a simple algorithm
called Diffused Value Function (DVF), which learns a joint multi-step model of
the environment-robot interaction dynamics using a diffusion model. This model
can be efficiently learned from state sequences (i.e., without access to reward
functions nor actions), and subsequently used to estimate the value of each
action out-of-the-box. We show how DVF can be used to efficiently capture the
state visitation measure for multiple controllers, and show promising
qualitative and quantitative results on challenging robotics benchmarks.
Related papers
- Simulation-Free Training of Neural ODEs on Paired Data [20.36333430055869]
We employ the flow matching framework for simulation-free training of NODEs.
We show that applying flow matching directly between paired data can often lead to an ill-defined flow.
We propose a simple extension that applies flow matching in the embedding space of data pairs.
arXiv Detail & Related papers (2024-10-30T11:18:27Z) - Diffusion-Generative Multi-Fidelity Learning for Physical Simulation [24.723536390322582]
We develop a diffusion-generative multi-fidelity learning method based on differential equations (SDE), where the generation is a continuous denoising process.
By conditioning on additional inputs (temporal or spacial variables), our model can efficiently learn and predict multi-dimensional solution arrays.
arXiv Detail & Related papers (2023-11-09T18:59:05Z) - Diffusion-Model-Assisted Supervised Learning of Generative Models for
Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in the supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z) - BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z) - Model Predictive Control with Self-supervised Representation Learning [13.225264876433528]
We propose the use of a reconstruction function within the TD-MPC framework, so that the agent can reconstruct the original observation.
Our proposed addition of another loss term leads to improved performance on both state- and image-based tasks.
arXiv Detail & Related papers (2023-04-14T16:02:04Z) - Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Automatic Recall Machines: Internal Replay, Continual Learning and the
Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.