Contrastive Example-Based Control
- URL: http://arxiv.org/abs/2307.13101v1
- Date: Mon, 24 Jul 2023 19:43:22 GMT
- Title: Contrastive Example-Based Control
- Authors: Kyle Hatch, Benjamin Eysenbach, Rafael Rafailov, Tianhe Yu, Ruslan
Salakhutdinov, Sergey Levine, Chelsea Finn
- Abstract summary: We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
- Score: 163.6482792040079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While many real-world problems might benefit from reinforcement
learning, these problems rarely fit into the MDP mold: interacting with the
environment is often expensive and specifying reward functions is challenging.
Motivated by these challenges, prior work has developed data-driven approaches
that learn entirely from samples from the transition dynamics and examples of
high-return states. These methods typically learn a reward function from
high-return states, use that reward function to label the transitions, and then
apply an offline RL algorithm to these transitions. While these methods can
achieve good results on many tasks, they can be complex, often requiring
regularization and temporal difference updates. In this paper, we propose a
method for offline, example-based control that learns an implicit model of
multi-step transitions, rather than a reward function. We show that this
implicit model can represent the Q-values for the example-based control
problem. Across a range of state-based and image-based offline control tasks,
our method outperforms baselines that use learned reward functions; additional
experiments demonstrate improved robustness and scaling with dataset size.
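A minimal sketch of the contrastive idea described in the abstract is given below. It trains an encoder over (state, action) pairs and an encoder over future states with an InfoNCE-style loss, and then treats the resulting similarity scores against success examples as an implicit Q-value. The network sizes, the batch-negative pairing, and the candidate-action search are illustrative choices, not the paper's released implementation.

```python
# Minimal sketch of a contrastive critic for example-based control.
# Encoder sizes, the InfoNCE pairing, and the action search are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, EMBED_DIM = 8, 2, 64

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

sa_encoder = mlp(OBS_DIM + ACT_DIM, EMBED_DIM)   # phi(s, a)
ex_encoder = mlp(OBS_DIM, EMBED_DIM)             # psi(future state / success example)
opt = torch.optim.Adam([*sa_encoder.parameters(), *ex_encoder.parameters()], lr=3e-4)

def contrastive_update(obs, act, future_obs):
    """InfoNCE-style update: the future state from the same trajectory is the
    positive; other rows of the batch serve as negatives. The resulting logits
    play the role of an implicit multi-step model / Q-value."""
    phi = sa_encoder(torch.cat([obs, act], dim=-1))   # [B, D]
    psi = ex_encoder(future_obs)                      # [B, D]
    logits = phi @ psi.t()                            # [B, B] similarities
    labels = torch.arange(obs.shape[0])               # positives on the diagonal
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def greedy_action(obs, success_examples, candidate_actions):
    """Score sampled candidate actions against the success examples and pick
    the highest-scoring one (a crude stand-in for a learned policy)."""
    psi = ex_encoder(success_examples).mean(0, keepdim=True)              # [1, D]
    sa = torch.cat([obs.unsqueeze(0).expand(len(candidate_actions), -1),
                    candidate_actions], dim=-1)
    scores = (sa_encoder(sa) @ psi.t()).squeeze(-1)                       # [N]
    return candidate_actions[scores.argmax()]

# Toy usage with random data, just to show the shapes.
obs, future_obs = torch.randn(32, OBS_DIM), torch.randn(32, OBS_DIM)
act = torch.randn(32, ACT_DIM)
contrastive_update(obs, act, future_obs)
a = greedy_action(torch.randn(OBS_DIM), torch.randn(10, OBS_DIM), torch.randn(64, ACT_DIM))
```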
Related papers
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
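As a rough illustration of how a Contrastive Predictive Coding-style objective can expose episode-level non-stationarity, the sketch below encodes short windows of transitions into a context vector and trains it so that windows from the same episode score higher than windows from other episodes. The window length, encoder shape, and the idea of conditioning the policy on the inferred context are assumptions, not the paper's exact design.

```python
# Hedged sketch: contrastive episode-context inference, loosely in the spirit
# of CPC. Window size, encoder shape, and the pairing scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, CTX_DIM, WINDOW = 8, 2, 16, 5
enc = nn.Sequential(nn.Linear(WINDOW * (OBS_DIM + ACT_DIM + 1), 128),
                    nn.ReLU(), nn.Linear(128, CTX_DIM))
opt = torch.optim.Adam(enc.parameters(), lr=3e-4)

def encode(window):                      # window: [B, WINDOW, OBS+ACT+1]
    return enc(window.flatten(1))        # -> [B, CTX_DIM] context vector

def cpc_update(window_a, window_b):
    """window_a[i] and window_b[i] come from the SAME episode; rows with
    different indices come from different episodes and act as negatives."""
    za, zb = encode(window_a), encode(window_b)
    logits = za @ zb.t()                              # [B, B]
    labels = torch.arange(za.shape[0])
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# At policy-training and evaluation time, the inferred context z = encode(window)
# would be concatenated to the observation so the policy can adapt to the
# episode's (unobserved) transition/reward regime.
cpc_update(torch.randn(32, WINDOW, OBS_DIM + ACT_DIM + 1),
           torch.randn(32, WINDOW, OBS_DIM + ACT_DIM + 1))
```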
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Model-Based Reinforcement Learning Control of Reaction-Diffusion Problems [0.0]
Reinforcement learning has been applied to decision-making in several applications, most notably in games.
We introduce two novel reward functions to drive the flow of the transported field.
Results show that certain controls can be implemented successfully in these applications.
arXiv Detail & Related papers (2024-02-22T11:06:07Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
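The connection between a generative model of future states and value estimation can be made concrete with a small sketch: if a model samples from the discounted state-occupancy measure of a controller, the value is just the average reward of those samples scaled by 1/(1-γ). The placeholder sampler below stands in for the learned diffusion model, and the reward function is likewise illustrative.

```python
# Minimal sketch of the "value from state-visitation samples" identity:
#   V^pi(s) = 1/(1-gamma) * E_{s+ ~ d^pi(.|s)}[ r(s+) ].
# `sample_future_states` is a placeholder standing in for a learned generative
# model of multi-step dynamics; the reward function is illustrative.
import numpy as np

GAMMA = 0.99
rng = np.random.default_rng(0)

def sample_future_states(state, num_samples):
    """Placeholder for the generative model: random perturbations of the
    current state. In the paper this role is played by a diffusion model."""
    return state + rng.normal(scale=0.5, size=(num_samples, state.shape[0]))

def reward(states):
    """Illustrative reward: negative distance to the origin."""
    return -np.linalg.norm(states, axis=-1)

def value_estimate(state, num_samples=1024):
    futures = sample_future_states(state, num_samples)      # [N, obs_dim]
    return reward(futures).mean() / (1.0 - GAMMA)

print(value_estimate(np.array([1.0, -2.0, 0.5])))
```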
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
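The transfer idea hinted at by the title can be illustrated in a tabular toy problem: evaluate a fixed behavior policy against several random reward features, then, when a new task reward arrives, regress it onto those features and recombine the per-feature value functions with the same coefficients before acting greedily. This linear-recombination view is an editor's illustration of the general principle; the paper itself learns deep Q-functions over random features from offline data.

```python
# Toy, tabular illustration of recombining value functions learned for random
# reward features. It shows only the linearity idea, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
S, A, K, GAMMA = 20, 4, 8, 0.95

# Random MDP and a fixed behavior policy.
P = rng.dirichlet(np.ones(S), size=(S, A))          # P[s, a] is a distribution over s'
pi = rng.dirichlet(np.ones(A), size=S)              # pi[s] is a distribution over a
phi = rng.normal(size=(S, K))                       # K random reward features of the state

def policy_evaluation(r, iters=500):
    """Q^pi for a state-dependent reward r (received at the next state)."""
    q = np.zeros((S, A))
    for _ in range(iters):
        v = (pi * q).sum(axis=1)                     # V^pi(s)
        q = P @ (r + GAMMA * v)                      # Q(s, a) = E[r(s') + gamma V(s')]
    return q

# Offline phase: one Q^pi per random feature (no task reward needed).
q_features = np.stack([policy_evaluation(phi[:, k]) for k in range(K)])   # [K, S, A]

# Transfer phase: regress the new reward onto the random features...
r_new = rng.normal(size=S)
alpha, *_ = np.linalg.lstsq(phi, r_new, rcond=None)                       # [K]
# ...and recombine the stored value functions with the same coefficients.
q_new = np.tensordot(alpha, q_features, axes=1)                           # [S, A]
greedy = q_new.argmax(axis=1)                                             # improved policy
print(greedy[:10])
```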
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
- Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots [11.533449955841968]
We propose Q-Pensieve, a policy improvement scheme that stores a collection of Q-snapshots to jointly determine the policy update direction.
We show that Q-Pensieve can be naturally integrated with soft policy iteration with a convergence guarantee.
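The snapshot-sharing idea can be sketched in a few lines: keep a small buffer of past critic snapshots and, when updating the actor, take the elementwise maximum over the current critic and the snapshots. The SAC-style actor loss, the snapshot schedule, and the omission of the multi-objective preference weights are all simplifications made here for illustration.

```python
# Hedged sketch of memory-shared Q-snapshots. A SAC-style actor update is
# assumed, and the paper's multi-objective preference weighting is omitted.
import copy
from collections import deque
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2
critic = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 128), nn.ReLU(), nn.Linear(128, 1))
actor = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, ACT_DIM), nn.Tanh())
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
snapshots = deque(maxlen=4)          # the "Pensieve" of frozen critic copies

def take_snapshot():
    frozen = copy.deepcopy(critic)
    for p in frozen.parameters():
        p.requires_grad_(False)
    snapshots.append(frozen)

def actor_update(obs):
    """Policy improvement against the max over the current critic and the
    snapshots, so older value estimates still shape the update direction."""
    act = actor(obs)
    sa = torch.cat([obs, act], dim=-1)
    q_all = torch.stack([critic(sa)] + [q(sa) for q in snapshots], dim=0)  # [1+M, B, 1]
    q_shared = q_all.max(dim=0).values                                     # [B, 1]
    loss = -q_shared.mean()
    actor_opt.zero_grad(); loss.backward(); actor_opt.step()
    return loss.item()

take_snapshot()
actor_update(torch.randn(32, OBS_DIM))
```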
arXiv Detail & Related papers (2022-12-06T16:29:47Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation [95.06031307730245]
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
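The shape of a non-adversarial, Q-only imitation update can be sketched as follows for a discrete-action setting: maximize the implied expert "reward" Q(s,a) - γV(s') on expert transitions while penalizing initial-state values, with V the soft value of Q. The concave regularizer and the sampled-policy variants used in practice are omitted here, so this should be read as a rough outline rather than the paper's objective verbatim.

```python
# Hedged sketch of an inverse soft-Q style update in a discrete-action setting.
# The concave regularizer and online-sampling variants used in practice are
# omitted; this only shows the broad shape of the objective.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, GAMMA, ALPHA = 8, 4, 0.99, 0.1
q_net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)

def soft_value(obs):
    return ALPHA * torch.logsumexp(q_net(obs) / ALPHA, dim=-1)      # V(s)

def iq_update(obs, act, next_obs, init_obs):
    """One update on expert transitions only (no adversarial discriminator).
    `act` is a LongTensor of expert action indices."""
    q_sa = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)        # Q(s, a_expert)
    implied_reward = q_sa - GAMMA * soft_value(next_obs)            # r(s, a) under Q
    loss = -(implied_reward.mean() - (1 - GAMMA) * soft_value(init_obs).mean())
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# The recovered policy is the soft-greedy policy of the learned Q:
#   pi(a | s) proportional to exp(Q(s, a) / ALPHA)
iq_update(torch.randn(32, OBS_DIM), torch.randint(N_ACTIONS, (32,)),
          torch.randn(32, OBS_DIM), torch.randn(32, OBS_DIM))
```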
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp [119.69304125647785]
This paper introduces a concise yet powerful method for constructing continuous transitions.
Specifically, we propose to synthesize new transitions for training by linearly interpolating the consecutive transitions.
To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically.
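The construction itself is simple to show: draw a mixing coefficient and linearly interpolate two consecutive transitions componentwise, then use the interpolated tuple as an extra training transition. The Beta-distributed coefficient below is the usual MixUp default, and the discriminator the paper uses to keep samples realistic is omitted.

```python
# Minimal sketch of building a "continuous transition" by interpolating two
# consecutive transitions. The Beta mixing coefficient is the usual MixUp
# default; the paper's discriminator for keeping samples realistic is omitted.
import numpy as np

rng = np.random.default_rng(0)

def continuous_transition(t0, t1, alpha=0.8):
    """t0 = (s_t, a_t, r_t, s_{t+1}), t1 = (s_{t+1}, a_{t+1}, r_{t+1}, s_{t+2});
    returns an interpolated transition lying 'between' the two real ones."""
    lam = rng.beta(alpha, alpha)
    return tuple(lam * x0 + (1.0 - lam) * x1 for x0, x1 in zip(t0, t1))

# Toy usage with two consecutive transitions from the same trajectory.
s0, s1, s2 = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
a0, a1 = rng.normal(size=1), rng.normal(size=1)
mixed = continuous_transition((s0, a0, 0.3, s1), (s1, a1, 0.7, s2))
```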
arXiv Detail & Related papers (2020-11-30T01:20:23Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
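A simplified relabeling rule conveys the flavor: for each collected trajectory, compare its return under a set of candidate tasks against the returns of other recent trajectories, and relabel it with the task for which it ranks best. This percentile-style relabeler is a deliberately crude stand-in chosen for brevity; the paper's relabeler is an approximate inverse-RL procedure, and the goal-reaching task family below is only an example.

```python
# Simplified hindsight-relabeling sketch: assign each trajectory to the
# candidate task under which it ranks best relative to other trajectories.
import numpy as np

rng = np.random.default_rng(0)

def trajectory_return(traj_states, task_goal):
    """Illustrative task family: reach a goal point; return = negative distance."""
    return -np.linalg.norm(traj_states - task_goal, axis=-1).sum()

def relabel(trajectories, candidate_goals):
    """For each trajectory, return the index of the task it is best at
    *relative to the other trajectories* (not just its highest raw return)."""
    returns = np.array([[trajectory_return(traj, g) for g in candidate_goals]
                        for traj in trajectories])              # [N_traj, N_task]
    # Rank each trajectory within each task's column, then pick its best rank.
    ranks = returns.argsort(axis=0).argsort(axis=0)             # high return -> high rank
    return ranks.argmax(axis=1)                                 # [N_traj] task indices

trajs = [rng.normal(size=(50, 2)) for _ in range(8)]            # 8 toy 2-D trajectories
goals = rng.normal(size=(5, 2))                                 # 5 candidate tasks
print(relabel(trajs, goals))
```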
arXiv Detail & Related papers (2020-02-26T18:57:05Z)