IQ-Learn: Inverse soft-Q Learning for Imitation
- URL: http://arxiv.org/abs/2106.12142v1
- Date: Wed, 23 Jun 2021 03:43:10 GMT
- Title: IQ-Learn: Inverse soft-Q Learning for Imitation
- Authors: Divyansh Garg, Shuvam Chakraborty, Chris Cundy, Jiaming Song, Stefano
Ermon
- Abstract summary: Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
- Score: 95.06031307730245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many sequential decision-making problems (e.g., robotics control, game
playing, sequential prediction), human or expert data is available containing
useful information about the task. However, imitation learning (IL) from a
small amount of expert data can be challenging in high-dimensional environments
with complex dynamics. Behavioral cloning is a simple method that is widely
used due to its simplicity of implementation and stable convergence but doesn't
utilize any information involving the environment's dynamics. Many existing
methods that exploit dynamics information are difficult to train in practice
due to an adversarial optimization process over reward and policy approximators
or biased, high variance gradient estimators. We introduce a method for
dynamics-aware IL which avoids adversarial training by learning a single
Q-function, implicitly representing both reward and policy. On standard
benchmarks, the implicitly learned rewards show a high positive correlation
with the ground-truth rewards, illustrating our method can also be used for
inverse reinforcement learning (IRL). Our method, Inverse soft-Q learning
(IQ-Learn), obtains state-of-the-art results in offline and online imitation
learning settings, surpassing existing methods both in the number of required
environment interactions and in scalability to high-dimensional spaces.
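To make the single-Q-function idea concrete, here is a minimal sketch for discrete actions: the soft value V(s) = alpha * logsumexp_a Q(s, a)/alpha and the policy pi(a|s) proportional to exp(Q(s, a)/alpha) both follow directly from Q, and the reward is recovered through the inverse soft Bellman operator r(s, a) = Q(s, a) - gamma * V(s'). The network sizes, temperature, and chi-squared-style regularizer below are illustrative assumptions, not the paper's exact training recipe.

```python
# Minimal, illustrative sketch of inverse soft-Q learning for discrete actions.
# Network sizes, the temperature alpha, and the chi^2-style regularizer are
# assumptions for illustration, not the paper's exact training recipe.
import torch
import torch.nn as nn

alpha, gamma = 0.1, 0.99                                               # temperature, discount (assumed)
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # toy Q(s, .)

def soft_value(obs):
    """Soft value V(s) = alpha * logsumexp_a Q(s, a) / alpha."""
    return alpha * torch.logsumexp(q_net(obs) / alpha, dim=-1)

def policy_logits(obs):
    """The policy is implicit: pi(a|s) is proportional to exp(Q(s, a) / alpha)."""
    return q_net(obs) / alpha

def iq_loss(exp_obs, exp_act, exp_next_obs, init_obs):
    """Simplified objective: recover rewards r = Q(s, a) - gamma * V(s') on
    expert data and maximize them, minus an initial-state value term and a
    chi^2-style penalty on the recovered rewards (one common regularizer)."""
    q_sa = q_net(exp_obs).gather(1, exp_act.unsqueeze(1)).squeeze(1)
    recovered_r = q_sa - gamma * soft_value(exp_next_obs)
    objective = (recovered_r.mean()
                 - (1 - gamma) * soft_value(init_obs).mean()
                 - (recovered_r ** 2).mean() / (4 * alpha))
    return -objective  # gradient descent minimizes the negative objective

# usage sketch: loss = iq_loss(s_e, a_e, s_e_next, s_0); loss.backward()
```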
Related papers
- Reward-free World Models for Online Imitation Learning [25.304836126280424]
We propose a novel approach to online imitation learning that leverages reward-free world models.
Our method learns environmental dynamics entirely in latent spaces without reconstruction, enabling efficient and accurate modeling.
We evaluate our method on a diverse set of benchmarks, including DMControl, MyoSuite, and ManiSkill2, demonstrating superior empirical performance compared to existing approaches.
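The summary above only states that dynamics are learned in latent space without reconstruction; one generic way to realize that is a latent-consistency objective, sketched below. All module shapes and the stop-gradient target are assumptions for illustration, not the paper's architecture.

```python
# Generic sketch of reconstruction-free latent dynamics learning: encode
# observations, predict the next latent from (latent, action), and match it to
# the encoded next observation. Shapes and the stop-gradient target are assumed.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))        # obs -> z
dynamics = nn.Sequential(nn.Linear(16 + 2, 32), nn.ReLU(), nn.Linear(32, 16))  # (z, a) -> z'

def latent_consistency_loss(obs, act, next_obs):
    z_pred = dynamics(torch.cat([encoder(obs), act], dim=-1))
    with torch.no_grad():                 # target latent, no decoder/reconstruction
        z_target = encoder(next_obs)
    return ((z_pred - z_target) ** 2).mean()
```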
arXiv Detail & Related papers (2024-10-17T23:13:32Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator [0.0]
Supervised learning methods such as Behavioral Cloning do not require sampling from the environment, but usually suffer from distribution shift.
Soft Q imitation learning (SQIL) addresses these problems; it learns efficiently by combining Behavioral Cloning with soft Q-learning using constant rewards (sketched below).
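The constant-reward idea is easy to illustrate: expert transitions are stored with reward 1, the agent's own transitions with reward 0, and ordinary soft Q-learning runs on the mixed buffer. The `ReplayBuffer` below is a hypothetical stand-in, not the paper's code.

```python
# Minimal sketch of SQIL-style constant-reward relabeling: expert transitions
# are stored with reward 1, the agent's own transitions with reward 0, and a
# standard soft Q-learning update runs on the combined buffer.
# `ReplayBuffer` and the tuple layout are hypothetical stand-ins.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)
    def add(self, obs, act, reward, next_obs, done):
        self.data.append((obs, act, reward, next_obs, done))
    def sample(self, batch_size):
        return random.sample(list(self.data), min(batch_size, len(self.data)))

buffer = ReplayBuffer()

def add_expert(transitions):
    for obs, act, next_obs, done in transitions:
        buffer.add(obs, act, 1.0, next_obs, done)    # constant reward 1 for expert data

def add_agent(obs, act, next_obs, done):
    buffer.add(obs, act, 0.0, next_obs, done)        # constant reward 0 for agent data

# A soft Q-learning (or SAC) update would then be applied to buffer.sample(...).
```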
arXiv Detail & Related papers (2024-01-30T06:22:19Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert (a toy instantiation is sketched below).
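One simple way to turn intervention signals into rewards (an assumption for illustration, not necessarily the paper's exact choice) is to penalize the step at which the expert takes over and give zero reward elsewhere:

```python
# Toy illustration of intervention signals as rewards: the step where the
# expert intervenes gets a negative reward, all other steps get zero.
# The exact reward values (-1 / 0) are an assumption for illustration.
def intervention_rewards(intervened):
    """intervened: list of bools, True at steps where the expert took over."""
    return [-1.0 if flag else 0.0 for flag in intervened]

# example episode where the expert intervened at step 3
print(intervention_rewards([False, False, False, True, False]))
# -> [0.0, 0.0, 0.0, -1.0, 0.0]
```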
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Robust Visual Imitation Learning with Inverse Dynamics Representations [32.806294517277976]
We develop an inverse dynamics state representation learning objective to align the expert environment and the learning environment.
With the abstract state representation, we design an effective reward function, which thoroughly measures the similarity between behavior data and expert data.
Our approach achieves near-expert performance in most environments and significantly outperforms state-of-the-art visual IL and robust IL methods.
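A generic inverse-dynamics representation objective trains an encoder so that the action can be predicted from embeddings of consecutive states; the sketch below shows that generic objective with assumed network sizes and a discrete action space, not the paper's architecture.

```python
# Generic inverse-dynamics representation objective: train an encoder phi so
# that the action taken between s and s' can be predicted from (phi(s), phi(s')).
# Network sizes and the discrete-action head are illustrative assumptions.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))       # phi(s)
action_head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))   # action logits
cross_entropy = nn.CrossEntropyLoss()

def inverse_dynamics_loss(obs, next_obs, act):
    """act: LongTensor of discrete action indices taken between obs and next_obs."""
    z, z_next = encoder(obs), encoder(next_obs)
    logits = action_head(torch.cat([z, z_next], dim=-1))
    return cross_entropy(logits, act)
```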
arXiv Detail & Related papers (2023-10-22T11:47:35Z)
- Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
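As a rough illustration of the general technique (not the paper's estimator), a conditional kernel density estimate of the expert policy p(a|s) can be formed from expert state-action pairs with Gaussian product kernels; the bandwidths below are assumed values.

```python
# Generic conditional kernel density estimate of an expert policy p(a|s),
# built from expert (state, action) pairs with Gaussian product kernels.
# Bandwidths h_s, h_a are illustrative assumptions.
import numpy as np

def conditional_kde(action, state, expert_states, expert_actions, h_s=0.5, h_a=0.5):
    """Estimate p(a|s): kernel-weight expert actions by state similarity."""
    d_a = expert_actions.shape[1]
    w_s = np.exp(-np.sum((expert_states - state) ** 2, axis=1) / (2 * h_s ** 2))
    k_a = np.exp(-np.sum((expert_actions - action) ** 2, axis=1) / (2 * h_a ** 2))
    k_a /= (2 * np.pi * h_a ** 2) ** (d_a / 2)         # normalize the action kernel
    return float((w_s * k_a).sum() / (w_s.sum() + 1e-8))

# usage: density = conditional_kde(a_query, s_query, S_expert, A_expert)
```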
arXiv Detail & Related papers (2023-08-24T05:26:42Z)
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
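The summary says an implicit model of multi-step transitions replaces the reward function; a common way to learn such an implicit model is a contrastive (InfoNCE-style) critic that scores whether a future state is reachable from a state-action pair. The sketch below shows that generic objective with assumed encoders and shapes, not the paper's exact formulation.

```python
# Generic contrastive (InfoNCE-style) critic over (state, action) pairs and
# future states: the state actually reached from (s, a) is the positive, the
# other rows in the batch serve as negatives. Encoders and shapes are assumed.
import torch
import torch.nn as nn

sa_encoder = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 16))
goal_encoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))
cross_entropy = nn.CrossEntropyLoss()

def contrastive_loss(obs, act, future_obs):
    phi = sa_encoder(torch.cat([obs, act], dim=-1))   # (B, 16) state-action embedding
    psi = goal_encoder(future_obs)                    # (B, 16) future-state embedding
    logits = phi @ psi.T                              # similarity of every (s,a)-future pair
    labels = torch.arange(len(obs))                   # diagonal entries are positives
    return cross_entropy(logits, labels)
```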
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
IMPLANT is a new meta-algorithm for imitation learning that performs planning at test time.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)