Robust Learning from Observation with Model Misspecification
- URL: http://arxiv.org/abs/2202.06003v2
- Date: Tue, 15 Feb 2022 10:35:34 GMT
- Title: Robust Learning from Observation with Model Misspecification
- Authors: Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Craig Innes,
Subramanian Ramamoorthy, Adrian Weller
- Abstract summary: Imitation learning (IL) is a popular paradigm for training policies in robotic systems.
We propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning.
- Score: 33.92371002674386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning (IL) is a popular paradigm for training policies in
robotic systems when specifying the reward function is difficult. However,
despite the success of IL algorithms, they impose the somewhat unrealistic
requirement that the expert demonstrations must come from the same domain in
which a new imitator policy is to be learned. We consider a practical setting,
where (i) state-only expert demonstrations from the real (deployment)
environment are given to the learner, (ii) the imitation learner policy is
trained in a simulation (training) environment whose transition dynamics is
slightly different from the real environment, and (iii) the learner does not
have any access to the real environment during the training phase beyond the
batch of demonstrations given. Most of the current IL methods, such as
generative adversarial imitation learning and its state-only variants, fail to
imitate the optimal expert behavior under the above setting. By leveraging
insights from the Robust reinforcement learning (RL) literature and building on
recent adversarial imitation approaches, we propose a robust IL algorithm to
learn policies that can effectively transfer to the real environment without
fine-tuning. Furthermore, we empirically demonstrate on continuous-control
benchmarks that our method outperforms the state-of-the-art state-only IL
method in terms of the zero-shot transfer performance in the real environment
and robust performance under different testing conditions.
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA)
Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - Robust Visual Imitation Learning with Inverse Dynamics Representations [32.806294517277976]
We develop an inverse dynamics state representation learning objective to align the expert environment and the learning environment.
With the abstract state representation, we design an effective reward function, which thoroughly measures the similarity between behavior data and expert data.
Our approach can achieve a near-expert performance in most environments, and significantly outperforms the state-of-the-art visual IL methods and robust IL methods.
arXiv Detail & Related papers (2023-10-22T11:47:35Z) - Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable
Environments [45.213059639254475]
We propose a new topic called imitator learning (ItorL)
It aims to derive an imitator module that can reconstruct the imitation policies based on very limited expert demonstrations.
For autonomous imitation policy building, we design a demonstration-based attention architecture for imitator policy.
arXiv Detail & Related papers (2023-10-09T13:35:28Z) - A State-Distribution Matching Approach to Non-Episodic Reinforcement
Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z) - Imitating, Fast and Slow: Robust learning from demonstrations via
decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z) - Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation Learning from observation describes policy learning in a similar way to human learning.
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Generalization Guarantees for Imitation Learning [6.542289202349586]
Control policies from imitation learning can often fail to generalize to novel environments.
We present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework.
arXiv Detail & Related papers (2020-08-05T03:04:13Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.