Related papers: Robust Learning from Observation with Model Misspecification

Robust Learning from Observation with Model Misspecification

URL: http://arxiv.org/abs/2202.06003v2
Date: Tue, 15 Feb 2022 10:35:34 GMT
Title: Robust Learning from Observation with Model Misspecification
Authors: Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Craig Innes, Subramanian Ramamoorthy, Adrian Weller
Abstract summary: Imitation learning (IL) is a popular paradigm for training policies in robotic systems. We propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning.
Score: 33.92371002674386
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Imitation learning (IL) is a popular paradigm for training policies in robotic systems when specifying the reward function is difficult. However, despite the success of IL algorithms, they impose the somewhat unrealistic requirement that the expert demonstrations must come from the same domain in which a new imitator policy is to be learned. We consider a practical setting, where (i) state-only expert demonstrations from the real (deployment) environment are given to the learner, (ii) the imitation learner policy is trained in a simulation (training) environment whose transition dynamics is slightly different from the real environment, and (iii) the learner does not have any access to the real environment during the training phase beyond the batch of demonstrations given. Most of the current IL methods, such as generative adversarial imitation learning and its state-only variants, fail to imitate the optimal expert behavior under the above setting. By leveraging insights from the Robust reinforcement learning (RL) literature and building on recent adversarial imitation approaches, we propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning. Furthermore, we empirically demonstrate on continuous-control benchmarks that our method outperforms the state-of-the-art state-only IL method in terms of the zero-shot transfer performance in the real environment and robust performance under different testing conditions.

Related papers

Latent Wasserstein Adversarial Imitation Learning [110.12916356445908]
Imitation Learning (IL) enables agents to mimic expert behavior by learning from demonstrations.<n>We propose Latent Wasserstein Adrial Imitation Learning (LWAIL), a novel adversarial imitation learning framework.<n>We show that our method outperforms prior Wasserstein-based IL methods and prior adversarial IL methods.
arXiv Detail & Related papers (2026-03-05T18:01:49Z)
Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA) Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning. We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
Robust Visual Imitation Learning with Inverse Dynamics Representations [32.806294517277976]
We develop an inverse dynamics state representation learning objective to align the expert environment and the learning environment. With the abstract state representation, we design an effective reward function, which thoroughly measures the similarity between behavior data and expert data. Our approach can achieve a near-expert performance in most environments, and significantly outperforms the state-of-the-art visual IL methods and robust IL methods.
arXiv Detail & Related papers (2023-10-22T11:47:35Z)
Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments [45.213059639254475]
We propose a new topic called imitator learning (ItorL) It aims to derive an imitator module that can reconstruct the imitation policies based on very limited expert demonstrations. For autonomous imitation policy building, we design a demonstration-based attention architecture for imitator policy.
arXiv Detail & Related papers (2023-10-09T13:35:28Z)
A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting. We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations. Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning. We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation Learning from observation describes policy learning in a similar way to human learning. We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z)
DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator. Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms. This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk. We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
Generalization Guarantees for Imitation Learning [6.542289202349586]
Control policies from imitation learning can often fail to generalize to novel environments. We present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework.
arXiv Detail & Related papers (2020-08-05T03:04:13Z)
Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning. We train an inverse dynamics model and use it to predict actions for state-only demonstrations. Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.