Discriminator-Guided Model-Based Offline Imitation Learning
- URL: http://arxiv.org/abs/2207.00244v2
- Date: Tue, 5 Jul 2022 03:18:20 GMT
- Title: Discriminator-Guided Model-Based Offline Imitation Learning
- Authors: Wenjia Zhang, Haoran Xu, Haoyi Niu, Peng Cheng, Ming Li, Heming Zhang,
Guyue Zhou, Xianyuan Zhan
- Abstract summary: offline imitation learning (IL) is a powerful method to solve decision-making problems from expert demonstrations without reward labels.
We propose the Discriminator-guided Model-based offline Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data.
Experimental results show that DMIL and its extension achieve superior performance and robustness compared to state-of-the-art offline IL methods under small datasets.
- Score: 11.856949845359853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline imitation learning (IL) is a powerful method to solve decision-making
problems from expert demonstrations without reward labels. Existing offline IL
methods suffer from severe performance degeneration under limited expert data
due to covariate shift. Including a learned dynamics model can potentially
improve the state-action space coverage of expert data; however, it also faces
challenging issues like model approximation/generalization errors and
suboptimality of rollout data. In this paper, we propose the
Discriminator-guided Model-based offline Imitation Learning (DMIL) framework,
which introduces a discriminator to simultaneously distinguish the dynamics
correctness and suboptimality of model rollout data against real expert
demonstrations. DMIL adopts a novel cooperative-yet-adversarial learning
strategy, which uses the discriminator to guide and couple the learning process
of the policy and dynamics model, resulting in improved model performance and
robustness. Our framework can also be extended to the case when demonstrations
contain a large proportion of suboptimal data. Experimental results show that
DMIL and its extension achieve superior performance and robustness compared to
state-of-the-art offline IL methods under small datasets.
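
The core training loop can be pictured with a short PyTorch sketch. This is a minimal reading of the abstract, not the authors' released code: the network sizes, the deterministic policy, and the 0.1 adversarial weight are all assumptions.

```python
import torch
import torch.nn as nn

S, A = 17, 6  # example state/action dimensions (assumed)

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 256), nn.ReLU(), nn.Linear(256, o))

policy = mlp(S, A)                                     # deterministic for brevity
dynamics = mlp(S + A, S)                               # one-step model f(s, a) -> s'
disc = nn.Sequential(mlp(2 * S + A, 1), nn.Sigmoid())  # D(s, a, s') in (0, 1)

opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_f = torch.optim.Adam(dynamics.parameters(), lr=3e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCELoss()

def update(es, ea, es2, rs):
    """One step on an expert batch (es, ea, es2) and rollout start states rs."""
    # Model rollout: the policy proposes actions, the model predicts next states.
    ra = policy(rs)
    rs2 = dynamics(torch.cat([rs, ra], -1))

    # 1) Discriminator: expert transitions are "real", model rollouts "fake".
    #    A low score flags either wrong dynamics or non-expert actions.
    d_real = disc(torch.cat([es, ea, es2], -1))
    d_fake = disc(torch.cat([rs, ra, rs2], -1).detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Policy and model: cloning / regression on expert data, plus an
    #    adversarial term pushing rollouts to look expert-like.
    ra = policy(rs)
    rs2 = dynamics(torch.cat([rs, ra], -1))
    d_fake = disc(torch.cat([rs, ra, rs2], -1))
    loss_adv = bce(d_fake, torch.ones_like(d_fake))            # fool D
    loss_bc = ((policy(es) - ea) ** 2).mean()                  # clone expert
    loss_model = ((dynamics(torch.cat([es, ea], -1)) - es2) ** 2).mean()
    loss = loss_bc + loss_model + 0.1 * loss_adv               # 0.1 is assumed
    opt_pi.zero_grad(); opt_f.zero_grad(); loss.backward()
    opt_pi.step(); opt_f.step()
```

The "cooperative-yet-adversarial" coupling is the point of step 2: the policy and the dynamics model are updated jointly against the same discriminator signal, rather than against each other.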
Related papers
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
- SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets [32.496818080222646]
We propose a new approach to model-based offline reinforcement learning.
We provide a theoretical guarantee on the model uncertainty and a performance bound for SeMOPO.
Experimental results show that our method substantially outperforms all baseline methods.
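The summary gives no mechanism, but the "model uncertainty ... performance bound" phrasing follows the standard pessimistic model-based recipe, where rewards from model rollouts are penalized by ensemble disagreement. A minimal sketch of that general idea (the penalty weight and ensemble interface are assumptions; SeMOPO itself refines this recipe for low-quality visual data):

```python
import numpy as np

def pessimistic_reward(r, next_state_preds, lam=1.0):
    """Penalize model-rollout rewards by ensemble disagreement (MOPO-style).

    r: reward predicted for one (s, a) pair.
    next_state_preds: (n_models, state_dim) next-state predictions from an
        ensemble; their spread serves as an uncertainty estimate.
    lam: penalty weight (an assumed hyperparameter).
    """
    u = np.linalg.norm(next_state_preds.std(axis=0))
    return r - lam * u

# Example: five ensemble members that mostly agree -> small penalty.
preds = 1.0 + 0.05 * np.random.default_rng(0).standard_normal((5, 17))
print(pessimistic_reward(1.0, preds))
```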
arXiv Detail & Related papers (2024-06-13T15:16:38Z)
- Relation Modeling and Distillation for Learning with Noisy Labels [4.556974104115929]
This paper proposes a relation modeling and distillation framework that models inter-sample relationships via self-supervised learning.
The proposed framework can learn discriminative representations for noisy data, resulting in superior performance over existing methods.
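One generic way to realize "modeling inter-sample relationships" is to build a pairwise similarity matrix over sample embeddings and distill it from a more stable reference encoder, for example an EMA copy of the student. The sketch below shows that idea under assumed interfaces, not the paper's exact method:

```python
import torch
import torch.nn.functional as F

def relation_distill_loss(student_emb, teacher_emb):
    """Match the student's inter-sample similarity structure to the teacher's.

    student_emb, teacher_emb: (B, D) embeddings of the same batch; the teacher
    could be an EMA copy of the student (an assumed design choice).
    """
    s = F.normalize(student_emb, dim=-1)
    s_rel = s @ s.T                       # student's pairwise relations
    with torch.no_grad():
        t = F.normalize(teacher_emb, dim=-1)
        t_rel = t @ t.T                   # reference relations, no gradient
    return F.mse_loss(s_rel, t_rel)
```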
arXiv Detail & Related papers (2024-05-30T01:47:27Z)
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to exploiting model errors, which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
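As a rough illustration of what an autoregressive density net means for dynamics modeling: each next-state dimension is predicted conditioned on the dimensions already generated, so the model captures correlations between state dimensions that a factorized Gaussian would miss. The architecture below is an assumption for illustration, not the paper's network:

```python
import torch
import torch.nn as nn

class AutoregressiveDynamics(nn.Module):
    """Predict s' one dimension at a time: p(s'_i | s, a, s'_{<i})."""

    def __init__(self, s_dim, a_dim, hidden=128):
        super().__init__()
        # One small head per state dimension; head i sees (s, a, s'_{<i}).
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(s_dim + a_dim + i, hidden), nn.ReLU(),
                          nn.Linear(hidden, 2))  # mean and log-std of s'_i
            for i in range(s_dim)
        ])

    def sample(self, s, a):
        ctx, out = torch.cat([s, a], -1), []
        for head in self.heads:
            mu, log_std = head(torch.cat([ctx] + out, -1)).chunk(2, -1)
            out.append(mu + log_std.exp() * torch.randn_like(mu))
        return torch.cat(out, -1)
```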
arXiv Detail & Related papers (2024-02-05T10:18:15Z)
- Inverse Reinforcement Learning by Estimating Expertise of Demonstrators [18.50354748863624]
IRLEED, Inverse Reinforcement Learning by Estimating Expertise of Demonstrators, is a novel framework that handles suboptimal demonstrations without prior knowledge of demonstrator expertise.
IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by incorporating a general model of demonstrator suboptimality that addresses reward bias and action variance.
Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness.
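The "reward bias and action variance" decomposition can be pictured with a toy noisy-rationality model: each demonstrator follows a Boltzmann policy whose inverse temperature encodes expertise, and that temperature is estimated from data rather than assumed. Everything below (names, the grid search) is an illustrative assumption:

```python
import numpy as np

def demo_log_likelihood(q, actions, beta):
    """Log-likelihood of one demonstrator's actions under a Boltzmann policy.

    q: (T, n_actions) action values under the current reward estimate.
    beta: rationality; low beta = noisy, high-variance demonstrator.
    """
    logits = beta * q
    logits = logits - logits.max(-1, keepdims=True)   # numerical stability
    logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    return logp[np.arange(len(actions)), actions].sum()

def estimate_expertise(q, actions, betas=np.linspace(0.1, 10, 50)):
    """Pick the beta that best explains this demonstrator's behavior."""
    return max(betas, key=lambda b: demo_log_likelihood(q, actions, b))
```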
arXiv Detail & Related papers (2024-02-02T20:21:09Z)
- Querying Easily Flip-flopped Samples for Deep Active Learning [63.62397322172216]
Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data.
One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is.
This paper proposes the least disagree metric (LDM), defined as the smallest probability of disagreement of the predicted label.
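As a crude proxy for the idea (not the paper's estimator), one can ask how often small random perturbations of the model flip a sample's predicted label; samples that flip easily sit near the decision boundary and are the ones worth querying. The perturbation scale and sample count below are assumptions:

```python
import torch

@torch.no_grad()
def flip_rate(model, x, sigma=0.05, n=50):
    """Fraction of weight perturbations that change the predicted label of x."""
    base = model(x).argmax(-1)
    params = list(model.parameters())
    saved = [p.clone() for p in params]
    flips = 0.0
    for _ in range(n):
        for p in params:
            p.add_(sigma * torch.randn_like(p))   # perturb weights in place
        flips += (model(x).argmax(-1) != base).float().mean().item()
        for p, s in zip(params, saved):
            p.copy_(s)                            # restore original weights
    return flips / n
```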
arXiv Detail & Related papers (2024-01-18T08:12:23Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The bounds we then derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
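The practical upshot of such bounds is a gate on model updates: replace the current model only when the estimated gain outweighs a penalty proportional to how far the new model shifts. A toy illustration of that decision rule (the constant and the shift measure are assumptions, not the paper's bound):

```python
def accept_new_model(expected_gain, model_shift, c=1.0):
    """Accept an update only if the improvement survives the shift penalty.

    expected_gain: estimated policy-performance improvement under the new model.
    model_shift: a divergence between old and new models (e.g., mean KL).
    c: constant standing in for the paper-specific bound; assumed here.
    """
    return expected_gain - c * model_shift > 0.0
```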
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, with better validity, sparsity, and proximity.
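The "gradient-less descent" step for proximity can be pictured as a black-box local search: starting from a valid counterfactual, repeatedly step toward the query point and keep only steps that preserve the flipped prediction. The loop below is an illustrative sketch, not the paper's algorithm:

```python
import numpy as np

def improve_proximity(predict, x, cf, steps=200, step_size=0.05, rng=None):
    """Gradient-free proximity refinement for a counterfactual example.

    predict: black-box classifier returning a label.
    cf: a counterfactual already classified differently from x.
    Moves cf toward x as long as the counterfactual label is preserved.
    """
    rng = rng or np.random.default_rng(0)
    target = predict(cf)
    for _ in range(steps):
        cand = cf + step_size * (x - cf) + 0.01 * rng.standard_normal(x.shape)
        if predict(cand) == target:
            cf = cand  # closer to x, flipped label preserved
    return cf
```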
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
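A minimal sketch of the mixup-for-distillation idea: interpolate pairs of inputs (for text, at the embedding level, since discrete tokens cannot be interpolated directly) and train the student to match the teacher on the mixed samples. The interfaces and the alpha value are assumptions:

```python
import torch
import torch.nn.functional as F

def mixkd_loss(student, teacher, e1, e2, alpha=0.4):
    """Distillation loss on a mixup interpolation of two embedded batches."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    e_mix = lam * e1 + (1 - lam) * e2                 # mixed pseudo-samples
    with torch.no_grad():
        t_logits = teacher(e_mix)                     # teacher stays frozen
    s_logits = student(e_mix)
    return F.kl_div(F.log_softmax(s_logits, -1),
                    F.softmax(t_logits, -1), reduction="batchmean")
```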
arXiv Detail & Related papers (2020-11-01T18:47:51Z)