A Simple Solution for Offline Imitation from Observations and Examples
with Possibly Incomplete Trajectories
- URL: http://arxiv.org/abs/2311.01329v1
- Date: Thu, 2 Nov 2023 15:41:09 GMT
- Title: A Simple Solution for Offline Imitation from Observations and Examples
with Possibly Incomplete Trajectories
- Authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
- Abstract summary: Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Imitation Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
- Score: 122.11358440078581
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Offline imitation from observations aims to solve MDPs where only
task-specific expert states and task-agnostic non-expert state-action pairs are
available. Offline imitation is useful in real-world scenarios where arbitrary
interactions are costly and expert actions are unavailable. The
state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize
divergence of state occupancy between expert and learner policies and retrieve
a policy with weighted behavior cloning; however, their results are unstable
when learning from incomplete trajectories, due to a non-robust optimization in
the dual domain. To address the issue, in this paper, we propose
Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a
discounted sum along the future trajectory as the weight for weighted behavior
cloning. The terms for the sum are scaled by the output of a discriminator,
which aims to identify expert states. Despite its simplicity, TAILO works well if
there exist trajectories or segments of expert behavior in the task-agnostic
data, a common assumption in prior work. In experiments across multiple
testbeds, we find TAILO to be more robust and effective, particularly with
incomplete trajectories.
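To make the weighting scheme above concrete, here is a minimal sketch in Python; it is not the authors' implementation. It assumes a per-state discriminator callable that scores how expert-like a state is, a policy exposing log-probabilities, and an illustrative discount value; the exact scaling TAILO applies to the discriminator output is not reproduced here.

```python
import numpy as np

def trajectory_aware_weights(states, discriminator, gamma=0.98):
    """Discounted sum of discriminator scores along the future of one trajectory.

    `discriminator(s)` is assumed to return a non-negative score for how
    expert-like state `s` is; `gamma` is an illustrative discount factor.
    The weight at step t aggregates the scores of s_t, s_{t+1}, ... within
    the same (possibly incomplete) trajectory.
    """
    scores = np.asarray([discriminator(s) for s in states], dtype=float)
    weights = np.zeros_like(scores)
    running = 0.0
    # Sweep backwards so each step accumulates its discounted future.
    for t in reversed(range(len(scores))):
        running = scores[t] + gamma * running
        weights[t] = running
    return weights

def weighted_bc_loss(policy_log_prob, states, actions, weights):
    """Weighted behavior cloning objective: -sum_t w_t * log pi(a_t | s_t)."""
    log_probs = np.asarray([policy_log_prob(s, a)
                            for s, a in zip(states, actions)])
    return -np.sum(weights * log_probs)
```

Because each weight only looks forward within its own trajectory, segments that eventually reach expert-like states are up-weighted even when the trajectory is truncated, which is the intuition behind the robustness to incomplete trajectories claimed above.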
Related papers
- Offline Imitation Learning with Model-based Reverse Augmentation [48.64791438847236]
We propose a novel model-based framework, called Offline Imitation Learning with Self-paced Reverse Augmentation.
Specifically, we build a reverse dynamic model from the offline demonstrations, which can efficiently generate trajectories leading to the expert-observed states.
We then apply reinforcement learning to the augmented trajectories, learning to transition from expert-unobserved states to expert-observed states.
arXiv Detail & Related papers (2024-06-18T12:27:02Z) - How to Leverage Diverse Demonstrations in Offline Imitation Learning [39.24627312800116]
Offline Imitation Learning (IL) with imperfect demonstrations has garnered increasing attention owing to the scarcity of expert data.
We introduce a simple yet effective data selection method that identifies positive behaviors based on their resultant states.
We then devise a lightweight behavior cloning algorithm capable of leveraging the expert and selected data correctly.
arXiv Detail & Related papers (2024-05-24T04:56:39Z) - Align Your Intents: Offline Imitation Learning via Optimal Transport [3.1728695158666396]
We show that an imitating agent can still learn the desired behavior merely from observing the expert.
In our method, AILOT, we use a special representation of states in the form of intents that incorporates pairwise spatial distances within the data.
We report that AILOT outperforms state-of-the-art offline imitation learning algorithms on D4RL benchmarks.
arXiv Detail & Related papers (2024-02-20T14:24:00Z) - Efficient local linearity regularization to overcome catastrophic
overfitting [59.463867084204566]
Catastrophic overfitting (CO) in single-step adversarial training results in abrupt drops in the adversarial test accuracy (even down to 0%).
We introduce a regularization term, called ELLE, to mitigate CO effectively and efficiently in classical AT evaluations.
arXiv Detail & Related papers (2024-01-21T22:55:26Z) - Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement
Learning [44.50394347326546]
Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning.
Off-policy bias is corrected in a per-decision manner, but once a trace has been fully cut, the effect cannot be reversed.
We propose a multistep operator that can express both per-decision and trajectory-aware methods.
arXiv Detail & Related papers (2023-01-26T18:57:41Z) - LobsDICE: Offline Imitation Learning from Observation via Stationary
Distribution Correction Estimation [37.31080581310114]
We present LobsDICE, an offline imitation-from-observation (IfO) algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions.
Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy (a schematic form of this objective is sketched after this list).
arXiv Detail & Related papers (2022-02-28T04:24:30Z) - Agree to Disagree: Diversity through Disagreement for Better
Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data, but disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z) - Benchmarking Deep Models for Salient Object Detection [67.07247772280212]
We construct a general SALient Object Detection (SALOD) benchmark to conduct a comprehensive comparison among several representative SOD methods.
In the above experiments, we find that existing loss functions are usually specialized for some metrics but report inferior results on others.
We propose a novel Edge-Aware (EA) loss that promotes deep networks to learn more discriminative features by integrating both pixel- and image-level supervision signals.
arXiv Detail & Related papers (2022-02-07T03:43:16Z) - Mitigating Covariate Shift in Imitation Learning via Offline Data
Without Great Coverage [27.122391441921664]
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert demonstrator without additional online environment interactions.
Instead, the learner is presented with a static offline dataset of state-action-next-state transition triples from a potentially less proficient behavior policy.
We introduce Model-based IL from Offline data (MILO) to solve the offline IL problem efficiently both in theory and in practice.
arXiv Detail & Related papers (2021-06-06T18:31:08Z) - Provably Efficient Causal Reinforcement Learning with Confounded
Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
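For the LobsDICE entry above, a schematic statement of the objective it describes is given below, assuming a KL divergence (the summary only says "divergence") and writing d^E and d^pi for the state-transition distributions of the expert and the learner.

```latex
% Schematic LobsDICE-style objective; the choice of KL divergence is an assumption.
% d^{E}(s,s')   : state-transition distribution induced by the expert
% d^{\pi}(s,s') : state-transition distribution induced by the learner policy \pi
\min_{\pi} \; D_{\mathrm{KL}}\!\left( d^{\pi}(s,s') \,\middle\|\, d^{E}(s,s') \right)
% Per the summary, what is actually solved is a single convex minimization,
% i.e. a reformulation over stationary distributions rather than this primal form.
```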
This list is automatically generated from the titles and abstracts of the papers on this site.