A Critique of Strictly Batch Imitation Learning
- URL: http://arxiv.org/abs/2110.02063v1
- Date: Tue, 5 Oct 2021 14:07:30 GMT
- Title: A Critique of Strictly Batch Imitation Learning
- Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
- Abstract summary: We argue that notational issues obscure how the pseudo-state visitation distribution might be disconnected from the policy's $\textit{true}$ state visitation distribution.
We construct examples where the parameter coupling advocated by Jarrett et al. leads to inconsistent estimates of the expert's policy, unlike behavioral cloning.
- Score: 26.121994149869767
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent work by Jarrett et al. attempts to frame the problem of offline
imitation learning (IL) as one of learning a joint energy-based model, with the
hope of outperforming standard behavioral cloning. We suggest that notational
issues obscure how the pseudo-state visitation distribution the authors propose
to optimize might be disconnected from the policy's $\textit{true}$ state
visitation distribution. We further construct natural examples where the
parameter coupling advocated by Jarrett et al. leads to inconsistent estimates
of the expert's policy, unlike behavioral cloning.
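For context, a standard recap of the two objects the critique contrasts; the definitions below are textbook imitation-learning notation rather than anything taken from the paper, and the symbols $\gamma$ (discount factor), $\rho_0$ (initial-state distribution), and $\mathcal{D}$ (expert demonstration set) are assumed. A policy $\pi$'s true state visitation distribution is
$$ d^{\pi}(s) \,=\, (1-\gamma)\sum_{t=0}^{\infty}\gamma^{t}\,\Pr\big(s_t = s \;\big|\; s_0 \sim \rho_0,\ a_t \sim \pi(\cdot \mid s_t)\big), $$
while behavioral cloning fits a policy $\pi_\theta$ by maximum likelihood on the demonstrations,
$$ \hat{\theta}_{\mathrm{BC}} \,=\, \arg\min_{\theta}\ \mathbb{E}_{(s,a)\sim\mathcal{D}}\big[-\log \pi_{\theta}(a \mid s)\big], $$
without constructing any visitation distribution at all. The abstract's concern is that the pseudo-state visitation distribution optimized in the energy-based formulation need not coincide with $d^{\pi_\theta}$, so coupling the policy's parameters to it can produce inconsistent estimates where behavioral cloning remains consistent.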
Related papers
- Robust Behavior Cloning Via Global Lipschitz Regularization [0.5767156832161817]
Behavior Cloning is an effective imitation learning technique and has even been adopted in some safety-critical domains such as autonomous vehicles.
We use a global Lipschitz regularization approach to enhance the robustness of the learned policy network.
We propose a way to construct a Lipschitz neural network that ensures the policy robustness.
arXiv Detail & Related papers (2025-06-24T02:19:08Z) - Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
arXiv Detail & Related papers (2024-01-10T02:38:21Z) - A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories [122.11358440078581]
Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
arXiv Detail & Related papers (2023-11-02T15:41:09Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Online Learning with Off-Policy Feedback [18.861989132159945]
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.
We propose a set of algorithms that guarantee regret bounds that scale with a natural notion of mismatch between any comparator policy and the behavior policy.
arXiv Detail & Related papers (2022-07-18T21:57:16Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out of distribution samples as well as the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - A Contraction Approach to Model-based Reinforcement Learning [11.701145942745274]
We analyze the error in the cumulative reward using a contraction approach.
We prove that branched rollouts can reduce this error.
In this case, we show that GAN-type learning has an advantage over Behavioral Cloning when its discriminator is well-trained.
arXiv Detail & Related papers (2020-09-18T02:03:14Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)