A Critique of Strictly Batch Imitation Learning
- URL: http://arxiv.org/abs/2110.02063v1
- Date: Tue, 5 Oct 2021 14:07:30 GMT
- Title: A Critique of Strictly Batch Imitation Learning
- Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu
- Abstract summary: We argue that notational issues obscure how the pseudo-state visitation distribution might be disconnected from the policy's $\textit{true}$ state visitation distribution.
We construct examples where the parameter coupling advocated by Jarrett et al. leads to inconsistent estimates of the expert's policy, unlike behavioral cloning.
- Score: 26.121994149869767
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent work by Jarrett et al. attempts to frame the problem of offline
imitation learning (IL) as one of learning a joint energy-based model, with the
hope of outperforming standard behavioral cloning. We suggest that notational
issues obscure how the pseudo-state visitation distribution the authors propose
to optimize might be disconnected from the policy's $\textit{true}$ state
visitation distribution. We further construct natural examples where the
parameter coupling advocated by Jarrett et al. leads to inconsistent estimates
of the expert's policy, unlike behavioral cloning.
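For context, a standard recap of the two objects the critique contrasts; the definitions below are textbook imitation-learning notation rather than anything taken from the paper, and the symbols $\gamma$ (discount factor), $\rho_0$ (initial-state distribution), and $\mathcal{D}$ (expert demonstration set) are assumed. A policy $\pi$'s true state visitation distribution is
$$ d^{\pi}(s) \,=\, (1-\gamma)\sum_{t=0}^{\infty}\gamma^{t}\,\Pr\big(s_t = s \;\big|\; s_0 \sim \rho_0,\ a_t \sim \pi(\cdot \mid s_t)\big), $$
while behavioral cloning fits a policy $\pi_\theta$ by maximum likelihood on the demonstrations,
$$ \hat{\theta}_{\mathrm{BC}} \,=\, \arg\min_{\theta}\ \mathbb{E}_{(s,a)\sim\mathcal{D}}\big[-\log \pi_{\theta}(a \mid s)\big], $$
without constructing any visitation distribution at all. The abstract's concern is that the pseudo-state visitation distribution optimized in the energy-based formulation need not coincide with $d^{\pi_\theta}$, so coupling the policy's parameters to it can produce inconsistent estimates where behavioral cloning remains consistent.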
Related papers
- Robust Behavior Cloning Via Global Lipschitz Regularization [0.5767156832161817]
Behavior Cloning is an effective imitation learning technique and has even been adopted in some safety-critical domains such as autonomous vehicles.
We use a global Lipschitz regularization approach to enhance the robustness of the learned policy network.
We propose a way to construct a Lipschitz neural network that ensures the policy robustness.
arXiv Detail & Related papers (2025-06-24T02:19:08Z) - Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
arXiv Detail & Related papers (2024-01-10T02:38:21Z) - A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories [122.11358440078581]
Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
arXiv Detail & Related papers (2023-11-02T15:41:09Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Online Learning with Off-Policy Feedback [18.861989132159945]
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.
We propose a set of algorithms that guarantee regret bounds that scale with a natural notion of mismatch between any comparator policy and the behavior policy.
arXiv Detail & Related papers (2022-07-18T21:57:16Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out of distribution samples as well as the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - A Contraction Approach to Model-based Reinforcement Learning [11.701145942745274]
We analyze the error in the cumulative reward using a contraction approach.
We prove that branched rollouts can reduce this error.
In this case, we show that GAN-type learning has an advantage over Behavioral Cloning when its discriminator is well-trained.
arXiv Detail & Related papers (2020-09-18T02:03:14Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)