Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms
- URL: http://arxiv.org/abs/2312.16860v2
- Date: Wed, 17 Jul 2024 10:05:36 GMT
- Title: Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms
- Authors: Yichen Li, Chicheng Zhang
- Abstract summary: We study interactive imitation learning, where a learner interactively queries a demonstrating expert for action annotations.
We propose a new oracle-efficient algorithm MFTPL-P with provable finite-sample guarantees.
- Score: 22.703438243976876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study interactive imitation learning, where a learner interactively queries a demonstrating expert for action annotations, aiming to learn a policy that has performance competitive with the expert, using as few annotations as possible. We focus on the general agnostic setting where the expert demonstration policy may not be contained in the policy class used by the learner. We propose a new oracle-efficient algorithm MFTPL-P (abbreviation for Mixed Follow the Perturbed Leader with Poisson perturbations) with provable finite-sample guarantees, under the assumption that the learner is given access to samples from some "explorative" distribution over states. Our guarantees hold for any policy class, which is considerably broader than prior state of the art. We further propose Bootstrap-Dagger, a more practical variant that does not require additional sample access. Empirically, MFTPL-P and Bootstrap-Dagger notably surpass online and offline imitation learning baselines in continuous control tasks.
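The "Follow the Perturbed Leader" principle behind MFTPL-P can be illustrated with a minimal sketch: over a finite policy class, randomly perturb the cumulative losses with Poisson noise and play the resulting leader. This is only the generic FTPL idea; the actual MFTPL-P algorithm instead perturbs the dataset passed to a supervised-learning oracle and plays a mixture of the returned policies, details which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def ftpl_poisson_select(cum_losses, lam=1.0):
    """Follow-the-perturbed-leader with Poisson noise over a finite
    policy class: pick the argmin of (cumulative loss - perturbation).
    Illustrative only; the noise scale `lam` is an assumption."""
    noise = rng.poisson(lam=lam, size=len(cum_losses)).astype(float)
    return int(np.argmin(np.asarray(cum_losses, dtype=float) - noise))

# Toy full-information online loop over 3 candidate policies.
cum = np.zeros(3)
for _ in range(100):
    k = ftpl_poisson_select(cum)   # leader under a fresh perturbation
    cum += rng.uniform(size=3)     # observe this round's losses
```

The random perturbation is what stabilizes the leader choice against adversarial loss sequences, which deterministic "follow the leader" cannot do.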
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
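The core signal RLIF uses can be sketched in a few lines: each human takeover is converted into a negative reward, so the RL objective becomes avoiding states that trigger intervention. The reward shape and scale below are an assumption for illustration, not the paper's exact construction.

```python
def intervention_reward(intervened: bool) -> float:
    """Hedged sketch of RLIF's central idea: an expert intervention
    yields a negative reward; otherwise the step is neutral."""
    return -1.0 if intervened else 0.0

# A trajectory in which the human takes over exactly once:
trace = [False, False, True, False]
episode_return = sum(intervention_reward(i) for i in trace)  # -1.0
```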
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- When is Agnostic Reinforcement Learning Statistically Tractable? [76.1408672715773]
A new complexity measure, called the spanning capacity, depends solely on the policy class $\Pi$ and is independent of the MDP dynamics.
We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn.
This reveals a surprising separation for learnability between generative access and online access models.
arXiv Detail & Related papers (2023-10-09T19:40:54Z)
- MEGA-DAgger: Imitation Learning with Multiple Imperfect Experts [7.4506213369860195]
MEGA-DAgger is a new DAgger variant that is suitable for interactive learning with multiple imperfect experts.
We demonstrate that a policy learned using MEGA-DAgger can outperform both the experts and policies learned using state-of-the-art interactive imitation learning algorithms.
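For background, the DAgger loop that MEGA-DAgger builds on can be sketched as follows: roll out the current policy, query an expert for labels on the visited states, aggregate, and refit. The multi-expert arbitration that distinguishes MEGA-DAgger is not shown; the `rollout`, `expert_label`, and `fit` stand-ins below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def dagger(rollout, expert_label, fit, n_iters=5):
    """Minimal DAgger-style loop: aggregate expert labels on states
    visited by the learner's own policy, then refit supervised."""
    states, actions = [], []
    policy = None
    for _ in range(n_iters):
        visited = rollout(policy)                        # learner-induced states
        states.extend(visited)
        actions.extend(expert_label(s) for s in visited)  # expert annotations
        policy = fit(states, actions)                    # fit on aggregated data
    return policy

# Toy stand-ins (illustration only).
rollout = lambda pol: list(rng.uniform(-1.0, 1.0, size=10))
expert_label = lambda s: 1.0 if s > 0 else -1.0
fit = lambda S, A: (lambda s: 1.0 if s > 0 else -1.0)  # trivial "learner"

policy = dagger(rollout, expert_label, fit)
```

Aggregating over the learner's own state distribution, rather than the expert's, is what lets DAgger-style methods avoid compounding errors.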
arXiv Detail & Related papers (2023-03-01T16:40:54Z)
- Deconfounding Imitation Learning with Variational Inference [19.99248795957195]
Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent.
This is because partial observability gives rise to hidden confounders in the causal graph.
We propose to train a variational inference model to infer the expert's latent information and use it to train a latent-conditional policy.
arXiv Detail & Related papers (2022-11-04T18:00:02Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support [53.11601029040302]
Current offline policy learning algorithms are mostly based on inverse propensity score (IPS) weighting.
We propose a novel approach that uses a hybrid of offline learning with online exploration.
Our approach determines an optimal policy with theoretical guarantees using the minimal number of online explorations.
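The IPS weighting mentioned above is the standard off-policy value estimator: reweight each logged reward by the ratio of the target policy's action probability to the logging policy's. A minimal sketch (generic IPS, not this paper's hybrid method):

```python
import numpy as np

def ips_value(rewards, pi_probs, mu_probs):
    """Inverse propensity score estimate of a target policy's value:
    V_hat(pi) = (1/n) * sum_i [pi(a_i|x_i) / mu(a_i|x_i)] * r_i,
    where mu is the logging policy's propensity for the logged action."""
    w = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    return float(np.mean(w * np.asarray(rewards, dtype=float)))

# Target policy matches the logged action only on the first sample:
# weights are [2.0, 0.0], so the estimate is mean([2.0, 0.0]) = 1.0.
v = ips_value(rewards=[1.0, 1.0], pi_probs=[1.0, 0.0], mu_probs=[0.5, 0.5])
```

IPS is unbiased when the logging policy covers every action the target policy takes; "deficient support" in the paper's title refers precisely to the case where that coverage fails and the estimator breaks down.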
arXiv Detail & Related papers (2021-07-24T05:07:43Z)
- Generalization Guarantees for Imitation Learning [6.542289202349586]
Control policies from imitation learning can often fail to generalize to novel environments.
We present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework.
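As general background, PAC-Bayes bounds of the kind leveraged here control the expected loss of a randomized (posterior-sampled) policy; a standard McAllester-style form, stated as background rather than the paper's exact bound, is:

```latex
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all posteriors Q over policies (P a fixed prior):
L(Q) \;\le\; \widehat{L}(Q)
  \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
```

Here $\widehat{L}(Q)$ is the empirical loss under the posterior $Q$, and the KL term penalizes posteriors that stray far from the prior $P$.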
arXiv Detail & Related papers (2020-08-05T03:04:13Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Analyzing Student Strategies In Blended Courses Using Clickstream Data [32.81171098036632]
We use pattern mining and models borrowed from Natural Language Processing to understand student interactions.
Fine-grained clickstream data is collected through Diderot, a non-commercial educational support system.
Our results suggest that the proposed hybrid NLP methods can provide valuable insights even in the low-data setting of blended courses.
arXiv Detail & Related papers (2020-05-31T03:01:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.