Selective Sampling and Imitation Learning via Online Regression
- URL: http://arxiv.org/abs/2307.04998v1
- Date: Tue, 11 Jul 2023 03:32:20 GMT
- Title: Selective Sampling and Imitation Learning via Online Regression
- Authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
- Abstract summary: We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback.
We provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback.
- Score: 17.73844193143454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of Imitation Learning (IL) by actively querying a noisy expert for feedback.
While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback, which is not practical in many applications.
In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful.
In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback.
Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries.
Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries.
Our algorithm for selective sampling leverages function approximation and relies on an online regression oracle w.r.t. the given model class to predict actions and to decide whether to query the expert for its label.
On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class.
We complement this with a lower bound that demonstrates that our results are tight.
We extend our selective sampling algorithm to IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert.
A key novelty here is that our regret and query complexity bounds depend only on the number of times the optimal policy (and not the noisy expert or the learner) goes to states that have a small margin.
Related papers
- Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
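For intuition, a toy sketch of one round under this protocol, assuming (as an illustration only) that abstaining yields a known fixed payoff in place of the unobserved arm reward; the decision rule and names below are hypothetical.

```python
import random

def abstention_round(means_est, counts, abstain_value=0.0, eps=0.1):
    """One round of a bandit with abstention (toy sketch).
    The agent commits to an arm, then may decline its (still unobserved)
    instantaneous reward and take the fixed payoff `abstain_value` instead."""
    if random.random() < eps:                     # occasional exploration
        arm = random.randrange(len(means_est))
    else:                                         # otherwise play greedily
        arm = max(range(len(means_est)), key=lambda a: means_est[a])
    # Abstain when the arm's estimated mean looks worse than the safe payoff.
    abstain = counts[arm] > 0 and means_est[arm] < abstain_value
    return arm, abstain
```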
arXiv Detail & Related papers (2024-02-23T06:27:12Z)
- Making RL with Preference-based Feedback Efficient via Randomization [11.019088464664696]
Reinforcement Learning algorithms that learn from human feedback need to be efficient in terms of statistical complexity, computational complexity, and query complexity.
We present an algorithm that is sample efficient (i.e., has near-optimal worst-case regret bounds) and has polynomial running time (i.e., its computational complexity is polynomial in the relevant parameters).
To extend the results to more general nonlinear function approximation, we design a model-based randomized algorithm inspired by the idea of Thompson sampling.
arXiv Detail & Related papers (2023-10-23T04:19:35Z)
- Contextual Bandits and Imitation Learning via Preference-Based Active Queries [17.73844193143454]
We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback.
The learner's objective is two-fold: to minimize the regret associated with the executed actions, while simultaneously, minimizing the number of comparison queries made to the expert.
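As a loose sketch of this interaction (assumed interfaces, not the authors' algorithm): the learner plays its best guess and requests a noisy pairwise comparison against the runner-up only when the estimated gap between them is small.

```python
def preference_round(scores, compare, gap_threshold=0.05):
    """One round of preference-based active querying (sketch).
    scores[a] is the learner's current estimate for action a;
    compare(a, b) is assumed to return a noisy 1 if the expert
    prefers a over b, else 0."""
    ranked = sorted(range(len(scores)), key=lambda a: -scores[a])
    best, runner_up = ranked[0], ranked[1]
    preferred, queried = best, False
    if scores[best] - scores[runner_up] < gap_threshold:
        queried = True                 # estimates too close: ask the expert
        if compare(best, runner_up) != 1:
            preferred = runner_up
    return preferred, queried
```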
arXiv Detail & Related papers (2023-07-24T16:36:04Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
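To make the censoring concrete, a small sketch under an assumed solver interface: each run is truncated at a cutoff, so timed-out runs reveal only a lower bound on the true runtime.

```python
import time

def run_with_cutoff(solver, instance, cutoff_s):
    """Run one candidate algorithm on one instance with a runtime cutoff.
    Returns (loss, censored): timed-out runs reveal only that the true
    runtime exceeds the cutoff. `solver(instance, timeout=...)` returning
    a truthy value on success is an assumed interface."""
    start = time.monotonic()
    finished = solver(instance, timeout=cutoff_s)
    elapsed = time.monotonic() - start
    if not finished or elapsed >= cutoff_s:
        return cutoff_s, True          # censored: only a lower bound
    return elapsed, False              # fully observed runtime
```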
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- An Investigation of Replay-based Approaches for Continual Learning [79.0660895390689]
Continual learning (CL) is a major challenge of machine learning (ML) and describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
arXiv Detail & Related papers (2021-08-15T15:05:02Z)
- Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
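For reference (a standard definition, not specific to this paper), the triangular discrimination between two probability distributions p and q on a finite set is

```latex
% Triangular discrimination (a.k.a. Vincze--Le Cam distance) between
% probability distributions p and q over a finite alphabet:
\[
  D_{\triangle}(p, q) \;=\; \sum_{a} \frac{\bigl(p(a) - q(a)\bigr)^{2}}{p(a) + q(a)}
\]
% It is equivalent up to constant factors to the squared Hellinger distance.
```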
arXiv Detail & Related papers (2021-07-05T19:20:34Z)
- Differentially Private Query Release Through Adaptive Projection [19.449593001368193]
We propose, implement, and evaluate a new algorithm for releasing answers to very large numbers of statistical queries like $k$-way marginals.
Our algorithm makes adaptive use of a continuous relaxation of the Projection Mechanism, which answers queries on the private dataset using simple perturbation.
We find that our method outperforms existing algorithms in many cases, especially when the privacy budget is small or the query class is large.
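A rough sketch of the perturb-then-project idea (illustrative only; the paper's mechanism projects onto a consistent set via a continuous relaxation, which the crude clipping step below does not capture):

```python
import numpy as np

def perturb_then_project(true_answers, noise_scale, rng=None):
    """Toy version of perturb-then-project: add Gaussian noise to the
    exact query answers, then project back to the valid range [0, 1]
    (statistical queries such as marginals take values in [0, 1]).
    Clipping is a crude stand-in for the paper's relaxed projection."""
    rng = rng or np.random.default_rng()
    noisy = true_answers + rng.normal(0.0, noise_scale, size=len(true_answers))
    return np.clip(noisy, 0.0, 1.0)
```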
arXiv Detail & Related papers (2021-03-11T12:43:18Z)
- Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Active Imitation Learning with Noisy Guidance [6.832341432995627]
Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks.
Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state.
We consider an active learning setting in which the learning algorithm has additional access to a much cheaper noisy heuristic that provides noisy guidance.
arXiv Detail & Related papers (2020-05-26T15:35:46Z)