Optimally Teaching a Linear Behavior Cloning Agent
- URL: http://arxiv.org/abs/2311.15399v1
- Date: Sun, 26 Nov 2023 19:47:39 GMT
- Title: Optimally Teaching a Linear Behavior Cloning Agent
- Authors: Shubham Kumar Bharti, Stephen Wright, Adish Singla, Xiaojin Zhu
- Abstract summary: We study optimal teaching of Linear Behavior Cloning (LBC) learners.
In this setup, the teacher can select which states to demonstrate to an LBC learner.
The learner maintains a version space of infinitely many linear hypotheses consistent with the demonstrations.
- Score: 29.290523215922015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study optimal teaching of Linear Behavior Cloning (LBC) learners. In this
setup, the teacher can select which states to demonstrate to an LBC learner.
The learner maintains a version space of infinitely many linear hypotheses
consistent with the demonstrations. The goal of the teacher is to teach a
realizable target policy to the learner using a minimum number of state
demonstrations. This number is known as the Teaching Dimension (TD). We present
a teaching algorithm called ``Teach using Iterative Elimination (TIE)" that
achieves instance-optimal TD.
However, we also show that finding an optimal teaching set is computationally
NP-hard. We further provide an approximation algorithm that guarantees an
approximation ratio of $\log(|A|-1)$ on the teaching dimension. Finally, we
provide experimental results to validate the efficiency and effectiveness of
our algorithm.
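The $\log(|A|-1)$ approximation guarantee has the flavor of the classic greedy set-cover heuristic: each candidate state demonstration eliminates some subset of the wrong hypotheses, and the teacher greedily picks the demonstration that eliminates the most survivors. The sketch below is an illustrative toy model of this idea, not the paper's TIE algorithm; the hypothesis ids, states, and elimination map are invented for illustration.

```python
# Hedged sketch: greedy teaching-set selection in the spirit of greedy set
# cover (which yields a logarithmic approximation ratio). The hypotheses,
# states, and elimination sets below are illustrative toy data, not taken
# from the paper.

def greedy_teaching_set(hypotheses, eliminates):
    """Pick demonstration states until every non-target hypothesis is eliminated.

    hypotheses: set of hypothesis ids to eliminate (the target is excluded).
    eliminates: dict mapping each candidate state to the set of hypotheses
                that demonstrating the target action in that state eliminates.
    Returns (chosen states, hypotheses still surviving).
    """
    remaining = set(hypotheses)
    teaching_set = []
    while remaining:
        # Greedy step: the state whose demonstration eliminates the most
        # still-surviving hypotheses.
        best = max(eliminates, key=lambda s: len(eliminates[s] & remaining))
        gained = eliminates[best] & remaining
        if not gained:  # no candidate helps; target not teachable this way
            break
        teaching_set.append(best)
        remaining -= gained
    return teaching_set, remaining

# Toy instance: three wrong hypotheses and three candidate states.
demo_effect = {
    "s1": {"h1", "h2"},
    "s2": {"h2", "h3"},
    "s3": {"h3"},
}
chosen, left = greedy_teaching_set({"h1", "h2", "h3"}, demo_effect)
```

In this toy instance the greedy rule first picks "s1" (eliminating two hypotheses), then one more state to eliminate "h3", so two demonstrations suffice even though three candidates exist.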
Related papers
- When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets? [0.0]
We present our submission to the BabyLM challenge, aiming to push the boundaries of data-efficient language model pretraining.
We address the limitation of treating students equally by formulating weighted mutual learning as a bi-level optimization problem.
Our evaluations show that teacher-less methods can match or surpass teacher-supervised approaches.
arXiv Detail & Related papers (2024-11-25T15:25:31Z) - Provably Efficient Infinite-Horizon Average-Reward Reinforcement Learning with Linear Function Approximation [1.8416014644193066]
We propose an algorithm for learning linear Markov decision processes (MDPs) and linear mixture MDPs under the Bellman optimality condition.
Our algorithm for linear MDPs achieves the best-known regret upper bound of $\widetilde{\mathcal{O}}(d^{3/2}\,\mathrm{sp}(v^*)\sqrt{T})$ over $T$ time steps.
For linear mixture MDPs, our algorithm attains a regret bound of $\widetilde{\mathcal{O}}(d \cdot \dots)$
arXiv Detail & Related papers (2024-09-16T23:13:42Z) - Towards Optimal Learning of Language Models [124.65669486710992]
We present a theory for the optimal learning of language models (LMs)
We derive a theorem, named Learning Law, to reveal the properties of the dynamics in the optimal learning process under our objective.
We empirically verify that the optimal learning of LMs essentially stems from the improvement of the coefficients in the scaling law of LMs.
arXiv Detail & Related papers (2024-02-27T18:52:19Z) - Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs)
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU)
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z) - Teaching an Active Learner with Contrastive Examples [35.926575235046634]
We study the problem of active learning with the added twist that the learner is assisted by a helpful teacher.
We investigate an efficient teaching algorithm that adaptively picks contrastive examples.
We derive strong performance guarantees for our algorithm based on two problem-dependent parameters.
arXiv Detail & Related papers (2021-10-28T05:00:55Z) - Best-Case Lower Bounds in Online Learning [9.01310450044549]
Much of the work in online learning focuses on the study of sublinear upper bounds on the regret.
In this work, we initiate the study of best-case lower bounds in online convex optimization.
We show that the linearized version of FTRL can attain negative linear regret.
arXiv Detail & Related papers (2021-06-23T23:24:38Z) - Distribution Matching for Machine Teaching [64.39292542263286]
Machine teaching is an inverse problem of machine learning that aims at steering the student learner towards its target hypothesis.
Previous studies on machine teaching focused on balancing the teaching risk and cost to find those best teaching examples.
This paper presents a distribution matching-based machine teaching strategy.
arXiv Detail & Related papers (2021-05-06T09:32:57Z) - Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation [92.3161051419884]
We study the reinforcement learning for finite-horizon episodic Markov decision processes with adversarial reward and full information feedback.
We show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode.
We also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z) - Adaptive Teaching of Temporal Logic Formulas to Learners with Preferences [44.63937003271641]
We investigate machine teaching for temporal logic formulas.
An exhaustive search even for a myopic solution takes exponential time.
We propose an efficient approach for teaching parametric linear temporal logic formulas.
arXiv Detail & Related papers (2020-01-27T18:22:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.