The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
- URL: http://arxiv.org/abs/2006.09324v2
- Date: Mon, 8 Mar 2021 03:35:37 GMT
- Title: The Sample Complexity of Teaching-by-Reinforcement on Q-Learning
- Authors: Xuezhou Zhang, Shubham Kumar Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu
- Abstract summary: We study the sample complexity of teaching, termed the "teaching dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm.
In this paper, we focus on a specific family of reinforcement learning algorithms, Q-learning, and characterize the TDim under different teachers with varying control power over the environment.
Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results.
- Score: 40.37954633873304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the sample complexity of teaching, termed the "teaching
dimension" (TDim) in the literature, for the teaching-by-reinforcement paradigm, where the
teacher guides the student through rewards. This is distinct from the
teaching-by-demonstration paradigm motivated by robotics applications, where
the teacher teaches by providing demonstrations of state/action trajectories.
The teaching-by-reinforcement paradigm applies to a wider range of real-world
settings where a demonstration is inconvenient, but has not been studied
systematically. In this paper, we focus on a specific family of reinforcement
learning algorithms, Q-learning, and characterize the TDim under different
teachers with varying control power over the environment, and present matching
optimal teaching algorithms. Our TDim results provide the minimum number of
samples needed for reinforcement learning, and we discuss their connections to
standard PAC-style RL sample complexity and teaching-by-demonstration sample
complexity results. Our teaching algorithms have the potential to speed up RL
agent learning in applications where a helpful teacher is available.
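To make the setting concrete: TDim here is, informally, the worst-case (over target policies) minimum number of teacher-provided samples before the learner's greedy policy matches the target. The sketch below is a minimal illustration of reward-control teaching on a tabular Q-learner; the chain MDP, learning rate, and reward magnitudes are my own illustrative assumptions, not the paper's optimal teaching construction.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): a teacher with full control of
# the scalar reward steers a tabular Q-learner toward a target policy. The
# chain MDP, constants, and stopping rule are illustrative assumptions.

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2                 # chain: action 1 moves right, 0 moves left
target = np.ones(n_states, dtype=int)      # target policy: always move right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9                    # learning rate and discount factor

def step(s, a):
    """Deterministic chain dynamics; only the teacher's reward matters here."""
    return min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)

def teacher_reward(s, a):
    """Teacher rewards the target action and penalizes everything else."""
    return 1.0 if a == target[s] else -1.0

s, samples = 0, 0
while any(Q[x].argmax() != target[x] for x in range(n_states)):
    a = int(rng.integers(n_actions))       # fully random exploration for simplicity
    s_next = step(s, a)
    Q[s, a] += alpha * (teacher_reward(s, a) + gamma * Q[s_next].max() - Q[s, a])
    s, samples = s_next, samples + 1

print(f"Greedy policy matches the target after {samples} teacher-shaped samples.")
```

Counting samples until the greedy policy locks onto the target mirrors how TDim is measured; the paper's teaching algorithms are designed to minimize that count under each teacher's specific control power, which this naive reward-bonus scheme does not attempt.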
Related papers
- Automatic Curriculum Learning with Gradient Reward Signals [0.0]
We introduce a framework where the teacher model, utilizing the gradient norm information of a student model, dynamically adapts the learning curriculum.
We analyze how gradient norm rewards influence the teacher's ability to craft challenging yet achievable learning sequences, ultimately enhancing the student's performance.
arXiv Detail & Related papers (2023-12-21T04:19:43Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning must be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert (a minimal sketch of this intervention-as-reward idea appears after this list).
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)
- RLTutor: Reinforcement Learning Based Adaptive Tutoring System by Modeling Virtual Student with Fewer Interactions [10.34673089426247]
We propose a framework for optimizing teaching strategies by constructing a virtual model of the student.
Our results can serve as a buffer between theoretical instructional optimization and practical applications in e-learning systems.
arXiv Detail & Related papers (2021-07-31T15:42:03Z)
- Distribution Matching for Machine Teaching [64.39292542263286]
Machine teaching is an inverse problem of machine learning that aims at steering the student learner towards its target hypothesis.
Previous studies on machine teaching focused on balancing teaching risk and cost to find the best teaching examples.
This paper presents a distribution matching-based machine teaching strategy.
arXiv Detail & Related papers (2021-05-06T09:32:57Z)
- Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning [96.78504087416654]
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems, we investigate when this paradigm is provably efficient.
We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm.
arXiv Detail & Related papers (2020-03-15T19:23:59Z)
- Provable Representation Learning for Imitation Learning via Bi-level Optimization [60.059520774789654]
A common strategy in modern learning systems is to learn a representation that is useful for many tasks.
We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts' trajectories are available.
We instantiate this framework in the behavior-cloning and observation-alone imitation learning settings.
arXiv Detail & Related papers (2020-02-24T21:03:52Z)
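The sketch referenced in the RLIF entry above: a hedged toy illustration, not the RLIF implementation, in which the only reward the learner ever sees is the intervention signal, -1 when a stand-in overseer intervenes and 0 otherwise, consumed here by plain tabular Q-learning as the off-policy learner. The overseer rule and toy dynamics are placeholder assumptions.

```python
import numpy as np

# Hedged toy sketch of "interventions as rewards" (not the RLIF codebase):
# the learner's only reward is -1 when the overseer intervenes, 0 otherwise,
# and any off-policy method (here, tabular Q-learning) trains on it.

rng = np.random.default_rng(1)
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.2

def overseer_intervenes(s, a):
    """Placeholder overseer: flags any non-preferred action in state s.
    In RLIF this signal would come from a real human watching the rollout."""
    return a != s % n_actions

def env_step(s, a):
    """Toy dynamics; note the environment's own reward is never used."""
    return (s + a + 1) % n_states

s = 0
for _ in range(5000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    r = -1.0 if overseer_intervenes(s, a) else 0.0   # intervention is the reward
    s_next = env_step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("Greedy actions per state:", Q.argmax(axis=1))  # avoids interventions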