On the Theory of Reinforcement Learning with Once-per-Episode Feedback
- URL: http://arxiv.org/abs/2105.14363v1
- Date: Sat, 29 May 2021 19:48:51 GMT
- Title: On the Theory of Reinforcement Learning with Once-per-Episode Feedback
- Authors: Niladri S. Chatterji, Aldo Pacchiano, Peter L. Bartlett, Michael I. Jordan
- Abstract summary: We introduce a theory of reinforcement learning in which the learner receives feedback only once at the end of an episode.
This is arguably more representative of real-world applications than the traditional requirement that the learner receive feedback at every time step.
- Score: 120.5537226120512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a theory of reinforcement learning (RL) in which the learner
receives feedback only once at the end of an episode. While this is an extreme
test case for theory, it is also arguably more representative of real-world
applications than the traditional requirement in RL practice that the learner
receive feedback at every time step. Indeed, in many real-world applications of
reinforcement learning, such as self-driving cars and robotics, it is easier to
evaluate whether a learner's complete trajectory was either "good" or "bad,"
but harder to provide a reward signal at each step. To show that learning is
possible in this more challenging setting, we study the case where trajectory
labels are generated by an unknown parametric model, and provide a
statistically and computationally efficient algorithm that achieves sub-linear
regret.
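As a concrete illustration of this setting (not of the paper's algorithm), the sketch below simulates an episodic environment in which the learner sees no per-step reward and receives only a single binary label when the episode ends. The label model is assumed here to be logistic in a count-based trajectory feature vector, one simple parametric choice; the environment, feature map, and dimensions are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: a small tabular episodic MDP in which the learner
# never observes per-step rewards.  At the end of each episode, a single
# binary label is drawn from an unknown parametric model -- here a
# logistic model over a feature vector of the whole trajectory.
n_states, n_actions, horizon = 5, 3, 10
theta_star = rng.normal(size=n_states * n_actions)   # unknown to the learner

def trajectory_features(trajectory):
    """Count-based features: how often each (state, action) pair was visited."""
    phi = np.zeros(n_states * n_actions)
    for s, a in trajectory:
        phi[s * n_actions + a] += 1.0
    return phi / horizon

def episode_label(trajectory):
    """Once-per-episode feedback: a single Bernoulli 'good'/'bad' label."""
    logit = theta_star @ trajectory_features(trajectory)
    p_good = 1.0 / (1.0 + np.exp(-logit))
    return bool(rng.random() < p_good)

# One episode under a uniformly random behaviour policy.
trajectory = []
state = rng.integers(n_states)
for _ in range(horizon):
    action = rng.integers(n_actions)       # placeholder policy
    trajectory.append((state, action))
    state = rng.integers(n_states)         # toy transition dynamics

print("episode label (the only feedback the learner receives):",
      episode_label(trajectory))
```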
Related papers
- Dense Reward for Free in Reinforcement Learning from Human Feedback [64.92448888346125]
We leverage the fact that the reward model contains more information than just its scalar output.
We use the reward model's attention weights to redistribute the reward along the whole completion.
Empirically, we show that it stabilises training, accelerates the rate of learning, and, in practical cases, may lead to better local optima.
arXiv Detail & Related papers (2024-02-01T17:10:35Z)
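A minimal sketch of the redistribution idea described in the entry above: the scalar reward assigned to a whole completion is spread over its tokens in proportion to attention weights. The weights and reward below are made-up placeholders rather than outputs of a real reward model.

```python
import numpy as np

def redistribute_reward(scalar_reward, attention_weights):
    """Spread a single sequence-level reward over tokens in proportion to
    (placeholder) attention weights, yielding a dense per-token signal."""
    w = np.asarray(attention_weights, dtype=float)
    w = w / w.sum()                  # normalise to a distribution over tokens
    return scalar_reward * w         # per-token rewards sum to the scalar

# Illustrative values only: a 6-token completion, a scalar reward of 0.8,
# and made-up attention weights standing in for those of a reward model.
attn = [0.05, 0.10, 0.30, 0.25, 0.20, 0.10]
print(redistribute_reward(0.8, attn))
```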
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
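A rough sketch of the "interventions as rewards" idea from the entry above, under the assumption that each intervention is mapped to a fixed negative reward and all other steps receive zero; the data format, sign convention, and helper names are illustrative, not the paper's exact recipe.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Transition:
    obs: Any
    action: Any
    reward: float
    next_obs: Any
    done: bool

def label_with_intervention_reward(steps: List[dict]) -> List[Transition]:
    """Turn logged interaction steps into RL transitions whose only reward
    is derived from the human intervention signal: -1.0 on steps where the
    expert intervened and 0.0 otherwise (an illustrative convention)."""
    transitions = []
    for step in steps:
        reward = -1.0 if step["intervened"] else 0.0
        transitions.append(Transition(step["obs"], step["action"], reward,
                                      step["next_obs"], step["done"]))
    return transitions

# Toy log: the expert intervenes on the second step only.
log = [
    {"obs": 0, "action": 1, "next_obs": 1, "done": False, "intervened": False},
    {"obs": 1, "action": 0, "next_obs": 2, "done": False, "intervened": True},
    {"obs": 2, "action": 1, "next_obs": 3, "done": True,  "intervened": False},
]
print(label_with_intervention_reward(log))
```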
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta-algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
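The entry above describes an algorithm built from two policies; the sketch below shows one way such a jump-start rollout could look, with a guide policy rolling in for a fixed number of steps before a learning (exploration) policy takes over. The environment, policies, and step budget are placeholders, not the paper's exact procedure.

```python
import random

def jump_start_rollout(env_step, reset, guide_policy, explore_policy,
                       horizon, guide_steps):
    """Roll in with a guide policy for `guide_steps` steps, then hand
    control to the learning/exploration policy for the rest of the episode."""
    state = reset()
    trajectory = []
    for t in range(horizon):
        policy = guide_policy if t < guide_steps else explore_policy
        action = policy(state)
        next_state = env_step(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory

# Placeholder environment and policies, for illustration only.
reset = lambda: 0
env_step = lambda s, a: s + a
guide_policy = lambda s: 1                      # stands in for a pre-existing policy
explore_policy = lambda s: random.choice([0, 1])

# One could shrink guide_steps over training so the exploration policy
# gradually takes over (an illustrative schedule, not necessarily the paper's).
print(jump_start_rollout(env_step, reset, guide_policy, explore_policy,
                         horizon=10, guide_steps=6))
```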
- Fractional Transfer Learning for Deep Model-Based Reinforcement Learning [0.966840768820136]
Reinforcement learning (RL) is well known for requiring large amounts of data for agents to learn to perform complex tasks.
Recent progress in model-based RL allows agents to be much more data-efficient.
We present a simple alternative approach: fractional transfer learning.
arXiv Detail & Related papers (2021-08-14T12:44:42Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- A Theory of Universal Learning [26.51949485387526]
We show that there are only three possible rates of universal learning.
We show that the learning curve of any given concept class decays at an exponential, a linear, or an arbitrarily slow rate.
arXiv Detail & Related papers (2020-11-09T15:10:32Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.