Leveraging Prior Knowledge in Reinforcement Learning via Double-Sided
Bounds on the Value Function
- URL: http://arxiv.org/abs/2302.09676v2
- Date: Fri, 1 Sep 2023 18:03:09 GMT
- Title: Leveraging Prior Knowledge in Reinforcement Learning via Double-Sided
Bounds on the Value Function
- Authors: Jacob Adamczyk, Stas Tiomkin, Rahul Kulkarni
- Abstract summary: We show how an arbitrary approximation for the value function can be used to derive double-sided bounds on the optimal value function of interest.
We extend the framework with error analysis for continuous state and action spaces.
- Score: 4.48890356952206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An agent's ability to leverage past experience is critical for efficiently
solving new tasks. Approximate solutions for new tasks can be obtained from
previously derived value functions, as demonstrated by research on transfer
learning, curriculum learning, and compositionality. However, prior work has
primarily focused on using value functions to obtain zero-shot approximations
for solutions to a new task. In this work, we show how an arbitrary
approximation for the value function can be used to derive double-sided bounds
on the optimal value function of interest. We further extend the framework with
error analysis for continuous state and action spaces. The derived results lead
to new approaches for clipping during training, which we validate numerically in
simple domains.
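To make the clipping idea concrete, below is a minimal sketch of a tabular Q-learning step whose bootstrap target is clipped to lie between double-sided bounds on the optimal value function. The arrays `lower_bound` and `upper_bound` stand in for bounds derived from a prior value-function approximation; how such bounds are derived is the subject of the paper, and the update itself is standard TD learning rather than the paper's exact algorithm.

```python
import numpy as np

def clipped_q_update(Q, s, a, r, s_next, lower_bound, upper_bound,
                     alpha=0.1, gamma=0.99):
    """One tabular Q-learning step whose bootstrap target is clipped to the
    interval implied by double-sided bounds on the optimal value function.

    Q            : (n_states, n_actions) array, current value estimate
    lower_bound  : (n_states, n_actions) array with L(s, a) <= Q*(s, a)
    upper_bound  : (n_states, n_actions) array with Q*(s, a) <= U(s, a)
    The bound arrays are assumed to be precomputed from some prior
    value-function approximation (hypothetical inputs here).
    """
    # Standard TD(0) bootstrap target.
    target = r + gamma * np.max(Q[s_next])

    # Clip the target into [L(s, a), U(s, a)] before the update.
    target = np.clip(target, lower_bound[s, a], upper_bound[s, a])

    # Move the current estimate toward the clipped target.
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Tighter bound arrays translate directly into a smaller admissible interval for the target, which is what makes a good prior approximation useful here.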
Related papers
- Boosting Soft Q-Learning by Bounding [4.8748194765816955]
We show how any value function estimate can also be used to derive double-sided bounds on the optimal value function.
The derived bounds lead to new approaches for boosting training performance.
arXiv Detail & Related papers (2024-06-26T03:02:22Z) - Bounding the Optimal Value Function in Compositional Reinforcement
Learning [2.7998963147546148]
We show that the optimal solution for a composite task can be related to the known primitive task solutions.
We also show that the regret of using a zero-shot policy can be bounded for this class of functions.
arXiv Detail & Related papers (2023-03-05T03:06:59Z) - Accelerating Policy Gradient by Estimating Value Function from Prior
Computation in Deep Reinforcement Learning [16.999444076456268]
We investigate the use of prior computation to estimate the value function to improve sample efficiency in on-policy policy gradient methods.
In particular, we learn a new value function for the target task while combining it with a value estimate from the prior.
The resulting value function is used as a baseline in the policy gradient method (a minimal sketch of this baseline idea appears after this list).
arXiv Detail & Related papers (2023-02-02T20:23:22Z) - A Generalized Bootstrap Target for Value-Learning, Efficiently Combining
Value and Feature Predictions [39.17511693008055]
Estimating value functions is a core component of reinforcement learning algorithms.
We focus on bootstrapping targets used when estimating value functions.
We propose a new backup target, the $\eta$-return mixture.
arXiv Detail & Related papers (2022-01-05T21:54:55Z) - Relational Experience Replay: Continual Learning by Adaptively Tuning
Task-wise Relationship [54.73817402934303]
We propose Relational Experience Replay (RER), a bi-level learning framework that adaptively tunes task-wise relationships to achieve a better stability-plasticity trade-off.
RER consistently improves the performance of all baselines and surpasses current state-of-the-art methods.
arXiv Detail & Related papers (2021-12-31T12:05:22Z) - Taylor Expansion of Discount Factors [56.46324239692532]
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.
In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors.
arXiv Detail & Related papers (2021-06-11T05:02:17Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z) - Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task-specific domains while staying close to each other.
This facilitates a cross-fertilization effect in which data collected in different domains helps improve the learning performance on each task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how, under a more standard notion of low inherent Bellman error, typically employed in least-squares value-iteration-style algorithms, one can obtain strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Sequential Transfer in Reinforcement Learning with a Generative Model [48.40219742217783]
We show how to reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.
We derive PAC bounds on its sample complexity which clearly demonstrate the benefits of using this kind of prior knowledge.
We empirically verify our theoretical findings in simple simulated domains.
arXiv Detail & Related papers (2020-07-01T19:53:35Z)
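Referring back to the entry "Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning" above, here is a minimal sketch of how a value estimate from prior computation might be mixed with a newly learned value function and used as a policy-gradient baseline. The convex mixing weight `beta` and the normalization step are illustrative assumptions, not the cited paper's exact combination rule.

```python
import numpy as np

def mixed_baseline_advantages(returns, v_prior, v_new, beta=0.5):
    """Advantage estimates for a policy-gradient update, using a baseline
    that mixes a value estimate transferred from prior computation with a
    value function learned on the target task.

    returns : array of Monte-Carlo returns G_t for the visited states
    v_prior : array of value estimates from the prior (source-task) computation
    v_new   : array of value estimates being learned for the target task
    beta    : hypothetical convex mixing weight; the cited paper's exact
              combination rule may differ.
    """
    baseline = beta * v_prior + (1.0 - beta) * v_new
    advantages = returns - baseline
    # Normalizing advantages is a common variance-reduction heuristic.
    return (advantages - advantages.mean()) / (advantages.std() + 1e-8)
```

In practice, the mixing weight could be annealed toward the newly learned value function as its estimates on the target task become more accurate.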
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.