Taylor Expansion of Discount Factors
- URL: http://arxiv.org/abs/2106.06170v2
- Date: Mon, 14 Jun 2021 19:50:58 GMT
- Title: Taylor Expansion of Discount Factors
- Authors: Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko
- Abstract summary: In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.
In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors.
- Score: 56.46324239692532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In practical reinforcement learning (RL), the discount factor used for
estimating value functions often differs from that used for defining the
evaluation objective. In this work, we study the effect that this discrepancy
of discount factors has during learning, and discover a family of objectives
that interpolate value functions of two distinct discount factors. Our analysis
suggests new ways for estimating value functions and performing policy
optimization updates, which demonstrate empirical performance gains. This
framework also leads to new insights on commonly-used deep RL heuristic
modifications to policy optimization algorithms.
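As a rough illustration of what interpolating between the value functions of two discount factors can look like, the sketch below works on a small tabular Markov reward process. It assumes the expansion takes the Neumann-series form V_{γ'} = Σ_k ((γ' - γ)(I - γP)^{-1} P)^k V_γ, which follows from standard linear algebra but is not stated in the abstract and may differ from the paper's exact objectives. The transition matrix `P`, rewards `r`, and the helper names are hypothetical and chosen only for illustration.

```python
def matvec(P, v):
    """Multiply a row-stochastic matrix P (list of rows) by a vector v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def evaluate(P, r, gamma, iters=2000):
    """Solve v = r + gamma * P v by fixed-point iteration,
    i.e. approximate v = (I - gamma P)^{-1} r."""
    v = [0.0] * len(r)
    for _ in range(iters):
        Pv = matvec(P, v)
        v = [ri + gamma * x for ri, x in zip(r, Pv)]
    return v


def taylor_value(P, r, gamma, gamma_prime, order):
    """Truncated expansion of V_{gamma'} around V_gamma, keeping terms 0..order."""
    term = evaluate(P, r, gamma)  # zeroth-order term: V_gamma
    total = list(term)
    for _ in range(order):
        # next term = (gamma' - gamma) * (I - gamma P)^{-1} P * previous term
        solved = evaluate(P, matvec(P, term), gamma)
        term = [(gamma_prime - gamma) * x for x in solved]
        total = [t + x for t, x in zip(total, term)]
    return total


# Hypothetical 3-state Markov reward process (the policy is folded into P).
P = [[0.9, 0.1, 0.0],
     [0.0, 0.8, 0.2],
     [0.3, 0.0, 0.7]]
r = [1.0, 0.0, -0.5]
gamma, gamma_prime = 0.9, 0.95

v_target = evaluate(P, r, gamma_prime)  # value under the larger discount factor
errors = []
for K in (0, 1, 2, 5, 20):
    v_K = taylor_value(P, r, gamma, gamma_prime, K)
    errors.append(max(abs(a - b) for a, b in zip(v_K, v_target)))
    print(f"order {K:2d}: max abs error vs V_gamma' = {errors[-1]:.2e}")
```

Order 0 recovers the value function under γ, and higher orders move toward the value function under γ'. In this toy setting the truncation error shrinks by at least a factor of (γ' - γ)/(1 - γ) per extra term, so the intermediate orders give one concrete family of estimates lying between the two.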
Related papers
- Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? [1.9116784879310031]
In deep Reinforcement Learning (RL), value functions are approximated using deep neural networks and trained via mean squared error regression objectives.
Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective.
Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup.
arXiv Detail & Related papers (2024-06-10T14:25:11Z)
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Prediction and Control in Continual Reinforcement Learning [39.30411018922005]
Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies.
We propose to decompose the value function into two components which update at different timescales.
arXiv Detail & Related papers (2023-12-18T19:23:42Z)
- Accelerating Policy Gradient by Estimating Value Function from Prior Computation in Deep Reinforcement Learning [16.999444076456268]
We investigate the use of prior computation to estimate the value function to improve sample efficiency in on-policy policy gradient methods.
In particular, we learn a new value function for the target task while combining it with a value estimate from the prior.
The resulting value function is used as a baseline in the policy gradient method.
arXiv Detail & Related papers (2023-02-02T20:23:22Z)
- Rectified Max-Value Entropy Search for Bayesian Optimization [54.26984662139516]
We develop a rectified MES acquisition function based on the notion of mutual information.
As a result, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.
arXiv Detail & Related papers (2022-02-28T08:11:02Z)
- A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions [39.17511693008055]
Estimating value functions is a core component of reinforcement learning algorithms.
We focus on bootstrapping targets used when estimating value functions.
We propose a new backup target, the $\eta$-return mixture.
arXiv Detail & Related papers (2022-01-05T21:54:55Z)
- Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation [53.83642844626703]
We provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation.
Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates.
arXiv Detail & Related papers (2021-06-24T15:58:01Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)