Bridging the Gap Between Target Networks and Functional Regularization
- URL: http://arxiv.org/abs/2106.02613v4
- Date: Thu, 7 Sep 2023 15:50:30 GMT
- Title: Bridging the Gap Between Target Networks and Functional Regularization
- Authors: Alexandre Piché, Valentin Thomas, Rafael Pardinas, Joseph Marino,
Gian Maria Marconi, Christopher Pal, Mohammad Emtiyaz Khan
- Abstract summary: We show that Target Networks act as an implicit regularizer, which can be beneficial in some cases but also has disadvantages.
We propose an explicit Functional Regularization alternative that is a flexible, convex regularizer in function space.
Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and results in performance improvements.
- Score: 61.051716530459586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bootstrapping is behind many of the successes of deep Reinforcement Learning.
However, learning the value function via bootstrapping often leads to unstable
training due to fast-changing target values. Target Networks are employed to
stabilize training by using an additional set of lagging parameters to estimate
the target values. Despite the popularity of Target Networks, their effect on
the optimization is still poorly understood. In this work, we show that they act
as an implicit regularizer that can be beneficial in some cases, but that also
has disadvantages: it is inflexible and can cause instabilities, even when
vanilla TD(0) converges. To overcome these issues, we propose an explicit
Functional Regularization alternative that is flexible and convex in function
space, and we study its convergence theoretically. We conduct an experimental
study across a range of environments, discount factors, and degrees of
off-policyness in the data collection to investigate how the regularization
induced by Target Networks and by Functional Regularization compares in terms
of performance, accuracy, and stability. Our findings show that Functional
Regularization can be used as a drop-in replacement for Target Networks and
yields performance improvements. Furthermore, adjusting both the regularization
weight and the network update period in Functional Regularization can lead to
further gains compared to solely adjusting the network update period, as is
typically done with Target Networks. Our approach also enhances the ability of
networks to recover accurate $Q$-values.
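As a rough illustration of the contrast drawn in the abstract, the sketch below runs semi-gradient TD(0) on a linear $Q$-function in two ways: one bootstraps its target from a lagging (Target Network) copy of the parameters, the other bootstraps from the online parameters and adds an explicit quadratic penalty pulling the value estimates towards the lagging network's predictions in function space. The quadratic form of the penalty, the weight `kappa`, the synthetic transitions, and the update period are illustrative assumptions, not the exact objective from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, gamma, lr, kappa, period = 4, 0.99, 0.05, 1.0, 100

theta_tn = rng.normal(size=d)   # Target-Network variant: online parameters
lag_tn = theta_tn.copy()        # lagging copy used to compute TD targets
theta_fr = theta_tn.copy()      # Functional-Regularization variant
lag_fr = theta_tn.copy()        # lagging copy used only as a regularizer

def q(params, phi):
    """Linear value estimate Q(s, a) = params . phi(s, a)."""
    return params @ phi

for step in range(1, 501):
    # Synthetic transition: state features, next-state features, reward.
    phi, phi_next, r = rng.normal(size=d), rng.normal(size=d), rng.normal()

    # Target Network: the TD target bootstraps from the lagging parameters.
    td_err_tn = q(theta_tn, phi) - (r + gamma * q(lag_tn, phi_next))
    theta_tn -= lr * td_err_tn * phi

    # Functional Regularization: the TD target bootstraps from the *online*
    # parameters, plus an explicit quadratic penalty (weight kappa) towards
    # the lagging network's prediction in function space.
    td_err_fr = q(theta_fr, phi) - (r + gamma * q(theta_fr, phi_next))
    reg_err = kappa * (q(theta_fr, phi) - q(lag_fr, phi))
    theta_fr -= lr * (td_err_fr + reg_err) * phi

    if step % period == 0:
        # Both variants refresh their lagging copy on the same schedule.
        lag_tn, lag_fr = theta_tn.copy(), theta_fr.copy()
```

The structural difference is where the lagging parameters enter: inside the bootstrapped target (an implicit, hard-to-tune regularizer) versus as a separate penalty term whose weight can be adjusted independently of the network update period.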
Related papers
- Continual Learning via Sequential Function-Space Variational Inference [65.96686740015902]
We propose an objective derived by formulating continual learning as sequential function-space variational inference.
Compared to objectives that directly regularize neural network predictions, the proposed objective allows for more flexible variational distributions.
We demonstrate that, across a range of task sequences, neural networks trained via sequential function-space variational inference achieve better predictive accuracy than networks trained with related methods.
arXiv Detail & Related papers (2023-12-28T18:44:32Z)
- On the Efficacy of Generalization Error Prediction Scoring Functions [33.24980750651318]
Generalization error predictors (GEPs) aim to predict model performance on unseen distributions by deriving dataset-level error estimates from sample-level scores.
We rigorously study the effectiveness of popular scoring functions (confidence, local manifold smoothness, model agreement) independent of mechanism choice.
arXiv Detail & Related papers (2023-03-23T18:08:44Z)
- Why Target Networks Stabilise Temporal Difference Methods [38.35578010611503]
We show that, under mild regularity conditions and a well-tuned target network update frequency, convergence can be guaranteed.
We conclude that the use of target networks can mitigate the effects of poor conditioning in the Jacobian of the TD update.
arXiv Detail & Related papers (2023-02-24T09:46:00Z)
- Bridging the Gap Between Target Networks and Functional Regularization [61.051716530459586]
We propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned.
We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.
arXiv Detail & Related papers (2022-10-21T22:27:07Z)
- Rethinking Value Function Learning for Generalization in Reinforcement Learning [11.516147824168732]
We focus on the problem of training RL agents on multiple training environments to improve observational generalization performance.
We identify that the value network in the multiple-environment setting is more challenging to optimize and more prone to overfitting the training data than in the conventional single-environment setting.
We propose Delayed-Critic Policy Gradient (DCPG), which implicitly penalizes the value estimates by optimizing the value network less frequently with more training data than the policy network.
arXiv Detail & Related papers (2022-10-18T16:17:47Z)
- Breaking the Deadly Triad with a Target Network [80.82586530205776]
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
We provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.
arXiv Detail & Related papers (2021-01-21T21:50:10Z)
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
- Offline Contextual Bandits with Overparameterized Models [52.788628474552276]
We ask whether the same phenomenon occurs for offline contextual bandits.
We show that this discrepancy is due to the action-stability of their objectives.
In experiments with large neural networks, this gap between action-stable value-based objectives and unstable policy-based objectives leads to significant performance differences.
arXiv Detail & Related papers (2020-06-27T13:52:07Z)
- Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization [1.3764085113103217]
Overfitting is one of the most common problems when training deep neural networks on comparatively small datasets.
We propose novel targeted sparsity visualization and regularization strategies to counteract overfitting.
arXiv Detail & Related papers (2020-02-21T11:38:17Z)