Bridging the Gap Between Target Networks and Functional Regularization
- URL: http://arxiv.org/abs/2106.02613v4
- Date: Thu, 7 Sep 2023 15:50:30 GMT
- Title: Bridging the Gap Between Target Networks and Functional Regularization
- Authors: Alexandre Piché, Valentin Thomas, Rafael Pardinas, Joseph Marino,
Gian Maria Marconi, Christopher Pal, Mohammad Emtiyaz Khan
- Abstract summary: We show that Target Networks act as an implicit regularizer, which can be beneficial in some cases but also has disadvantages.
We propose an explicit Functional Regularization alternative that is a flexible, convex regularizer in function space.
Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and results in performance improvements.
- Score: 61.051716530459586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bootstrapping is behind many of the successes of deep Reinforcement Learning.
However, learning the value function via bootstrapping often leads to unstable
training due to fast-changing target values. Target Networks are employed to
stabilize training by using an additional set of lagging parameters to estimate
the target values. Despite the popularity of Target Networks, their effect on
the optimization is still poorly understood. In this work, we show that they act
as an implicit regularizer that can be beneficial in some cases, but that also
has disadvantages: it is inflexible and can cause instabilities, even when
vanilla TD(0) converges. To overcome these issues, we propose an explicit
Functional Regularization alternative that is flexible and convex in function
space, and we study its convergence theoretically. We conduct an experimental
study across a range of environments, discount factors, and degrees of
off-policyness in the data collection to investigate how the regularization
induced by Target Networks and by Functional Regularization compares in terms
of performance, accuracy, and stability. Our findings show that Functional
Regularization can be used as a drop-in replacement for Target Networks and
yields performance improvements. Furthermore, adjusting both the regularization
weight and the network update period in Functional Regularization can lead to
further gains compared to solely adjusting the network update period, as is
typically done with Target Networks. Our approach also enhances the ability of
networks to recover accurate $Q$-values.
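As a rough illustration of the contrast drawn in the abstract, the sketch below runs semi-gradient TD(0) on a linear $Q$-function in two ways: one bootstraps its target from a lagging (Target Network) copy of the parameters, the other bootstraps from the online parameters and adds an explicit quadratic penalty pulling the value estimates towards the lagging network's predictions in function space. The quadratic form of the penalty, the weight `kappa`, the synthetic transitions, and the update period are illustrative assumptions, not the exact objective from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, gamma, lr, kappa, period = 4, 0.99, 0.05, 1.0, 100

theta_tn = rng.normal(size=d)   # Target-Network variant: online parameters
lag_tn = theta_tn.copy()        # lagging copy used to compute TD targets
theta_fr = theta_tn.copy()      # Functional-Regularization variant
lag_fr = theta_tn.copy()        # lagging copy used only as a regularizer

def q(params, phi):
    """Linear value estimate Q(s, a) = params . phi(s, a)."""
    return params @ phi

for step in range(1, 501):
    # Synthetic transition: state features, next-state features, reward.
    phi, phi_next, r = rng.normal(size=d), rng.normal(size=d), rng.normal()

    # Target Network: the TD target bootstraps from the lagging parameters.
    td_err_tn = q(theta_tn, phi) - (r + gamma * q(lag_tn, phi_next))
    theta_tn -= lr * td_err_tn * phi

    # Functional Regularization: the TD target bootstraps from the *online*
    # parameters, plus an explicit quadratic penalty (weight kappa) towards
    # the lagging network's prediction in function space.
    td_err_fr = q(theta_fr, phi) - (r + gamma * q(theta_fr, phi_next))
    reg_err = kappa * (q(theta_fr, phi) - q(lag_fr, phi))
    theta_fr -= lr * (td_err_fr + reg_err) * phi

    if step % period == 0:
        # Both variants refresh their lagging copy on the same schedule.
        lag_tn, lag_fr = theta_tn.copy(), theta_fr.copy()
```

The structural difference is where the lagging parameters enter: inside the bootstrapped target (an implicit, hard-to-tune regularizer) versus as a separate penalty term whose weight can be adjusted independently of the network update period.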
Related papers
- Continual Learning via Sequential Function-Space Variational Inference [65.96686740015902]
We propose an objective derived by formulating continual learning as sequential function-space variational inference.
Compared to objectives that directly regularize neural network predictions, the proposed objective allows for more flexible variational distributions.
We demonstrate that, across a range of task sequences, neural networks trained via sequential function-space variational inference achieve better predictive accuracy than networks trained with related methods.
arXiv Detail & Related papers (2023-12-28T18:44:32Z)
- On the Efficacy of Generalization Error Prediction Scoring Functions [33.24980750651318]
Generalization error predictors (GEPs) aim to predict model performance on unseen distributions by deriving dataset-level error estimates from sample-level scores.
We rigorously study the effectiveness of popular scoring functions (confidence, local manifold smoothness, model agreement) independent of mechanism choice.
arXiv Detail & Related papers (2023-03-23T18:08:44Z)
- Why Target Networks Stabilise Temporal Difference Methods [38.35578010611503]
We show that, under mild regularity conditions and a well-tuned target network update frequency, convergence can be guaranteed.
We conclude that the use of target networks can mitigate the effects of poor conditioning in the Jacobian of the TD update.
arXiv Detail & Related papers (2023-02-24T09:46:00Z)
- Bridging the Gap Between Target Networks and Functional Regularization [61.051716530459586]
We propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned.
We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.
arXiv Detail & Related papers (2022-10-21T22:27:07Z)
- Rethinking Value Function Learning for Generalization in Reinforcement Learning [11.516147824168732]
We focus on the problem of training RL agents on multiple training environments to improve observational generalization performance.
We identify that the value network in the multiple-environment setting is more challenging to optimize and more prone to overfitting the training data than in the conventional single-environment setting.
We propose Delayed-Critic Policy Gradient (DCPG), which implicitly penalizes the value estimates by optimizing the value network less frequently with more training data than the policy network.
arXiv Detail & Related papers (2022-10-18T16:17:47Z)
- Breaking the Deadly Triad with a Target Network [80.82586530205776]
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
We provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.
arXiv Detail & Related papers (2021-01-21T21:50:10Z)
- Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations [52.493315075385325]
We show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with homogeneous activation functions.
We propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network.
arXiv Detail & Related papers (2020-08-07T02:55:28Z)
- Offline Contextual Bandits with Overparameterized Models [52.788628474552276]
We ask whether the same phenomenon occurs for offline contextual bandits.
We show that this discrepancy is due to the action-stability of their objectives.
In experiments with large neural networks, this gap between action-stable value-based objectives and unstable policy-based objectives leads to significant performance differences.
arXiv Detail & Related papers (2020-06-27T13:52:07Z)
- Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization [1.3764085113103217]
Overfitting is one of the most common problems when training deep neural networks on comparatively small datasets.
We propose novel targeted sparsity visualization and regularization strategies to counteract overfitting.
arXiv Detail & Related papers (2020-02-21T11:38:17Z)