Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2506.04398v2
- Date: Sun, 28 Sep 2025 10:20:32 GMT
- Title: Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning
- Authors: Théo Vincent, Yogesh Tripathi, Tim Faust, Yaniv Oren, Jan Peters, Carlo D'Eramo
- Abstract summary: We introduce a new method that uses a copy of the last linear layer of the online network as a target network. We find that combining our approach with the concept of iterated Q-learning, which consists of learning consecutive Bellman updates in parallel, helps improve the sample-efficiency of target-free approaches.
- Score: 21.38951369323128
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The use of target networks in deep reinforcement learning is a widely popular solution to mitigate the brittleness of semi-gradient approaches and stabilize learning. However, target networks notoriously require additional memory and delay the propagation of Bellman updates compared to an ideal target-free approach. In this work, we step out of the binary choice between target-free and target-based algorithms. We introduce a new method that uses a copy of the last linear layer of the online network as a target network, while sharing the remaining parameters with the up-to-date online network. This simple modification enables us to keep the low memory footprint of target-free methods while leveraging the target-based literature. We find that combining our approach with the concept of iterated Q-learning, which consists of learning consecutive Bellman updates in parallel, helps improve the sample-efficiency of target-free approaches. Our proposed method, iterated Shared Q-Learning (iS-QL), bridges the performance gap between target-free and target-based approaches across various problems, while using a single Q-network, thus being a step forward towards resource-efficient reinforcement learning algorithms.
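The core mechanism in the abstract, a target built from a frozen copy of only the last linear layer that sits on top of the shared, up-to-date torso, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tanh torso, the sizes, the learning rate, and all names (`SharedQNet`, `td_targets`, `sync_targets`, `train_step`) are hypothetical. The iterated part uses a simplified chaining rule assumed for illustration (head 0 bootstraps from its own frozen copy, head k > 0 from the frozen copy of head k - 1), and the SGD step updates only the heads, omitting torso gradients for brevity.

```python
import copy
import math
import random

# Illustrative sizes and step size (hypothetical, not taken from the paper).
STATE_DIM, FEAT_DIM, N_ACTIONS, N_HEADS = 3, 4, 2, 3
GAMMA, LR = 0.99, 0.1


def new_head(rng):
    """A last linear layer: one weight row and one bias per action."""
    return {
        "w": [[rng.uniform(-0.1, 0.1) for _ in range(FEAT_DIM)] for _ in range(N_ACTIONS)],
        "b": [0.0] * N_ACTIONS,
    }


def head_q(head, feat):
    """Q-values of one linear head evaluated on a feature vector."""
    return [sum(w * f for w, f in zip(head["w"][a], feat)) + head["b"][a]
            for a in range(N_ACTIONS)]


class SharedQNet:
    """One Q-network: a shared torso, N_HEADS online heads, frozen copies of the heads."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.torso = [[rng.uniform(-0.5, 0.5) for _ in range(STATE_DIM)]
                      for _ in range(FEAT_DIM)]
        self.heads = [new_head(rng) for _ in range(N_HEADS)]
        # The "target network" is only a copy of the last linear layers;
        # the torso is never duplicated.
        self.target_heads = copy.deepcopy(self.heads)

    def features(self, s):
        return [math.tanh(sum(w * x for w, x in zip(row, s))) for row in self.torso]

    def sync_targets(self):
        """Refresh the frozen head copies from the current online heads."""
        self.target_heads = copy.deepcopy(self.heads)

    def td_targets(self, r, s_next, done):
        # Next-state features come from the current, shared torso.
        feat = self.features(s_next)
        targets = []
        for k in range(N_HEADS):
            # Simplifying assumption: head 0 bootstraps from its own frozen copy,
            # head k > 0 from the frozen copy of head k - 1 (chained Bellman updates).
            src = self.target_heads[max(k - 1, 0)]
            bootstrap = 0.0 if done else GAMMA * max(head_q(src, feat))
            targets.append(r + bootstrap)
        return targets

    def train_step(self, s, a, r, s_next, done):
        """One semi-gradient SGD step per head on 0.5 * TD error^2 (heads only)."""
        targets = self.td_targets(r, s_next, done)  # treated as constants
        feat = self.features(s)
        losses = []
        for head, y in zip(self.heads, targets):
            err = head_q(head, feat)[a] - y
            losses.append(0.5 * err * err)  # loss before this update
            # Gradient of 0.5 * err^2 w.r.t. the action-a weight row is err * feat.
            head["w"][a] = [w - LR * err * f for w, f in zip(head["w"][a], feat)]
            head["b"][a] -= LR * err
        return losses
```

Because `target_heads` holds only the `N_HEADS` linear layers, the extra memory in this sketch is N_HEADS × N_ACTIONS × (FEAT_DIM + 1) numbers rather than a second copy of the torso. A training loop would call `train_step` on sampled transitions and `sync_targets` every fixed number of steps (that period is a hypothetical hyperparameter here).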
Related papers
- Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning [22.6796319984868]
We introduce a novel update rule that computes the target using the MINimum estimate between the Target and Online network. MINTO enables faster and more stable value function learning by mitigating the potential overestimation bias of using the online network for bootstrapping. We evaluate MINTO extensively across diverse benchmarks, spanning online and offline RL, as well as discrete and continuous action spaces.
Date: 2025-10-02T21:48:01Z
- Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning [57.3885832382455]
We show that introducing static network sparsity alone can unlock further scaling potential beyond dense counterparts with state-of-the-art architectures. Our analysis reveals that, in contrast to naively scaled-up dense DRL networks, such sparse networks achieve higher parameter efficiency for network expressivity.
Date: 2025-06-20T17:54:24Z
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn the optimal source placement in large-scale networks online.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
Date: 2023-07-07T15:03:42Z
- Why Target Networks Stabilise Temporal Difference Methods [38.35578010611503]
We show that, under mild regularity conditions and a well-tuned target network update frequency, convergence can be guaranteed.
We conclude that the use of target networks can mitigate the effects of poor conditioning in the Jacobian of the TD update.
Date: 2023-02-24T09:46:00Z
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
Date: 2022-11-01T03:31:43Z
- Bridging the Gap Between Target Networks and Functional Regularization [61.051716530459586]
We propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned.
We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.
Date: 2022-10-21T22:27:07Z
- Continual Learning with Dependency Preserving Hypernetworks [14.102057320661427]
An effective approach to continual learning (CL) problems is to use hypernetworks that generate task-dependent weights for a target network.
We propose a novel approach that uses a dependency-preserving hypernetwork to generate weights for the target network while maintaining parameter efficiency.
In addition, we propose novel regularisation and network growth techniques for the RNN-based hypernetwork to further improve continual learning performance.
Date: 2022-09-16T04:42:21Z
- Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
Date: 2022-02-08T20:42:14Z
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
Date: 2021-10-22T22:05:31Z
- Cascaded Compressed Sensing Networks: A Reversible Architecture for Layerwise Learning [11.721183551822097]
We show that target propagation can be achieved by modeling each layer of the network with compressed sensing, without the need for auxiliary networks.
Experiments show that the proposed method can achieve better performance than the auxiliary network-based method.
Date: 2021-10-20T05:21:13Z
- Bridging the Gap Between Target Networks and Functional Regularization [61.051716530459586]
We show that Target Networks act as an implicit regularizer which can be beneficial in some cases, but also have disadvantages.
We propose an explicit Functional Regularization alternative that is flexible and a convex regularizer in function space.
Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and result in performance improvement.
Date: 2021-06-04T17:21:07Z
- All at Once Network Quantization via Collaborative Knowledge Transfer [56.95849086170461]
We develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network.
Specifically, we propose an adaptive selection strategy to choose a high-precision "teacher" for transferring knowledge to the low-precision student.
To effectively transfer knowledge, we develop a dynamic block swapping method by randomly replacing the blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network.
Date: 2021-03-02T03:09:03Z
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
Date: 2021-02-22T06:19:45Z
- MetaGater: Fast Learning of Conditional Channel Gated Networks via Federated Meta-Learning [46.79356071007187]
We propose a holistic approach to jointly train the backbone network and the channel gating.
We develop a federated meta-learning approach to jointly learn good meta-initializations for both backbone networks and gating modules.
Date: 2020-11-25T04:26:23Z
- Meta-Learning with Network Pruning [40.07436648243748]
We propose a network-pruning-based meta-learning approach that reduces overfitting by explicitly controlling the capacity of the network.
We have implemented our approach on top of Reptile, combined with two network pruning routines: Dense-Sparse-Dense (DSD) and Iterative Hard Thresholding (IHT).
Date: 2020-07-07T06:13:11Z