Learning Distinguishable Representations in Deep Q-Networks for Linear Transfer
- URL: http://arxiv.org/abs/2509.24947v1
- Date: Mon, 29 Sep 2025 15:44:35 GMT
- Title: Learning Distinguishable Representations in Deep Q-Networks for Linear Transfer
- Authors: Sooraj Sathish, Keshav Goyal, Raghuram Bharadwaj Diddigi
- Abstract summary: We propose a novel deep Q-learning approach that introduces a regularization term to reduce positive correlations between the feature representations of states. We demonstrate the efficacy of our approach in improving transfer learning performance and thereby reducing computational overhead.
- Score: 0.9558392439655014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Reinforcement Learning (RL) has demonstrated success in solving complex sequential decision-making problems by integrating neural networks with the RL framework. However, training deep RL models poses several challenges, such as the need for extensive hyperparameter tuning and high computational costs. Transfer learning has emerged as a promising strategy to address these challenges by enabling the reuse of knowledge from previously learned tasks for new, related tasks. This avoids retraining models entirely from scratch. A commonly used approach for transfer learning in RL is to leverage the internal representations learned by the neural network during training. Specifically, the activations from the last hidden layer can be viewed as refined state representations that encapsulate the essential features of the input. In this work, we investigate whether these representations can be used as input for training simpler models, such as linear function approximators, on new tasks. We observe that the representations learned by standard deep RL models can be highly correlated, which limits their effectiveness when used with linear function approximation. To mitigate this problem, we propose a novel deep Q-learning approach that introduces a regularization term to reduce positive correlations between the feature representations of states. By leveraging these decorrelated features, we enable more effective use of linear function approximation in transfer learning. Through experiments and ablation studies on standard RL benchmarks and MinAtar games, we demonstrate the efficacy of our approach in improving transfer learning performance and thereby reducing computational overhead.
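The abstract does not spell out the exact form of the regularizer, so the following is a minimal PyTorch sketch of the core idea: a standard DQN TD loss plus a hinged penalty on positive pairwise cosine similarity between the penultimate-layer features of the states in a batch, followed by frozen-feature linear transfer. The names (`QNetwork`, `positive_correlation_penalty`, `reg_coef`) and the specific penalty form are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """DQN whose penultimate activations serve as the state features phi(s)."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, n_actions)  # Q is linear in phi(s)

    def features(self, obs):
        return self.body(obs)

    def forward(self, obs):
        return self.head(self.features(obs))

def positive_correlation_penalty(phi):
    """Hinged penalty on positive cosine similarity between feature vectors
    of distinct states in the batch (one plausible reading of 'reduce
    positive correlations between feature representations of states')."""
    z = F.normalize(phi, dim=1)              # unit-norm features
    sim = z @ z.t()                          # pairwise cosine similarities
    sim = sim - torch.diag(torch.diag(sim))  # ignore self-similarity
    return F.relu(sim).mean()                # penalize only positive terms

def dqn_loss(q_net, target_net, batch, gamma=0.99, reg_coef=0.1):
    """Standard DQN TD loss plus the decorrelation regularizer."""
    obs, act, rew, next_obs, done = batch
    phi = q_net.features(obs)
    q = q_net.head(phi).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rew + gamma * (1 - done) * target_net(next_obs).max(1).values
    return F.smooth_l1_loss(q, target) + reg_coef * positive_correlation_penalty(phi)

# Illustrative usage on random data.
obs_dim, n_actions, B = 8, 4, 32
q_net, target_net = QNetwork(obs_dim, n_actions), QNetwork(obs_dim, n_actions)
target_net.load_state_dict(q_net.state_dict())
batch = (
    torch.randn(B, obs_dim),            # states
    torch.randint(0, n_actions, (B,)),  # actions
    torch.randn(B),                     # rewards
    torch.randn(B, obs_dim),            # next states
    torch.zeros(B),                     # done flags
)
dqn_loss(q_net, target_net, batch).backward()

# Transfer: freeze the learned feature extractor and fit only a linear
# Q-head on the new task, i.e. linear function approximation over phi(s).
linear_head = nn.Linear(q_net.head.in_features, n_actions)
with torch.no_grad():
    phi_new = q_net.features(torch.randn(B, obs_dim))  # fixed representations
q_new = linear_head(phi_new)                           # trainable linear Q-values
```

Only positive similarities are penalized, matching the abstract's emphasis on reducing positive correlations; with less-correlated features, the linear head's TD updates on the new task should be better conditioned.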
Related papers
- RL Grokking Recipe: How Does RL Unlock and Transfer New Algorithms in LLMs? [92.4931695205957]
We introduce DELTA-Code, a benchmark of synthetic coding problem families designed to probe two fundamental aspects: learnability and transferability. Our experiments reveal a striking grokking phase transition: after an extended period with near-zero reward, RL-trained models abruptly climb to near-perfect accuracy. To enable learnability on previously unsolvable problem families, we explore key training ingredients such as staged warm-up with dense rewards, experience replay, curriculum training, and verification-in-the-loop.
arXiv Detail & Related papers (2025-09-25T11:20:56Z) - RL as Regressor: A Reinforcement Learning Approach for Function Approximation [0.0]
We propose framing regression as a Reinforcement Learning (RL) problem. We demonstrate this by treating a model's prediction as an action and defining a custom reward signal based on the prediction error. We show that the RL framework not only successfully solves the regression problem but also offers enhanced flexibility in defining objectives and guiding the learning process.
arXiv Detail & Related papers (2025-07-31T21:39:24Z) - Scaling Offline RL via Efficient and Expressive Shortcut Models [13.050231036248338]
Offline reinforcement learning (RL) with expressive policy models remains challenging due to the iterative nature of their noise sampling processes. We introduce Scalable Offline Reinforcement Learning (SORL), a new offline RL algorithm that leverages shortcut models to scale both training and inference. We demonstrate that SORL achieves strong performance across a range of offline RL tasks and exhibits positive scaling behavior with increased test-time compute.
arXiv Detail & Related papers (2025-05-28T20:59:22Z) - Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning [62.984693936073974]
Value-based reinforcement learning can learn effective policies for a wide range of multi-turn problems. Current value-based RL methods have proven particularly challenging to scale to the setting of large language models. We propose a novel offline RL algorithm that addresses these drawbacks, casting Q-learning as a modified supervised fine-tuning problem.
arXiv Detail & Related papers (2024-11-07T21:36:52Z) - N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs [39.84233556353338]
In-context learning allows models like transformers to adapt to new tasks without updating their weights. In this work, we integrate n-gram induction heads into transformers for in-context RL. Our approach matches, and in some cases surpasses, the performance of Algorithm Distillation (AD) in both grid-world and pixel-based environments.
arXiv Detail & Related papers (2024-11-04T10:31:03Z) - Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning [17.714908233024847]
Reinforcement learning (RL) agents typically learn to solve new tasks by updating their neural network parameters through interactions with the task environment. Recent works demonstrate that some RL agents, after certain pretraining procedures, can learn to solve unseen new tasks without parameter updates.
arXiv Detail & Related papers (2024-05-22T17:38:16Z) - Stop Regressing: Training Value Functions via Classification for Scalable Deep RL [109.44370201929246]
We show that training value functions with categorical cross-entropy improves performance and scalability in a variety of domains.
These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers.
arXiv Detail & Related papers (2024-03-06T18:55:47Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) is proposed to specifically learn to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance, which leverages expert demonstrations to improve SRL interpretations.
We empirically verify that POAR efficiently handles high-dimensional tasks and facilitates training real-life robots directly from scratch.
arXiv Detail & Related papers (2021-09-17T16:52:03Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z) - Learning the Travelling Salesperson Problem Requires Rethinking Generalization [9.176056742068813]
End-to-end training of neural network solvers for graph optimization problems such as the Travelling Salesperson Problem (TSP) has seen a surge of interest recently.
While state-of-the-art learning-driven approaches perform on par with classical solvers when trained on trivially small sizes, they are unable to generalize the learned policy to larger instances at practical scales.
This work presents an end-to-end neural optimization pipeline that unifies several recent papers in order to identify the inductive biases, model architectures, and learning algorithms that promote generalization to instances larger than those seen in training.
arXiv Detail & Related papers (2020-06-12T10:14:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.