Related papers: SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

URL: http://arxiv.org/abs/2405.15920v2
Date: Sun, 22 Sep 2024 22:49:54 GMT
Title: SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning
Authors: Shuai Zhang, Heshan Devaka Fernando, Miao Liu, Keerthiram Murugesan, Songtao Lu, Pin-Yu Chen, Tianyi Chen, Meng Wang
Abstract summary: This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI.
Score: 89.04776523010409
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
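As a rough illustration of the decomposition the abstract describes, the sketch below computes Q-values as the inner product of a successor feature and a reward mapping, then applies GPI by acting greedily on the maximum Q-value over previously learned policies. It assumes a tabular, single-state setting with rewards that are linear in the features; the paper itself learns the successor features with deep networks. All function names, variable names, and numbers are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: successor features (SF) with generalized
# policy improvement (GPI) in a single state. Names and numbers are
# hypothetical, not from the paper.

def q_values(psi, w):
    """Return Q(s, a) = psi(s, a) . w for every action a in one state.

    psi: list of per-action successor-feature vectors for a fixed state
    w:   reward-mapping vector of the task being solved
    """
    return [sum(p * wi for p, wi in zip(psi_a, w)) for psi_a in psi]


def gpi_action(psi_per_policy, w):
    """Pick the action maximizing, over all source policies, the Q-value
    obtained by pairing that policy's SF with the new task's w."""
    n_actions = len(psi_per_policy[0])
    q_per_policy = [q_values(psi, w) for psi in psi_per_policy]
    # For each action, keep the best Q-value across source policies.
    best_q = [max(q[a] for q in q_per_policy) for a in range(n_actions)]
    action = max(range(n_actions), key=lambda a: best_q[a])
    return action, best_q


# SFs of two source policies in one state (2 actions, 2 features).
psi_policy_1 = [[1.0, 0.0], [0.2, 0.5]]
psi_policy_2 = [[0.3, 0.3], [0.0, 1.0]]

# Reward mapping of a new task that shares the same transition dynamics.
w_new = [0.5, 1.0]

action, q_gpi = gpi_action([psi_policy_1, psi_policy_2], w_new)
print(action, q_gpi)
```

Because the successor features depend only on the policy and the shared dynamics, evaluating old policies on a new task in this sketch requires only the new vector `w_new`, with no further interaction with the environment.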

Related papers

ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling [57.91760520589592]
Scaling network depth has been a central driver behind the success of modern foundation models. This paper revisits the default mechanism for deepening neural networks, namely residual connections. We introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data.
arXiv Detail & Related papers (2026-02-09T18:54:18Z)
Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay [0.0]
Enhanced-FQL($λ$) achieves superior sample efficiency and reduced variance compared to n-step fuzzy TD and fuzzyA($λ$) baselines. The framework's inherent interpretability, combined with its computational efficiency and theoretical convergence guarantees, makes it suitable for safety-critical applications.
arXiv Detail & Related papers (2026-01-07T20:59:18Z)
Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models [71.9060068259379]
We propose cascaded domain-wise reinforcement learning to build general-purpose reasoning models. Our 14B model, after RL, outperforms its SFT teacher, DeepSeek-R1-0528, on LiveCodeBench v5/v6 Pro and achieves silver-medal performance in the 2025 International Olympiad in Informatics (IOI).
arXiv Detail & Related papers (2025-12-15T18:02:35Z)
QFFN-BERT: An Empirical Study of Depth, Performance, and Data Efficiency in Hybrid Quantum-Classical Transformers [4.309517184057254]
Parameterized quantum circuits (PQCs) have emerged as promising components for enhancing the expressibility of neural architectures. We introduce QFFN-BERT, a hybrid quantum-classical transformer where the feedforward network (FFN) modules of a compact BERT variant are replaced by PQC-based layers.
arXiv Detail & Related papers (2025-07-03T06:52:44Z)
Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL). QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM). Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
VQC-Based Reinforcement Learning with Data Re-uploading: Performance and Trainability [0.8192907805418583]
Reinforcement Learning (RL) consists of designing agents that make intelligent decisions without human supervision. Deep Q-Learning, an RL algorithm that uses Deep NNs, achieved super-human performance in some specific tasks. It is also possible to use Variational Quantum Circuits (VQCs) as function approximators in RL algorithms.
arXiv Detail & Related papers (2024-01-21T18:00:15Z)
Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning [43.123290672073814]
We deal with systems that generate summaries from document(s) based on a query. Motivated by the insight that Reinforcement Learning (RL) provides a generalization to Supervised Learning (SL) for Natural Language Generation, we use an RL-based approach for this task. We develop multiple Policy Gradient networks, trained on various reward signals: ROUGE, BLEU, and Semantic Similarity.
arXiv Detail & Related papers (2023-11-29T10:38:16Z)
Efficient quantum recurrent reinforcement learning via quantum reservoir computing [3.6881738506505988]
Quantum reinforcement learning (QRL) has emerged as a framework to solve sequential decision-making tasks. This work presents a novel approach to address this challenge by constructing QRL agents utilizing QRNN-based quantum long short-term memory (QLSTM).
arXiv Detail & Related papers (2023-09-13T22:18:38Z)
Quantum deep Q learning with distributed prioritized experience replay [0.8702432681310399]
The framework incorporates prioritized experience replay and asynchronous training into the training algorithm to reduce the high sampling complexities. Numerical simulations demonstrate that QDQN-DPER outperforms the baseline distributed quantum Q learning with the same model architecture.
arXiv Detail & Related papers (2023-04-19T13:40:44Z)
Convergence Results For Q-Learning With Experience Replay [51.11953997546418]
We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of iterations of replay. We also provide theoretical evidence showing when we might expect this to strictly improve performance, by introducing and analyzing a simple class of MDPs.
arXiv Detail & Related papers (2021-12-08T10:22:49Z)
Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between practical success of Q-learning and pessimistic theoretical results. We present novel methods Q-Rex and Q-RexDaRe. We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks. We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator. To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms. SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.