Hierarchical Successor Representation for Robust Transfer
- URL: http://arxiv.org/abs/2602.12753v1
- Date: Fri, 13 Feb 2026 09:32:26 GMT
- Title: Hierarchical Successor Representation for Robust Transfer
- Authors: Changmin Yu, Máté Lengyel
- Abstract summary: We propose the Hierarchical Successor Representation (HSR). By incorporating temporal abstractions into the construction of predictive representations, HSR learns stable state features which are robust to task-induced policy changes. We show that HSR's temporally extended predictive structure can also be leveraged to drive efficient exploration, effectively scaling to large, procedurally generated environments.
- Score: 10.635248457021495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The successor representation (SR) provides a powerful framework for decoupling predictive dynamics from rewards, enabling rapid generalisation across reward configurations. However, the classical SR is limited by its inherent policy dependence: policies change due to ongoing learning, environmental non-stationarities, and changes in task demands, making established predictive representations obsolete. Furthermore, in topologically complex environments, SRs suffer from spectral diffusion, leading to dense and overlapping features that scale poorly. Here we propose the Hierarchical Successor Representation (HSR) for overcoming these limitations. By incorporating temporal abstractions into the construction of predictive representations, HSR learns stable state features which are robust to task-induced policy changes. Applying non-negative matrix factorisation (NMF) to the HSR yields a sparse, low-rank state representation that facilitates highly sample-efficient transfer to novel tasks in multi-compartmental environments. Further analysis reveals that HSR-NMF discovers interpretable topological structures, providing a policy-agnostic hierarchical map that effectively bridges model-free optimality and model-based flexibility. Beyond providing a useful basis for task-transfer, we show that HSR's temporally extended predictive structure can also be leveraged to drive efficient exploration, effectively scaling to large, procedurally generated environments.
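For context, the classical SR that HSR builds on admits a closed form: for a fixed policy with state-transition matrix P and discount gamma, M = sum_t gamma^t P^t = (I - gamma P)^{-1}, and the value function decouples from rewards as V = M r for any reward vector r. The sketch below shows this baseline, with a generic multiplicative-update NMF standing in for the paper's HSR-NMF step; the 4-state ring environment, the rank k, and all names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Random-walk transition matrix for a 4-state ring (illustrative only).
P = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
])
gamma = 0.95

# Classical SR in closed form: M = (I - gamma * P)^{-1}.
M = np.linalg.inv(np.eye(4) - gamma * P)

# Reward decoupling: V = M @ r for any reward vector r.
r = np.array([0.0, 0.0, 1.0, 0.0])
V = M @ r

# Sparse non-negative features via multiplicative-update NMF, M ~= W @ H
# (a generic NMF routine, not the paper's exact HSR-NMF procedure).
rng = np.random.default_rng(0)
k = 2
W = rng.random((4, k)) + 0.1
H = rng.random((k, 4)) + 0.1
for _ in range(500):
    H *= (W.T @ M) / (W.T @ W @ H + 1e-12)
    W *= (M @ H.T) / (W @ H @ H.T + 1e-12)

print(V)
```

Because the SR is a discounted sum of non-negative matrices, M itself is non-negative, which is what makes the NMF step well-posed.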
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
arXiv Detail & Related papers (2026-03-02T11:35:05Z)
- Roughness-Informed Federated Learning [3.8218584696400484]
Federated Learning (FL) enables collaborative model training across distributed clients. FL faces challenges in non-independent and identically distributed (non-IID) settings due to client drift. We propose RI-FedAvg, a novel FL method that mitigates client drift by incorporating a Roughness Index (RI)-based regularization term.
arXiv Detail & Related papers (2026-02-11T07:35:45Z)
- Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution [79.35296000454694]
Diffusion-based super-resolution can synthesize rich details, but models trained on synthetic paired data often fail on real-world LR images. We propose Bird-SR, a reward-guided diffusion framework that formulates super-resolution as trajectory-level preference optimization. Experiments on real-world SR benchmarks demonstrate that Bird-SR consistently outperforms state-of-the-art methods in perceptual quality.
arXiv Detail & Related papers (2026-02-05T19:21:45Z)
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z)
- Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
- Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control [14.929720580977152]
This paper introduces a novel MARL framework that integrates Dynamic Graph Neural Networks (DGNNs) and Topological Data Analysis (TDA). Inspired by the Mixture of Experts (MoE) architecture in Large Language Models (LLMs), a topology-assisted spatial pattern disentangling (TSD)-enhanced MoE is proposed. Extensive experiments conducted on real-world traffic scenarios, together with comprehensive theoretical analysis, validate the superior performance of the proposed framework.
arXiv Detail & Related papers (2025-06-14T11:18:12Z)
- High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations [51.90920900332569]
Implicit neural representations (INRs) offer a compact and continuous framework for modeling spatially structured data. Recent approaches address this by introducing additional features along rigid geometric structures. We propose a simple yet effective alternative: Feature-Adaptive INR (FA-INR).
arXiv Detail & Related papers (2025-06-07T16:45:17Z)
- Structured Context Recomposition for Large Language Models Using Probabilistic Layer Realignment [0.0]
This paper introduces a probabilistic layer realignment strategy that dynamically adjusts learned representations within transformer layers. It mitigates abrupt topic shifts and logical inconsistencies, particularly in scenarios where sequences exceed standard attention window constraints. While SCR incurs a moderate increase in processing time, memory overhead remains within feasible limits, making it suitable for practical deployment in autoregressive generative applications.
arXiv Detail & Related papers (2025-01-29T12:46:42Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- ASR: Attention-alike Structural Re-parameterization [53.019657810468026]
We propose a simple-yet-effective attention-alike structural re-parameterization (ASR) that allows us to achieve SRP for a given network while enjoying the effectiveness of the attention mechanism.
In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, termed the Stripe Observation, which reveals that channel attention values quickly approach some constant vectors during training.
arXiv Detail & Related papers (2023-04-13T08:52:34Z)
- Temporally Extended Successor Representations [0.9176056742068812]
We present a temporally extended variation of the successor representation, which we term t-SR.
t-SR captures the expected state transition dynamics of temporally extended actions by constructing successor representations over primitive action repeats.
We show that in environments with dynamic reward structure, t-SR is able to leverage both the flexibility of the successor representation and the abstraction afforded by temporally extended actions.
arXiv Detail & Related papers (2022-09-25T22:08:08Z)
- Action-Sufficient State Representation Learning for Control with Structural Constraints [21.47086290736692]
In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making.
We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs.
Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning.
arXiv Detail & Related papers (2021-10-12T03:16:26Z)
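Of the entries above, t-SR is the most directly related to HSR's use of temporal abstraction. One concrete reading of "successor representations over primitive action repeats" is that repeating an action n times induces the n-step transition matrix P^n with effective discount gamma^n, whose SR again has a closed form. The sketch below illustrates that reading only; the chain environment and the exact construction are assumptions, not the paper's formulation.

```python
import numpy as np

# Deterministic 5-state chain with an absorbing terminal state (illustrative).
n_states = 5
P = np.zeros((n_states, n_states))
for s in range(n_states - 1):
    P[s, s + 1] = 1.0
P[-1, -1] = 1.0
gamma = 0.9

def sr(P, gamma):
    """Closed-form discounted successor representation (I - gamma*P)^{-1}."""
    return np.linalg.inv(np.eye(len(P)) - gamma * P)

# SR of the primitive dynamics vs. SR of the same action repeated 3 times:
# the repeat induces transitions P^3 and an effective discount gamma^3.
M1 = sr(P, gamma)
M3 = sr(np.linalg.matrix_power(P, 3), gamma ** 3)

print(M1[0])
print(M3[0])
```

The coarse-grained M3 reaches distant states in fewer "steps", which is the kind of temporally extended predictive structure the HSR abstract also exploits for exploration.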
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.