Temporally Extended Successor Representations
- URL: http://arxiv.org/abs/2209.12331v1
- Date: Sun, 25 Sep 2022 22:08:08 GMT
- Title: Temporally Extended Successor Representations
- Authors: Matthew J. Sargent, Peter J. Bentley, Caswell Barry, William de Cothi
- Abstract summary: We present a temporally extended variation of the successor representation, which we term t-SR.
t-SR captures the expected state transition dynamics of temporally extended actions by constructing successor representations over primitive action repeats.
We show that in environments with dynamic reward structure, t-SR is able to leverage both the flexibility of the successor representation and the abstraction afforded by temporally extended actions.
- Score: 0.9176056742068812
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a temporally extended variation of the successor representation,
which we term t-SR. t-SR captures the expected state transition dynamics of
temporally extended actions by constructing successor representations over
primitive action repeats. This form of temporal abstraction does not learn a
top-down hierarchy of pertinent task structures, but rather a bottom-up
composition of coupled actions and action repetitions. This lessens the number
of decisions required in control without learning a hierarchical policy. As
such, t-SR directly considers the time horizon of temporally extended action
sequences without the need for predefined or domain-specific options. We show
that in environments with dynamic reward structure, t-SR is able to leverage
both the flexibility of the successor representation and the abstraction
afforded by temporally extended actions. Thus, in a series of sparsely rewarded
gridworld environments, t-SR optimally adapts learnt policies far faster than
comparable value-based, model-free reinforcement learning methods. We also show
that the manner in which t-SR learns to solve these tasks requires the learnt
policy to be sampled consistently less often than non-temporally extended
policies.
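The mechanism in the abstract can be made concrete with a minimal sketch: a tabular successor representation (SR) learned over primitive-action repeats, so that each "option" is a primitive action executed a fixed number of times. The chain environment, option set, and hyperparameters below are illustrative assumptions for exposition, not the paper's actual setup.

```python
import numpy as np

# Sketch of a temporally extended SR on a short 1-D chain: options are
# (direction, repeat-count) pairs, and M[s, o] holds the expected discounted
# state-occupancy vector for taking option o in state s. All sizes and
# hyperparameters are illustrative assumptions.

N = 7                                   # chain states 0..6, goal at state 6
OPTIONS = [(a, k) for a in (-1, +1) for k in (1, 2)]  # (direction, repeats)
GAMMA, ALPHA = 0.95, 0.5
rng = np.random.default_rng(0)

M = np.zeros((N, len(OPTIONS), N))      # successor representation
w = np.zeros(N); w[N - 1] = 1.0         # reward weights: reward only at the goal

def q(s):
    return M[s] @ w                     # Q(s, o) = M(s, o) . w  (SR decomposition)

def run_option(s, option):
    """Repeat the primitive move k times; return the visited path (incl. start)."""
    a, k = option
    path = [s]
    for _ in range(k):
        s = min(max(s + a, 0), N - 1)
        path.append(s)
        if s == N - 1:                  # goal is absorbing; option truncates
            break
    return path

for _ in range(300):                    # random behaviour, greedy bootstrap
    s = 0
    for _ in range(500):
        oi = rng.integers(len(OPTIONS))
        path = run_option(s, OPTIONS[oi])
        s2, n = path[-1], len(path) - 1
        occ = np.zeros(N)
        for t, sv in enumerate(path[:-1]):
            occ[sv] += GAMMA ** t       # occupancy accrued while the option ran
        if s2 == N - 1:
            occ[s2] += GAMMA ** n       # count the terminal visit, no bootstrap
            target = occ
        else:
            target = occ + GAMMA ** n * M[s2, int(np.argmax(q(s2)))]
        M[s, oi] += ALPHA * (target - M[s, oi])
        s = s2
        if s == N - 1:
            break

print(OPTIONS[int(np.argmax(q(0)))][0])  # greedy direction at the start state
```

The point of the SR factorization is visible in the last lines: if the reward structure changes, only the weight vector `w` needs replacing, and the learnt occupancy matrix `M` immediately re-evaluates every temporally extended option under the new rewards.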
Related papers
- Zero-Shot Instruction Following in RL via Structured LTL Representations [50.41415009303967]
We study instruction following in multi-task reinforcement learning, where an agent must zero-shot execute novel tasks not seen during training.
In this setting, linear temporal logic has recently been adopted as a powerful framework for specifying structured, temporally extended tasks.
While existing approaches successfully train generalist policies, they often struggle to effectively capture the rich logical and temporal structure inherent in these specifications.
arXiv Detail & Related papers (2026-02-15T23:22:50Z)
- Hierarchical Successor Representation for Robust Transfer [10.635248457021495]
We propose the Hierarchical Successor Representation (HSR).
By incorporating temporal abstractions into the construction of predictive representations, HSR learns stable state features which are robust to task-induced policy changes.
We show that HSR's temporally extended predictive structure can also be leveraged to drive efficient exploration, effectively scaling to large, procedurally generated environments.
arXiv Detail & Related papers (2026-02-13T09:32:26Z)
- RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning [27.45103393884625]
We revisit policy representation as a first-class design choice for on-policy optimization.
We study discretized categorical actors that represent each action dimension with a distribution over bins, yielding a policy objective that resembles a cross-entropy loss.
Our results show that simply replacing the standard actor network with our discretized regularized actor yields consistent gains.
arXiv Detail & Related papers (2026-01-30T15:24:34Z)
- Learning Policy Representations for Steerable Behavior Synthesis [80.4542176039074]
Given a Markov decision process (MDP), we seek to learn representations for a range of policies to facilitate behavior steering at test time.
We show that these representations can be approximated uniformly for a range of policies using a set-based architecture.
We use a variational generative approach to induce a smooth latent space, and further shape it with contrastive learning so that latent distances align with differences in value functions.
arXiv Detail & Related papers (2026-01-29T21:52:06Z)
- Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization [11.646124619395486]
Reinforcement learning in discrete action spaces requires searching over exponentially many joint actions to simultaneously select multiple sub-actions that form coherent combinations.
Existing approaches either simplify policy learning by assuming independence across sub-actions, or attempt to learn action structure and control jointly.
We introduce Structured Policy Initialization (SPIN), a two-stage framework that first pre-trains an Action Structure Model (ASM) to capture the manifold of valid actions, then freezes this representation and trains lightweight policy heads for control.
arXiv Detail & Related papers (2026-01-07T22:57:21Z)
- ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning [52.86018040861575]
We propose a unified end-to-end visual-force diffusion policy that integrates visual planning and reactive force control within a single network.
We introduce Structural Slow-Fast Learning, a mechanism utilizing causal attention to simultaneously process asynchronous visual and force tokens.
Experiments on contact-rich tasks demonstrate that ImplicitRDP significantly outperforms both vision-only and hierarchical baselines.
arXiv Detail & Related papers (2025-12-11T18:59:46Z)
- ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents [61.51091799997476]
We introduce ReCAP (Recursive Context-Aware Reasoning and Planning), a hierarchical framework with shared context for reasoning and planning in large language models (LLMs).
ReCAP combines three key mechanisms: plan-ahead decomposition, structured re-injection of parent plans, and memory-efficient execution.
Experiments demonstrate that ReCAP substantially improves subgoal alignment and success rates on various long-horizon reasoning benchmarks.
arXiv Detail & Related papers (2025-10-27T20:03:55Z)
- DeCoP: Enhancing Self-Supervised Time Series Representation with Dependency Controlled Pre-training [39.30046923897652]
We propose a Dependency Controlled Pre-training framework that explicitly models dynamic, multi-scale dependencies by simulating evolving inter-patch dependencies.
DeCoP achieves state-of-the-art results on ten datasets with lower computing resources, improving MSE by 3% on ETTh1 over PatchTST using only 37% of the FLOPs.
arXiv Detail & Related papers (2025-09-18T05:44:06Z)
- Structured Context Recomposition for Large Language Models Using Probabilistic Layer Realignment [0.0]
This paper introduces a probabilistic layer realignment strategy that dynamically adjusts learned representations within transformer layers.
It mitigates abrupt topic shifts and logical inconsistencies, particularly in scenarios where sequences exceed standard attention window constraints.
While SCR incurs a moderate increase in processing time, memory overhead remains within feasible limits, making it suitable for practical deployment in autoregressive generative applications.
arXiv Detail & Related papers (2025-01-29T12:46:42Z)
- Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
- Hierarchical Orchestra of Policies [1.6574413179773757]
HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks.
HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous.
Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks.
arXiv Detail & Related papers (2024-11-05T11:13:09Z)
- GMP-AR: Granularity Message Passing and Adaptive Reconciliation for Temporal Hierarchy Forecasting [20.56839345239421]
Time series forecasts of different temporal granularity are widely used in real-world applications.
We propose a novel granularity message-passing mechanism (GMP) that leverages temporal hierarchy information to improve forecasting performance.
We also introduce an optimization module to achieve task-based targets while adhering to more real-world constraints.
arXiv Detail & Related papers (2024-06-18T03:33:03Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- A State Representation for Diminishing Rewards [20.945260614372327]
A common setting in multitask reinforcement learning (RL) demands that an agent rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution.
In the natural world, sequential tasks are rarely independent, and instead reflect shifting priorities based on the availability and subjective perception of rewarding stimuli.
We introduce the $\lambda$ representation ($\lambda$R) which, surprisingly, is required for policy evaluation in this setting.
arXiv Detail & Related papers (2023-09-07T13:38:36Z)
- Non-Stationary Bandits with Auto-Regressive Temporal Dependency [14.093856726745662]
This paper introduces a novel non-stationary MAB framework that captures the temporal structure of real-world dynamics through an auto-regressive (AR) reward structure.
We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information.
arXiv Detail & Related papers (2022-10-28T20:02:21Z)
- Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning.
Our model captures long-range dependencies and distills latent high-level features.
Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
arXiv Detail & Related papers (2022-07-20T07:26:15Z)
- Interpretable Time-series Representation Learning With Multi-Level Disentanglement [56.38489708031278]
Disentangle Time Series (DTS) is a novel disentanglement enhancement framework for sequential data.
DTS generates hierarchical semantic concepts as the interpretable and disentangled representation of time-series.
DTS achieves superior performance in downstream applications, with high interpretability of semantic concepts.
arXiv Detail & Related papers (2021-05-17T22:02:24Z)
- Temporal Context Aggregation Network for Temporal Action Proposal Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a smooth policy that behaves smoothly with respect to states.
We develop a new framework, Smooth Regularized Reinforcement Learning (SR$^2$L), where the policy is trained with smoothness-inducing regularization.
Such regularization effectively constrains the search space, and enforces smoothness in the learned policy.
arXiv Detail & Related papers (2020-03-21T00:10:29Z)
- Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video [128.08590291947544]
Temporal language grounding in untrimmed videos is a newly emerging task in video understanding.
Inspired by human's coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning framework.
arXiv Detail & Related papers (2020-01-18T15:08:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.