Scaling Algorithm Distillation for Continuous Control with Mamba
- URL: http://arxiv.org/abs/2506.13892v1
- Date: Mon, 16 Jun 2025 18:15:02 GMT
- Title: Scaling Algorithm Distillation for Continuous Control with Mamba
- Authors: Samuel Beaussant, Mehdi Mounsif
- Abstract summary: Algorithm Distillation (AD) was recently proposed as a new approach to perform In-Context Reinforcement Learning (ICRL). We show that scaling AD to very long contexts can improve ICRL performance and make it competitive even with a SOTA online meta RL baseline.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithm Distillation (AD) was recently proposed as a new approach to perform In-Context Reinforcement Learning (ICRL) by modeling across-episodic training histories autoregressively with a causal transformer model. However, due to practical limitations induced by the attention mechanism, experiments were bottlenecked by the transformer's quadratic complexity and limited to simple discrete environments with short time horizons. In this work, we propose leveraging the recently proposed Selective Structured State Space Sequence (S6) models, which achieved state-of-the-art (SOTA) performance on long-range sequence modeling while scaling linearly in sequence length. Through four complex and continuous Meta Reinforcement Learning environments, we demonstrate the overall superiority of Mamba, a model built with S6 layers, over a transformer model for AD. Additionally, we show that scaling AD to very long contexts can improve ICRL performance and make it competitive even with a SOTA online meta RL baseline.
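As a concrete illustration of the recipe described above, below is a minimal sketch (not the authors' released code) of Algorithm Distillation with a Mamba backbone: across-episodic (observation, action, reward) histories are concatenated into one long sequence, and a causal Mamba model is trained to predict the source algorithm's next action, scaling linearly rather than quadratically in context length. It assumes the open-source `mamba_ssm` package and a CUDA device for its selective-scan kernels; the `ADMamba` class, tensor shapes, and tokenization are illustrative choices, not the paper's exact architecture.

```python
# Minimal sketch of Algorithm Distillation (AD) with a Mamba backbone.
# Assumptions (not from the paper's released code): the `mamba_ssm` package
# (pip install mamba-ssm) and a CUDA device, which its selective-scan
# kernels require. Class and variable names here are illustrative.

import torch
import torch.nn as nn
from mamba_ssm import Mamba


class ADMamba(nn.Module):
    """Causal model over (obs, action, reward) tokens spanning many episodes."""

    def __init__(self, obs_dim, act_dim, d_model=256, n_layers=4):
        super().__init__()
        # One token per timestep, embedded from its (s, a, r) triple;
        # across-episodic training histories are concatenated along time.
        self.embed = nn.Linear(obs_dim + act_dim + 1, d_model)
        self.layers = nn.ModuleList(
            [Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
             for _ in range(n_layers)]
        )
        self.norm = nn.LayerNorm(d_model)
        # Continuous control: regress the next action directly.
        self.action_head = nn.Linear(d_model, act_dim)

    def forward(self, obs, act, rew):
        # obs: (B, T, obs_dim), act: (B, T, act_dim), rew: (B, T, 1)
        x = self.embed(torch.cat([obs, act, rew], dim=-1))
        for layer in self.layers:
            x = x + layer(x)  # causal selective-SSM scan, linear in T
        return self.action_head(self.norm(x))


if __name__ == "__main__":
    device = "cuda"                            # mamba_ssm kernels need a GPU
    B, T, obs_dim, act_dim = 2, 4096, 17, 6    # long cross-episodic context
    model = ADMamba(obs_dim, act_dim).to(device)
    obs = torch.randn(B, T, obs_dim, device=device)
    act = torch.randn(B, T, act_dim, device=device)
    rew = torch.randn(B, T, 1, device=device)
    # AD objective: imitate the action the source RL algorithm took at the
    # next step of its own training history (behavior cloning over histories).
    pred = model(obs, act, rew)
    loss = nn.functional.mse_loss(pred[:, :-1], act[:, 1:])
    loss.backward()
```

One simplification here is the single token per timestep; AD-style implementations often interleave observation, action, and reward tokens so the current observation is visible before its action is predicted, at the cost of a roughly 3x longer sequence.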
Related papers
- Learning to Dissipate Energy in Oscillatory State-Space Models [55.09730499143998]
State-space models (SSMs) are a class of networks for sequence learning. We show that D-LinOSS consistently outperforms previous LinOSS methods on long-range learning tasks.
arXiv Detail & Related papers (2025-05-17T23:15:17Z) - Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning [39.53836535326121]
We propose Distillation for In-Context Planning (DICP), an in-context model-based RL framework where Transformers simultaneously learn environment dynamics and improve policy in-context. Our results show that DICP achieves state-of-the-art performance while requiring significantly fewer environment interactions than baselines.
arXiv Detail & Related papers (2025-02-26T10:16:57Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization [72.81319836138347]
Online continual learning (OCL) seeks to learn new tasks from data streams that appear only once, while retaining knowledge of previously learned tasks. Most existing methods rely on replay, focusing on enhancing memory retention through regularization or distillation. We introduce a plug-and-play module, S6MOD, which can be integrated into most existing methods and directly improve adaptability.
arXiv Detail & Related papers (2024-12-24T05:25:21Z) - A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks [25.961645453318873]
We propose a Large Recurrent Action Model (LRAM) with an xLSTM at its core that comes with linear-time inference complexity and natural sequence length extrapolation abilities. Experiments on 432 tasks from 6 domains show that LRAM compares favorably to Transformers in terms of performance and speed.
arXiv Detail & Related papers (2024-10-29T17:55:47Z) - Longhorn: State Space Models are Amortized Online Learners [51.10124201221601]
State-space models (SSMs) offer linear decoding efficiency while maintaining parallelism during training.
In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems.
We introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem.
arXiv Detail & Related papers (2024-07-19T11:12:08Z) - Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z) - Structured State Space Models for In-Context Reinforcement Learning [30.189834820419446]
Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks.
We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel.
We show that our modified architecture runs asymptotically faster than Transformers in sequence length and performs better than RNNs on a simple memory-based task.
arXiv Detail & Related papers (2023-03-07T15:32:18Z) - Online Decision Transformer [30.54774566089644]
Offline reinforcement learning (RL) can be formulated as a sequence modeling problem.
The Online Decision Transformer (ODT) is an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning.
arXiv Detail & Related papers (2022-02-11T13:43:24Z) - Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating single-step policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
arXiv Detail & Related papers (2021-06-03T17:58:51Z) - Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation [91.05073136215886]
"Actor-Learner Distillation" transfers learning progress from a large capacity learner model to a small capacity actor model.
We demonstrate in several challenging memory environments that using Actor-Learner Distillation recovers the clear sample-efficiency gains of the transformer learner model.
arXiv Detail & Related papers (2021-04-04T17:56:34Z)