Self-Composing Policies for Scalable Continual Reinforcement Learning
- URL: http://arxiv.org/abs/2506.14811v1
- Date: Wed, 04 Jun 2025 15:12:15 GMT
- Title: Self-Composing Policies for Scalable Continual Reinforcement Learning
- Authors: Mikel Malagón, Josu Ceberio, Jose A. Lozano
- Abstract summary: This work introduces a growable and modular neural network architecture that naturally avoids catastrophic forgetting and interference in continual reinforcement learning. The structure of each module allows the selective combination of previous policies along with its internal policy, accelerating the learning process on the current task.
- Score: 0.5461938536945723
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This work introduces a growable and modular neural network architecture that naturally avoids catastrophic forgetting and interference in continual reinforcement learning. The structure of each module allows the selective combination of previous policies along with its internal policy, accelerating the learning process on the current task. Unlike previous growing neural network approaches, we show that the number of parameters of the proposed approach grows linearly with respect to the number of tasks, and does not sacrifice plasticity to scale. Experiments conducted in benchmark continuous control and visual problems reveal that the proposed approach achieves greater knowledge transfer and performance than alternative methods.
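The abstract describes an architecture in which each per-task module selectively combines the outputs of previously learned policies with its own internal policy, while earlier modules stay fixed and the parameter count grows linearly with the number of tasks. The sketch below illustrates one plausible reading of that idea in PyTorch; the class names, the softmax mixture over candidate policies, the hidden sizes, and the freezing scheme are assumptions made for illustration, not the authors' reference implementation.

```python
# Minimal sketch of a growable, self-composing policy stack (assumed design,
# not the paper's code): each new module mixes the frozen outputs of earlier
# modules with a small internal policy via learned softmax weights.
import torch
import torch.nn as nn


class ComposingModule(nn.Module):
    """One per-task module: an internal policy head plus learned weights that
    mix its output with the outputs of all previously frozen modules."""

    def __init__(self, obs_dim: int, act_dim: int, n_prev: int, hidden: int = 64):
        super().__init__()
        self.internal = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )
        # One logit per previous module plus one for the internal policy.
        self.mix_logits = nn.Parameter(torch.zeros(n_prev + 1))

    def forward(self, obs: torch.Tensor, prev_outputs: list[torch.Tensor]) -> torch.Tensor:
        candidates = prev_outputs + [self.internal(obs)]   # previous policies + own policy
        weights = torch.softmax(self.mix_logits, dim=0)    # selective combination
        return sum(w * out for w, out in zip(weights, candidates))


class SelfComposingPolicy(nn.Module):
    """Growable stack: grow() adds one constant-size module per task and
    freezes the rest, so parameters scale linearly with the number of tasks."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.obs_dim, self.act_dim = obs_dim, act_dim
        self.modules_list = nn.ModuleList()

    def grow(self) -> None:
        for p in self.modules_list.parameters():
            p.requires_grad_(False)                        # keep old tasks intact
        self.modules_list.append(
            ComposingModule(self.obs_dim, self.act_dim, n_prev=len(self.modules_list))
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        outputs: list[torch.Tensor] = []
        for module in self.modules_list:
            outputs.append(module(obs, outputs))           # each module sees earlier outputs
        return outputs[-1]                                 # current task's policy


# Usage: grow once per task, then train only the newest module's parameters.
policy = SelfComposingPolicy(obs_dim=8, act_dim=2)
policy.grow()                                              # task 1
policy.grow()                                              # task 2, earlier module frozen
action = policy(torch.randn(1, 8))
```

Under these assumptions, knowledge transfer comes from the softmax-weighted reuse of earlier policies, and forgetting is avoided because only the newest module's parameters receive gradients.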
Related papers
- Learning safe, constrained policies via imitation learning: Connection to Probabilistic Inference and a Naive Algorithm [0.22099217573031676]
This article introduces an imitation learning method for learning maximum entropy policies that comply with constraints demonstrated by an expert executing a task. Experiments show that the method can learn effective policy models for constraint-abiding behaviour in settings with multiple constraints of different types, and that it can generalize.
arXiv Detail & Related papers (2025-07-09T12:11:27Z)
- Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation. We present a taxonomy that categorizes such information manipulation approaches across four key dimensions. We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
- Certified Neural Approximations of Nonlinear Dynamics [52.79163248326912]
In safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. We propose a novel, adaptive, and parallelizable verification method based on certified first-order models.
arXiv Detail & Related papers (2025-05-21T13:22:20Z)
- Self-Controlled Dynamic Expansion Model for Continual Learning [10.447232167638816]
This paper introduces an innovative Self-Controlled Dynamic Expansion Model (SCDEM). SCDEM orchestrates multiple trainable pre-trained ViT backbones to furnish diverse and semantically enriched representations. An extensive series of experiments has been conducted to evaluate the proposed methodology's efficacy.
arXiv Detail & Related papers (2025-04-14T15:22:51Z)
- Continual Learning via Sequential Function-Space Variational Inference [65.96686740015902]
We propose an objective derived by formulating continual learning as sequential function-space variational inference.
Compared to objectives that directly regularize neural network predictions, the proposed objective allows for more flexible variational distributions.
We demonstrate that, across a range of task sequences, neural networks trained via sequential function-space variational inference achieve better predictive accuracy than networks trained with related methods.
arXiv Detail & Related papers (2023-12-28T18:44:32Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning [2.1408617023874443]
We propose a novel multi-objective reinforcement learning (MORL) algorithm that trains a single neural network via policy gradient.
The proposed method works in both continuous and discrete action spaces with no design change of the policy network.
arXiv Detail & Related papers (2023-03-15T20:07:48Z)
- Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach [0.0]
We introduce a novel policy similarity measure to mitigate the effects of such discrepancy in continuous control.
Our method offers an adequate single-step off-policy correction that is applicable to deterministic policy networks.
arXiv Detail & Related papers (2022-08-01T11:33:12Z)
- Continual Learning Beyond a Single Model [28.130513524601145]
We show that employing ensemble models can be a simple yet effective method to improve continual performance.
We propose a computationally cheap algorithm with similar runtime to a single model yet enjoying the performance benefits of ensembles.
arXiv Detail & Related papers (2022-02-20T14:30:39Z)
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
- Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining neural networks that differs from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that also extend to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z)