Related papers: Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion

Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion

URL: http://arxiv.org/abs/2408.09838v1
Date: Mon, 19 Aug 2024 09:33:31 GMT
Title: Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion
Authors: Achref Jaziri, Etienne Künzel, Visvanathan Ramesh,
Abstract summary: A continual learning agent builds on previous experiences to develop increasingly complex behaviors. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem.
Score: 3.2635082758250693
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A continual learning agent builds on previous experiences to develop increasingly complex behaviors by adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with the adaptation of new ones to current environments. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem, where environmental and agent behaviors are constantly changing, and the search space is vast. In this work, we propose addressing these challenges in the train scheduling problem using curriculum learning. We design a curriculum with adjacent skills that build on each other to improve generalization performance. Introducing a curriculum with distinct tasks introduces non-stationarity, which we address by proposing a new algorithm: Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically generates and adjusts Q-function subspaces to handle environmental changes and task requirements. CDE mitigates catastrophic forgetting through EWC while ensuring high plasticity using adaptive rational activation functions. Experimental results demonstrate significant improvements in learning efficiency and adaptability compared to RL baselines and other adapted methods for continual learning, highlighting the potential of our method in managing the stability-plasticity dilemma in the adaptive train scheduling setting.

Related papers

Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation [67.80294336559574]
Continual Test Time Adaptation (CTTA) is a task that requires a source pre-trained model to continually adapt to new scenarios.<n>We propose a novel pipeline, Orthogonal Projection Subspace to aggregate online Prior-knowledge, dubbed OoPk.
arXiv Detail & Related papers (2025-06-23T18:17:39Z)
Lifelong Learning with Task-Specific Adaptation: Addressing the Stability-Plasticity Dilemma [13.567823451714405]
Lifelong learning aims to continuously acquire new knowledge while retaining previously learned knowledge. The stability-plasticity dilemma requires models to balance the preservation of previous knowledge (stability) with the ability to learn new tasks (plasticity) This paper proposes AdaLL, an adapter-based framework designed to address the dilemma through a simple, universal, and effective strategy.
arXiv Detail & Related papers (2025-03-08T13:33:38Z)
Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning [19.463863037999054]
We consider a Continual Reinforcement Learning setup, where a learning agent must continuously adapt to new tasks while retaining previously acquired skill sets. We introduce HiSPO, a novel hierarchical framework designed specifically for continual learning in navigation settings from offline data. We demonstrate, through a careful experimental study, the effectiveness of our method in both classical MuJoCo maze environments and complex video game-like navigation simulations.
arXiv Detail & Related papers (2024-12-19T14:00:03Z)
Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network. Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
Dynamic Weight Adjusting Deep Q-Networks for Real-Time Environmental Adaptation [3.2162648244439684]
This study explores integrating dynamic weight adjustments into Deep Q-Networks (DQN) to enhance their adaptability. We implement these adjustments by modifying the sampling probabilities in the experience replay to make the model focus more on pivotal transitions. We design a novel Interactive Dynamic Evaluation Method (IDEM) for DQN that successfully navigates dynamic environments.
arXiv Detail & Related papers (2024-11-04T19:47:23Z)
Large Continual Instruction Assistant [59.585544987096974]
Continual Instruction Tuning (CIT) is adopted to instruct Large Models to follow human intent data by data.<n>Existing update gradient would heavily destroy the performance on previous datasets during CIT process.<n>We propose a general continual instruction tuning framework to address the challenge.
arXiv Detail & Related papers (2024-10-08T11:24:59Z)
Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning [6.388725318524439]
Key challenge in lifelong reinforcement learning is the loss of plasticity. We present a parameter-free tuning for lifelong RL called TRAC. Experiments on Procgen, Atari, and Gym Control environments show TRAC works surprisingly well.
arXiv Detail & Related papers (2024-05-26T17:38:44Z)
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind. This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning [23.15206507040553]
We propose Auxiliary Network Continual Learning (ANCL) to equip the neural network with the ability to learn the current task. ANCL applies an additional auxiliary network which promotes plasticity to the continually learned model which mainly focuses on stability. More concretely, the proposed framework materializes in a regularizer that naturally interpolates between plasticity and stability.
arXiv Detail & Related papers (2023-03-16T17:00:42Z)
Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization [29.61829620717385]
Key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime. DaCoRL learns a context-conditioned policy using progressive contextualization. DaCoRL features consistent superiority over existing methods in terms of the stability, overall performance and generalization ability.
arXiv Detail & Related papers (2022-09-01T10:26:58Z)
Balancing Stability and Plasticity through Advanced Null Space in Continual Learning [77.94570903726856]
We propose a new continual learning approach, Advanced Null Space (AdNS), to balance the stability and plasticity without storing any old data of previous tasks. We also present a simple but effective method, intra-task distillation, to improve the performance of the current task. Experimental results show that the proposed method can achieve better performance compared to state-of-the-art continual learning approaches.
arXiv Detail & Related papers (2022-07-25T11:04:22Z)
Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment. We propose to build the notion of continual learning into the modeling process of learning wireless systems. Our design is based on a novel min-max formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences. We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.