Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments
- URL: http://arxiv.org/abs/2512.18670v1
- Date: Sun, 21 Dec 2025 10:13:21 GMT
- Title: Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments
- Authors: Xue Yang, Michael Schukat, Junlin Lu, Patrick Mannion, Karl Mason, Enda Howley,
- Abstract summary: Reinforcement learning (RL) excels in various applications but struggles in dynamic environments where the underlying Markov decision process evolves. We propose demonstration-guided continual reinforcement learning (DGCRL), which stores prior knowledge in an external, self-evolving demonstration repository. Experiments on 2D navigation and MuJoCo locomotion tasks demonstrate its superior average performance, enhanced knowledge transfer, mitigation of forgetting, and training efficiency.
- Score: 8.818727691237656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) excels in various applications but struggles in dynamic environments where the underlying Markov decision process evolves. Continual reinforcement learning (CRL) enables RL agents to continually learn and adapt to new tasks, but balancing stability (preserving prior knowledge) and plasticity (acquiring new knowledge) remains challenging. Existing methods primarily address the stability-plasticity dilemma through mechanisms where past knowledge influences optimization but rarely affects the agent's behavior directly, which may hinder effective knowledge reuse and efficient learning. In contrast, we propose demonstration-guided continual reinforcement learning (DGCRL), which stores prior knowledge in an external, self-evolving demonstration repository that directly guides RL exploration and adaptation. For each task, the agent dynamically selects the most relevant demonstration and follows a curriculum-based strategy to accelerate learning, gradually shifting from demonstration-guided exploration to fully self-exploration. Extensive experiments on 2D navigation and MuJoCo locomotion tasks demonstrate its superior average performance, enhanced knowledge transfer, mitigation of forgetting, and training efficiency. The additional sensitivity analysis and ablation study further validate its effectiveness.
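The abstract names three mechanisms: an external demonstration repository that evolves over time, per-task selection of the most relevant demonstration, and a curriculum that moves from demonstration-guided to fully self-driven exploration. The following is a minimal sketch of how these pieces could fit together. It is not the authors' implementation. Every name (`Demonstration`, `DemoRepository`, `select`, `update`, `guide_horizon`, `choose_action`, `evaluate`) is hypothetical. Several details are assumptions because the summary does not give them:

- the relevance measure is a caller-supplied scoring function;
- the repository update rule is "replace when the agent beats the stored return";
- the curriculum is a linearly shrinking guidance horizon of the kind used in jump-start RL.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Action = int  # hypothetical discrete action type


@dataclass
class Demonstration:
    actions: List[Action]  # open-loop action sequence (assumption)
    score: float           # return recorded when the demo was stored


@dataclass
class DemoRepository:
    """Hypothetical external, self-evolving store of prior demonstrations."""
    demos: List[Demonstration] = field(default_factory=list)

    def select(self, evaluate: Callable[[Demonstration], float]) -> Demonstration:
        # Pick the demo that scores best under a caller-supplied relevance
        # measure for the current task (the real DGCRL criterion is not given).
        return max(self.demos, key=evaluate)

    def update(self, candidate: Demonstration, current: Demonstration) -> None:
        # Self-evolution (assumed rule): replace the selected demo in place
        # when the agent produces a higher-scoring trajectory.
        if candidate.score > current.score:
            for i, d in enumerate(self.demos):
                if d is current:
                    self.demos[i] = candidate
                    return
            self.demos.append(candidate)  # current was not stored; keep the new one


def guide_horizon(episode: int, total_episodes: int, max_horizon: int) -> int:
    # Curriculum: the number of demo-guided steps per episode shrinks
    # linearly from max_horizon to 0, ending in pure self-exploration.
    frac = min(episode / total_episodes, 1.0)
    return int(round(max_horizon * (1.0 - frac)))


def choose_action(t: int, demo: Demonstration, policy_action: Action, horizon: int) -> Action:
    # Follow the demonstration for the first `horizon` steps (while it lasts),
    # then hand control to the learning policy.
    if t < horizon and t < len(demo.actions):
        return demo.actions[t]
    return policy_action
```

Consider a run with `max_horizon=50` over `total_episodes=100`. In that case `guide_horizon` returns 50 at episode 0, 25 at episode 50, and 0 from episode 100 onward. From that point the agent acts purely on its own policy. The linear schedule is chosen here only for simplicity. An exponential decay or a performance-triggered schedule would fit the abstract's description equally well.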
Related papers
- Continual Policy Distillation from Distributed Reinforcement Learning Teachers [14.879372764916154]
Continual Reinforcement Learning aims to develop lifelong learning agents to continuously acquire knowledge across diverse tasks. This requires efficiently managing the stability-plasticity dilemma and leveraging prior experience to rapidly generalize to novel tasks. We propose a novel teacher-student framework that decouples CRL into two independent processes: training single-task teacher models through distributed RL and continually distilling them into a central generalist model.
arXiv Detail & Related papers (2026-01-30T02:40:34Z)
- Continual Knowledge Adaptation for Reinforcement Learning [37.4253231932861]
Reinforcement Learning enables agents to learn optimal behaviors through interactions with environments. We propose Continual Knowledge Adaptation for Reinforcement Learning (CKA-RL), which enables the accumulation and effective utilization of historical knowledge. Experiments on three benchmarks demonstrate that the proposed CKA-RL outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-10-22T07:25:41Z)
- Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning [106.68304931854038]
Reinforcement learning with verifiable rewards (RLVR) has been widely used for enhancing the reasoning abilities of large language models (LLMs). We conduct a systematic empirical analysis of the entropy-performance exchange mechanism of RLVR across different levels of granularity. Our analysis reveals that, in the rising stage, entropy reduction in negative samples facilitates the learning of effective reasoning patterns. In the plateau stage, learning efficiency strongly correlates with high-entropy tokens present in low-perplexity samples and those located at the end of sequences.
arXiv Detail & Related papers (2025-08-04T10:08:10Z)
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning [83.256752220849]
Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. We introduce R1-Searcher++, a framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval.
arXiv Detail & Related papers (2025-05-22T17:58:26Z)
- DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation [59.79833777420334]
Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. We develop a novel inference-time optimization framework, referred to as DSMentor, to enhance LLM agent performance. Our work underscores the importance of developing effective strategies for accumulating and utilizing knowledge during inference.
arXiv Detail & Related papers (2025-05-20T10:16:21Z)
- A Method for Fast Autonomy Transfer in Reinforcement Learning [3.8049020806504967]
This paper introduces a novel reinforcement learning (RL) strategy designed to facilitate rapid autonomy transfer.
Unlike traditional methods that require extensive retraining or fine-tuning, our approach integrates existing knowledge, enabling an RL agent to adapt swiftly to new settings.
arXiv Detail & Related papers (2024-07-29T23:48:07Z)
- Physics-informed Imitative Reinforcement Learning for Real-world Driving [17.263297015508705]
We propose physics-informed imitative reinforcement learning (IRL), an approach that is entirely data-driven. Our approach achieves a 37.8% reduction in collision rate and a 22.2% reduction in off-road rate compared to the baseline method.
arXiv Detail & Related papers (2024-06-18T14:27:14Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe (Reinforced Imitation Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently. Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Variance-Covariance Regularization Improves Representation Learning [28.341622247252705]
We adapt a self-supervised learning regularization technique to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg).
We demonstrate that VCReg significantly enhances transfer learning for images and videos, achieving state-of-the-art performance across numerous tasks and datasets.
In summary, VCReg offers a universally applicable regularization framework that significantly advances transfer learning and highlights the connection between gradient starvation, neural collapse, and feature transferability.
arXiv Detail & Related papers (2023-06-23T05:01:02Z)
- Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning [78.31888150539258]
Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning.
Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency.
We present Knowledge-Grounded RL (KGRL), an RL paradigm fusing multiple knowledge policies and aiming for human-like efficiency and flexibility.
arXiv Detail & Related papers (2022-10-07T17:56:57Z)
- KnowRU: Knowledge Reusing via Knowledge Distillation in Multi-agent Reinforcement Learning [16.167201058368303]
Deep Reinforcement Learning (RL) algorithms have achieved dramatic progress in the multi-agent area.
To alleviate this problem, efficient leveraging of the historical experience is essential.
We propose a method for knowledge reuse, named "KnowRU".
arXiv Detail & Related papers (2021-03-27T12:38:01Z)
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)