Dynamics-Adaptive Continual Reinforcement Learning via Progressive
Contextualization
- URL: http://arxiv.org/abs/2209.00347v2
- Date: Tue, 23 May 2023 19:03:10 GMT
- Title: Dynamics-Adaptive Continual Reinforcement Learning via Progressive
Contextualization
- Authors: Tiantian Zhang, Zichuan Lin, Yuxing Wang, Deheng Ye, Qiang Fu, Wei
Yang, Xueqian Wang, Bin Liang, Bo Yuan, and Xiu Li
- Abstract summary: A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime.
DaCoRL learns a context-conditioned policy using progressive contextualization.
DaCoRL consistently outperforms existing methods in terms of stability, overall performance, and generalization ability.
- Score: 29.61829620717385
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A key challenge of continual reinforcement learning (CRL) in dynamic
environments is to promptly adapt the RL agent's behavior as the environment
changes over its lifetime, while minimizing the catastrophic forgetting of the
learned information. To address this challenge, in this article, we propose
DaCoRL, i.e., dynamics-adaptive continual RL. DaCoRL learns a
context-conditioned policy using progressive contextualization, which
incrementally clusters a stream of stationary tasks in the dynamic environment
into a series of contexts and opts for an expandable multihead neural network
to approximate the policy. Specifically, we define a set of tasks with similar
dynamics as an environmental context and formalize context inference as a
procedure of online Bayesian infinite Gaussian mixture clustering on
environment features, resorting to online Bayesian inference to infer the
posterior distribution over contexts. Under the assumption of a Chinese
restaurant process prior, this technique can accurately classify the current
task as a previously seen context or instantiate a new context as needed
without relying on any external indicator to signal environmental changes in
advance. Furthermore, we employ an expandable multihead neural network whose
output layer is synchronously expanded with the newly instantiated context, and
a knowledge distillation regularization term for retaining the performance on
learned tasks. As a general framework that can be coupled with various deep RL
algorithms, DaCoRL consistently outperforms existing methods in terms of
stability, overall performance, and generalization ability, as
verified by extensive experiments on several robot navigation and MuJoCo
locomotion tasks.
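To make the context-inference step concrete, the sketch below illustrates online clustering of environment features under a Chinese restaurant process prior with Gaussian likelihoods, in the spirit of the abstract. It is a minimal illustration, not the authors' implementation: the hyperparameters (alpha, sigma, the base measure) and the MAP assignment rule are assumptions.

```python
import numpy as np

class CRPContextInference:
    """Minimal sketch of DaCoRL-style context inference: a Chinese restaurant
    process (CRP) prior over contexts combined with isotropic Gaussian
    likelihoods on environment features. Illustrative only; the hyperparameters
    are assumed values, not the paper's."""

    def __init__(self, alpha=1.0, sigma=0.5, base_mean=0.0, base_sigma=5.0):
        self.alpha = alpha            # CRP concentration (assumed)
        self.sigma = sigma            # within-context feature noise (assumed)
        self.base_mean = base_mean    # base-measure mean for new contexts (assumed)
        self.base_sigma = base_sigma  # base-measure spread for new contexts (assumed)
        self.means = []               # running mean feature vector per context
        self.counts = []              # number of tasks assigned to each context

    @staticmethod
    def _log_gauss(x, mu, var):
        # Log density of an isotropic Gaussian N(mu, var * I) at x.
        d = x.size
        return -0.5 * d * np.log(2 * np.pi * var) - 0.5 * np.sum((x - mu) ** 2) / var

    def assign(self, features):
        """Assign a task's feature vector to an existing context or open a new one."""
        x = np.asarray(features, dtype=float)
        n = sum(self.counts)
        scores = []
        # Existing contexts: CRP prior (proportional to counts) times Gaussian likelihood.
        for mu, c in zip(self.means, self.counts):
            scores.append(np.log(c / (n + self.alpha))
                          + self._log_gauss(x, mu, self.sigma ** 2))
        # New context: CRP prior (proportional to alpha) times the marginal
        # likelihood of x under the base measure.
        scores.append(np.log(self.alpha / (n + self.alpha))
                      + self._log_gauss(x, self.base_mean,
                                        self.sigma ** 2 + self.base_sigma ** 2))
        k = int(np.argmax(scores))            # MAP assignment
        if k == len(self.means):              # instantiate a new context
            self.means.append(x.copy())
            self.counts.append(1)
        else:                                 # update the chosen context's running mean
            self.counts[k] += 1
            self.means[k] += (x - self.means[k]) / self.counts[k]
        return k


# Usage: features from similar dynamics share a context; a dynamics shift opens a new one.
infer = CRPContextInference()
print(infer.assign([0.0, 0.1]))   # 0: first context instantiated
print(infer.assign([0.1, 0.0]))   # 0: similar dynamics, same context
print(infer.assign([5.0, 5.2]))   # 1: dissimilar features, new context
```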
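The expandable multihead network and the knowledge-distillation regularizer can likewise be sketched. The PyTorch snippet below assumes a discrete action space, a shared torso with one linear head per context, and softmax distillation against a frozen snapshot of the network; the layer sizes, temperature, and loss coefficient are illustrative assumptions rather than the paper's settings.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiheadPolicy(nn.Module):
    """Shared torso with one policy head per context; a new head is appended
    whenever context inference instantiates a new context (sketch only)."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList()
        self.act_dim = act_dim
        self.hidden = hidden

    def add_head(self):
        # Expand the output layer in sync with a newly instantiated context.
        self.heads.append(nn.Linear(self.hidden, self.act_dim))
        return len(self.heads) - 1

    def forward(self, obs, context_id):
        return self.heads[context_id](self.torso(obs))


def distillation_loss(model, frozen_model, obs, old_context_ids, tau=2.0):
    """Knowledge-distillation regularizer: keep the current network's outputs
    for previously learned contexts close to a frozen snapshot's outputs."""
    loss = 0.0
    for cid in old_context_ids:
        with torch.no_grad():
            target = F.softmax(frozen_model(obs, cid) / tau, dim=-1)
        log_pred = F.log_softmax(model(obs, cid) / tau, dim=-1)
        loss = loss + F.kl_div(log_pred, target, reduction="batchmean")
    return loss


# Usage sketch: snapshot the network before training on a new context, then add
# the distillation term for old contexts to the RL loss.
policy = MultiheadPolicy(obs_dim=8, act_dim=4)
policy.add_head()                              # head for context 0
snapshot = copy.deepcopy(policy).eval()        # frozen copy of learned heads
policy.add_head()                              # head for newly instantiated context 1
obs = torch.randn(32, 8)                       # replayed observations (illustrative)
reg = distillation_loss(policy, snapshot, obs, old_context_ids=[0])
# total_loss = rl_loss + beta * reg            # beta is an assumed coefficient
```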
Related papers
- Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
- Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning [4.902544998453533]
We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization.
Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings.
arXiv Detail & Related papers (2024-04-15T07:31:48Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Agent based modelling for continuously varying supply chains [4.163948606359882]
This paper seeks to address whether agents can control varying supply chain problems.
Two state-of-the-art Reinforcement Learning (RL) algorithms are compared.
Results show that the leaner strategies adopted in batch environments differ from those adopted in environments with varying products.
arXiv Detail & Related papers (2023-12-24T15:04:46Z)
- Online Reinforcement Learning in Non-Stationary Context-Driven Environments [13.898711495948254]
We study online reinforcement learning (RL) in non-stationary environments.
Online RL is challenging in such environments due to "catastrophic forgetting" (CF).
We present Locally Constrained Policy Optimization (LCPO), an online RL approach that combats CF by anchoring policy outputs on old experiences.
arXiv Detail & Related papers (2023-02-04T15:31:19Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)