Dynamics-Adaptive Continual Reinforcement Learning via Progressive
Contextualization
- URL: http://arxiv.org/abs/2209.00347v2
- Date: Tue, 23 May 2023 19:03:10 GMT
- Title: Dynamics-Adaptive Continual Reinforcement Learning via Progressive
Contextualization
- Authors: Tiantian Zhang, Zichuan Lin, Yuxing Wang, Deheng Ye, Qiang Fu, Wei
Yang, Xueqian Wang, Bin Liang, Bo Yuan, and Xiu Li
- Abstract summary: A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime.
DaCoRL learns a context-conditioned policy using progressive contextualization.
DaCoRL consistently outperforms existing methods in terms of stability, overall performance, and generalization ability.
- Score: 29.61829620717385
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A key challenge of continual reinforcement learning (CRL) in dynamic
environments is to promptly adapt the RL agent's behavior as the environment
changes over its lifetime, while minimizing the catastrophic forgetting of the
learned information. To address this challenge, in this article, we propose
DaCoRL, i.e., dynamics-adaptive continual RL. DaCoRL learns a
context-conditioned policy using progressive contextualization, which
incrementally clusters a stream of stationary tasks in the dynamic environment
into a series of contexts and opts for an expandable multihead neural network
to approximate the policy. Specifically, we define a set of tasks with similar
dynamics as an environmental context and formalize context inference as a
procedure of online Bayesian infinite Gaussian mixture clustering on
environment features, resorting to online Bayesian inference to infer the
posterior distribution over contexts. Under the assumption of a Chinese
restaurant process prior, this technique can accurately classify the current
task as a previously seen context or instantiate a new context as needed
without relying on any external indicator to signal environmental changes in
advance. Furthermore, we employ an expandable multihead neural network whose
output layer is synchronously expanded with the newly instantiated context, and
a knowledge distillation regularization term for retaining the performance on
learned tasks. As a general framework that can be coupled with various deep RL
algorithms, DaCoRL consistently outperforms existing methods in terms of
stability, overall performance, and generalization ability, as
verified by extensive experiments on several robot navigation and MuJoCo
locomotion tasks.
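To make the context-inference step concrete, the sketch below illustrates online clustering of environment features under a Chinese restaurant process prior with Gaussian likelihoods, in the spirit of the abstract. It is a minimal illustration, not the authors' implementation: the hyperparameters (alpha, sigma, the base measure) and the MAP assignment rule are assumptions.

```python
import numpy as np

class CRPContextInference:
    """Minimal sketch of DaCoRL-style context inference: a Chinese restaurant
    process (CRP) prior over contexts combined with isotropic Gaussian
    likelihoods on environment features. Illustrative only; the hyperparameters
    are assumed values, not the paper's."""

    def __init__(self, alpha=1.0, sigma=0.5, base_mean=0.0, base_sigma=5.0):
        self.alpha = alpha            # CRP concentration (assumed)
        self.sigma = sigma            # within-context feature noise (assumed)
        self.base_mean = base_mean    # base-measure mean for new contexts (assumed)
        self.base_sigma = base_sigma  # base-measure spread for new contexts (assumed)
        self.means = []               # running mean feature vector per context
        self.counts = []              # number of tasks assigned to each context

    @staticmethod
    def _log_gauss(x, mu, var):
        # Log density of an isotropic Gaussian N(mu, var * I) at x.
        d = x.size
        return -0.5 * d * np.log(2 * np.pi * var) - 0.5 * np.sum((x - mu) ** 2) / var

    def assign(self, features):
        """Assign a task's feature vector to an existing context or open a new one."""
        x = np.asarray(features, dtype=float)
        n = sum(self.counts)
        scores = []
        # Existing contexts: CRP prior (proportional to counts) times Gaussian likelihood.
        for mu, c in zip(self.means, self.counts):
            scores.append(np.log(c / (n + self.alpha))
                          + self._log_gauss(x, mu, self.sigma ** 2))
        # New context: CRP prior (proportional to alpha) times the marginal
        # likelihood of x under the base measure.
        scores.append(np.log(self.alpha / (n + self.alpha))
                      + self._log_gauss(x, self.base_mean,
                                        self.sigma ** 2 + self.base_sigma ** 2))
        k = int(np.argmax(scores))            # MAP assignment
        if k == len(self.means):              # instantiate a new context
            self.means.append(x.copy())
            self.counts.append(1)
        else:                                 # update the chosen context's running mean
            self.counts[k] += 1
            self.means[k] += (x - self.means[k]) / self.counts[k]
        return k


# Usage: features from similar dynamics share a context; a dynamics shift opens a new one.
infer = CRPContextInference()
print(infer.assign([0.0, 0.1]))   # 0: first context instantiated
print(infer.assign([0.1, 0.0]))   # 0: similar dynamics, same context
print(infer.assign([5.0, 5.2]))   # 1: dissimilar features, new context
```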
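The expandable multihead network and the knowledge-distillation regularizer can likewise be sketched. The PyTorch snippet below assumes a discrete action space, a shared torso with one linear head per context, and softmax distillation against a frozen snapshot of the network; the layer sizes, temperature, and loss coefficient are illustrative assumptions rather than the paper's settings.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiheadPolicy(nn.Module):
    """Shared torso with one policy head per context; a new head is appended
    whenever context inference instantiates a new context (sketch only)."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList()
        self.act_dim = act_dim
        self.hidden = hidden

    def add_head(self):
        # Expand the output layer in sync with a newly instantiated context.
        self.heads.append(nn.Linear(self.hidden, self.act_dim))
        return len(self.heads) - 1

    def forward(self, obs, context_id):
        return self.heads[context_id](self.torso(obs))


def distillation_loss(model, frozen_model, obs, old_context_ids, tau=2.0):
    """Knowledge-distillation regularizer: keep the current network's outputs
    for previously learned contexts close to a frozen snapshot's outputs."""
    loss = 0.0
    for cid in old_context_ids:
        with torch.no_grad():
            target = F.softmax(frozen_model(obs, cid) / tau, dim=-1)
        log_pred = F.log_softmax(model(obs, cid) / tau, dim=-1)
        loss = loss + F.kl_div(log_pred, target, reduction="batchmean")
    return loss


# Usage sketch: snapshot the network before training on a new context, then add
# the distillation term for old contexts to the RL loss.
policy = MultiheadPolicy(obs_dim=8, act_dim=4)
policy.add_head()                              # head for context 0
snapshot = copy.deepcopy(policy).eval()        # frozen copy of learned heads
policy.add_head()                              # head for newly instantiated context 1
obs = torch.randn(32, 8)                       # replayed observations (illustrative)
reg = distillation_loss(policy, snapshot, obs, old_context_ids=[0])
# total_loss = rl_loss + beta * reg            # beta is an assumed coefficient
```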
Related papers
- Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z)
- Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning [4.902544998453533]
We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization.
Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings.
arXiv Detail & Related papers (2024-04-15T07:31:48Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Agent based modelling for continuously varying supply chains [4.163948606359882]
This paper seeks to address whether agents can control varying supply chain problems.
Two state-of-the-art Reinforcement Learning (RL) algorithms are compared.
Results show that the leaner strategies adopted in batch environments differ from those adopted in environments with varying products.
arXiv Detail & Related papers (2023-12-24T15:04:46Z)
- Online Reinforcement Learning in Non-Stationary Context-Driven Environments [13.898711495948254]
We study online reinforcement learning (RL) in non-stationary environments.
Online RL is challenging in such environments due to "catastrophic forgetting" (CF).
We present Locally Constrained Policy Optimization (LCPO), an online RL approach that combats CF by anchoring policy outputs on old experiences.
arXiv Detail & Related papers (2023-02-04T15:31:19Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Robust Reinforcement Learning via Adversarial training with Langevin Dynamics [51.234482917047835]
We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents.
We present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy method.
arXiv Detail & Related papers (2020-02-14T14:59:14Z)