Lifelong Incremental Reinforcement Learning with Online Bayesian
Inference
- URL: http://arxiv.org/abs/2007.14196v2
- Date: Fri, 12 Feb 2021 10:48:28 GMT
- Title: Lifelong Incremental Reinforcement Learning with Online Bayesian
Inference
- Authors: Zhi Wang, Chunlin Chen, Daoyi Dong
- Abstract summary: A central capability of a long-lived reinforcement learning agent is to incrementally adapt its behavior as its environment changes.
We propose LifeLong Incremental Reinforcement Learning (LLIRL), a new incremental algorithm for efficient lifelong adaptation to dynamic environments.
- Score: 11.076005074172516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central capability of a long-lived reinforcement learning (RL) agent is to
incrementally adapt its behavior as its environment changes, and to
incrementally build upon previous experiences to facilitate future learning in
real-world scenarios. In this paper, we propose LifeLong Incremental
Reinforcement Learning (LLIRL), a new incremental algorithm for efficient
lifelong adaptation to dynamic environments. We develop and maintain a library
that contains an infinite mixture of parameterized environment models, which is
equivalent to clustering environment parameters in a latent space. The prior
distribution over the mixture is formulated as a Chinese restaurant process
(CRP), which incrementally instantiates new environment models without any
external information to signal environmental changes in advance. During
lifelong learning, we employ the expectation maximization (EM) algorithm with
online Bayesian inference to update the mixture in a fully incremental manner.
In EM, the E-step involves estimating the posterior expectation of
environment-to-cluster assignments, while the M-step updates the environment
parameters for future learning. This method allows for all environment models
to be adapted as necessary, with new models instantiated for environmental
changes and old models retrieved when previously seen environments are
encountered again. Experiments demonstrate that LLIRL outperforms relevant
existing methods, and enables effective incremental adaptation to various
dynamic environments for lifelong learning.
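To make the abstract more concrete, below is a minimal Python sketch (not the authors' implementation) of the core loop it describes: a CRP prior over an infinite mixture of environment models, an E-step that computes the posterior responsibility of each cluster (including a potential new one), and an M-step that either instantiates a new environment model or updates a retrieved one in a fully incremental manner. The Gaussian environment models and all names (CRPMixture, alpha, observe, etc.) are illustrative assumptions; the actual method clusters environment parameters in a latent space.

import numpy as np

class CRPMixture:
    """Infinite mixture of environment models with a CRP prior (illustrative sketch)."""

    def __init__(self, alpha=1.0, obs_var=1.0):
        self.alpha = alpha      # CRP concentration: larger values favor new clusters
        self.obs_var = obs_var  # assumed observation noise of each environment model
        self.means = []         # per-cluster estimates of environment parameters
        self.counts = []        # effective number of assignments per cluster

    def _crp_prior(self):
        # Prior mass of an existing cluster is proportional to its count;
        # a brand-new cluster gets mass proportional to alpha.
        weights = np.array(self.counts + [self.alpha], dtype=float)
        return weights / weights.sum()

    def _likelihoods(self, x):
        # Gaussian likelihood of the observed environment features x under each
        # existing model; a new cluster gets a broad, vague likelihood.
        liks = [np.exp(-0.5 * np.sum((x - mu) ** 2) / self.obs_var) for mu in self.means]
        liks.append(np.exp(-0.5 * np.sum(x ** 2) / (10.0 * self.obs_var)))
        return np.array(liks)

    def e_step(self, x):
        # E-step: posterior responsibility of each cluster, including a new one.
        post = self._crp_prior() * self._likelihoods(x)
        return post / post.sum()

    def m_step(self, x, post):
        # M-step: instantiate a new environment model if the CRP favors it,
        # otherwise update the retrieved model with an incremental average.
        k = int(np.argmax(post))
        if k == len(self.means):
            self.means.append(np.array(x, dtype=float))
            self.counts.append(1.0)
        else:
            self.counts[k] += post[k]
            step = post[k] / self.counts[k]
            self.means[k] = (1.0 - step) * self.means[k] + step * x
        return k

    def observe(self, x):
        # One fully incremental EM update for a newly observed environment.
        x = np.asarray(x, dtype=float)
        return self.m_step(x, self.e_step(x))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixture = CRPMixture(alpha=0.5)
    # Two recurring environments (e.g., two dynamics settings) alternate over time;
    # the mixture should create two clusters and then keep retrieving them.
    envs = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
    for t in range(10):
        x = envs[t % 2] + 0.1 * rng.standard_normal(2)
        k = mixture.observe(x)
        print(f"step {t}: cluster {k} of {len(mixture.means)}")

In this toy run, the first encounter with each environment instantiates a new cluster, and later encounters retrieve and refine the corresponding model, mirroring the instantiate-or-retrieve behavior the abstract describes.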
Related papers
- A Comparative Study of Machine Learning Algorithms for Anomaly Detection
in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to reconcile the demand for high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z) - Predictive Experience Replay for Continual Visual Control and
Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Meta-Reinforcement Learning Using Model Parameters [8.442084903594528]
This paper presents RAMP, a Reinforcement learning Agent using Model Parameters.
RAMP is constructed in two phases: in the first phase, a multi-environment parameterized dynamic model is learned.
In the second phase, the model parameters of the dynamic model are used as context for the multi-environment policy of the model-free reinforcement learning agent.
arXiv Detail & Related papers (2022-10-27T14:54:06Z) - Dynamics-Adaptive Continual Reinforcement Learning via Progressive
Contextualization [29.61829620717385]
The key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime.
DaCoRL learns a context-conditioned policy using progressive contextualization.
DaCoRL shows consistent superiority over existing methods in terms of stability, overall performance, and generalization ability.
arXiv Detail & Related papers (2022-09-01T10:26:58Z) - Continual Predictive Learning from Videos [100.27176974654559]
We study a new continual learning problem in the context of video prediction.
We propose the continual predictive learning (CPL) approach, which learns a mixture world model via predictive experience replay.
We construct two new benchmarks based on RoboNet and KTH, in which different tasks correspond to different physical robotic environments or human actions.
arXiv Detail & Related papers (2022-04-12T08:32:26Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Learning to Continuously Optimize Wireless Resource In Episodically
Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z) - Instance Weighted Incremental Evolution Strategies for Reinforcement
Learning in Dynamic Environments [11.076005074172516]
We propose a systematic incremental learning method for Evolution Strategies (ES) in dynamic environments.
The goal is to adjust previously learned policy to a new one incrementally whenever the environment changes.
This paper introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
arXiv Detail & Related papers (2020-10-09T14:31:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.