Instance Weighted Incremental Evolution Strategies for Reinforcement
Learning in Dynamic Environments
- URL: http://arxiv.org/abs/2010.04605v2
- Date: Thu, 31 Mar 2022 08:28:02 GMT
- Title: Instance Weighted Incremental Evolution Strategies for Reinforcement
Learning in Dynamic Environments
- Authors: Zhi Wang and Chunlin Chen and Daoyi Dong
- Abstract summary: We propose a systematic incremental learning method for Evolution strategies (ES) in dynamic environments.
The goal is to adjust the previously learned policy to a new one incrementally whenever the environment changes.
This paper introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
- Score: 11.076005074172516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evolution strategies (ES), as a family of black-box optimization algorithms,
have recently emerged as a scalable alternative to reinforcement learning (RL)
approaches such as Q-learning or policy gradient, and are much faster when many
central processing units (CPUs) are available due to better parallelization. In
this paper, we propose a systematic incremental learning method for ES in
dynamic environments. The goal is to adjust the previously learned policy to a new
one incrementally whenever the environment changes. We incorporate an instance
weighting mechanism with ES to facilitate its learning adaptation, while
retaining the scalability of ES. During parameter updating, higher weights are
assigned to instances that contain more new knowledge, thus encouraging the
search distribution to move towards new promising areas of parameter space. We
propose two easy-to-implement metrics to calculate the weights: instance
novelty and instance quality. Instance novelty measures an instance's
difference from the previous optimum in the original environment, while
instance quality corresponds to how well an instance performs in the new
environment. The resulting algorithm, Instance Weighted Incremental Evolution
Strategies (IW-IES), is verified to achieve significantly improved performance
on challenging RL tasks ranging from robot navigation to locomotion. This paper
thus introduces a family of scalable ES algorithms for RL domains that enables
rapid learning adaptation to dynamic environments.
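The abstract describes the update rule only in words, so a small sketch may help make the mechanism concrete. The Python snippet below is a minimal illustration of instance-weighted ES, not the authors' reference implementation: the `evaluate` hook, the L2-distance novelty measure, the min-max normalization, and the mixing coefficient `alpha` are all assumptions made for this example.

```python
# Minimal sketch of an instance-weighted ES update step, assuming:
#   - `evaluate(params)` returns the episodic return of a policy with the
#     given parameters in the NEW environment (hypothetical hook),
#   - L2 distance to the previous optimum as the novelty metric,
#   - min-max normalization and a linear blend with coefficient `alpha`.
# This illustrates the idea in the abstract, not the paper's actual code.
import numpy as np

def iw_ies_update(theta, theta_old, evaluate,
                  n_samples=50, sigma=0.1, lr=0.01, alpha=0.5):
    d = theta.size
    eps = np.random.randn(n_samples, d)        # Gaussian perturbations
    instances = theta + sigma * eps            # candidate policy parameters

    # Instance novelty: how far each candidate is from the optimum
    # learned in the original environment.
    novelty = np.linalg.norm(instances - theta_old, axis=1)

    # Instance quality: how well each candidate performs in the
    # new environment.
    quality = np.array([evaluate(x) for x in instances])

    def normalize(v):                          # map a metric to [0, 1]
        return (v - v.min()) / (v.max() - v.min() + 1e-8)

    # Higher weight = more "new knowledge": a blend of the two metrics.
    weights = alpha * normalize(novelty) + (1.0 - alpha) * normalize(quality)

    # Vanilla ES gradient estimate, re-weighted per instance so the
    # search distribution drifts toward promising new regions.
    grad = (weights * quality) @ eps / (n_samples * sigma)
    return theta + lr * grad
```

A caller would invoke `theta = iw_ies_update(theta, theta_old, evaluate)` once per generation after an environment change. Each `evaluate` call is independent, so the rollouts parallelize across CPUs exactly as in standard ES, which is how a weighting scheme of this kind can retain the scalability of ES.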
Related papers
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation, relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z) - Data-Efficient Task Generalization via Probabilistic Model-based Meta
Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z) - An advantage based policy transfer algorithm for reinforcement learning
with metrics of transferability [6.660458629649826]
Reinforcement learning (RL) can enable sequential decision-making in complex and high-dimensional environments.
Transfer RL algorithms can be used to transfer knowledge from one or multiple source environments to a target environment.
This paper proposes an off-policy Advantage-based Policy Transfer algorithm, APT-RL, for fixed domain environments.
arXiv Detail & Related papers (2023-11-12T04:25:53Z) - Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic
Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures a certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - Efficient Feature Transformations for Discriminative and Generative
Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z) - Learning to Continuously Optimize Wireless Resource In Episodically
Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures a certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z) - Lifelong Incremental Reinforcement Learning with Online Bayesian
Inference [11.076005074172516]
A long-lived reinforcement learning agent must incrementally adapt its behavior as its environment changes.
We propose Lifelong Incremental Reinforcement Learning (LLIRL), a new incremental algorithm for efficient lifelong adaptation to dynamic environments.
arXiv Detail & Related papers (2020-07-28T13:23:41Z) - Gradient Monitored Reinforcement Learning [0.0]
We focus on the enhancement of training and evaluation performance in reinforcement learning algorithms.
We propose an approach that steers the learning of a neural network's weight parameters based on the dynamic development of, and feedback from, the training process itself.
arXiv Detail & Related papers (2020-05-25T13:45:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.