ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared
State Representation and Individual Policy Representation
- URL: http://arxiv.org/abs/2210.17375v2
- Date: Fri, 30 Jun 2023 11:57:54 GMT
- Title: ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared
State Representation and Individual Policy Representation
- Authors: Jianye Hao, Pengyi Li, Hongyao Tang, Yan Zheng, Xian Fu, Zhaopeng Meng
- Abstract summary: We propose Evolutionary Reinforcement Learning with Two-scale State Representation and Policy Representation (ERL-Re$^2$).
All EA and RL policies share the same nonlinear state representation while maintaining individual linear policy representations.
Experiments on a range of continuous control tasks show that ERL-Re$^2$ consistently outperforms advanced baselines and achieves state-of-the-art (SOTA) performance.
- Score: 31.9768280877473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (Deep RL) and Evolutionary Algorithms (EA) are
two major paradigms of policy optimization with distinct learning principles,
i.e., gradient-based vs. gradient-free. An appealing research direction is
integrating Deep RL and EA to devise new methods by fusing their complementary
advantages. However, existing works on combining Deep RL and EA have two common
drawbacks: 1) the RL agent and EA agents learn their policies individually,
neglecting efficient sharing of useful common knowledge; 2) parameter-level
policy optimization provides no guarantee of semantically meaningful behavior
evolution on the EA side. In this paper, we propose Evolutionary Reinforcement Learning with
Two-scale State Representation and Policy Representation (ERL-Re$^2$), a novel
solution to the aforementioned two drawbacks. The key idea of ERL-Re$^2$ is
two-scale representation: all EA and RL policies share the same nonlinear state
representation while maintaining individual linear policy representations. The
state representation conveys expressive common features of the environment
learned by all the agents collectively; the linear policy representation
provides a favorable space for efficient policy optimization, where novel
behavior-level crossover and mutation operations can be performed. Moreover,
the linear policy representation allows convenient generalization of policy
fitness with the help of the Policy-extended Value Function Approximator
(PeVFA), further improving the sample efficiency of fitness estimation. The
experiments on a range of continuous control tasks show that ERL-Re$^2$
consistently outperforms advanced baselines and achieves state-of-the-art (SOTA)
performance. Our code is available at https://github.com/yeshenpy/ERL-Re2.
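To make the two-scale representation concrete, below is a minimal PyTorch-style sketch based only on the abstract (not the authors' implementation; module names, layer sizes, and the exact crossover and mutation operators are assumptions): every EA and RL policy queries one shared nonlinear encoder, keeps only an individual linear head, and the variation operators act on those linear heads rather than on whole networks.

```python
import torch
import torch.nn as nn

class SharedStateEncoder(nn.Module):
    # Nonlinear state representation, shared by the RL agent and every EA individual.
    def __init__(self, obs_dim, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )

    def forward(self, obs):
        return self.net(obs)

class LinearPolicy(nn.Module):
    # Individual policy representation: a single linear map on top of the shared features.
    def __init__(self, feat_dim, act_dim):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(feat_dim, act_dim))
        self.b = nn.Parameter(torch.zeros(act_dim))

    def forward(self, features):
        return torch.tanh(features @ self.W + self.b)

def behavior_crossover(parent_a, parent_b, swap_prob=0.5):
    # Assumed form of behavior-level crossover: exchange whole columns of the
    # linear head (each column drives one action dimension) rather than
    # arbitrary network parameters.
    feat_dim, act_dim = parent_a.W.shape
    child = LinearPolicy(feat_dim, act_dim)
    mask = torch.rand(act_dim) < swap_prob
    with torch.no_grad():
        child.W.copy_(torch.where(mask, parent_a.W, parent_b.W))
        child.b.copy_(torch.where(mask, parent_a.b, parent_b.b))
    return child

def behavior_mutation(policy, sigma=0.1):
    # Gaussian perturbation of the linear head only; the shared encoder is never mutated.
    with torch.no_grad():
        policy.W.add_(sigma * torch.randn_like(policy.W))
        policy.b.add_(sigma * torch.randn_like(policy.b))
    return policy

# Usage: every agent acts through the same encoder but with its own linear head.
encoder = SharedStateEncoder(obs_dim=17)                       # dimensions are placeholders
population = [LinearPolicy(feat_dim=256, act_dim=6) for _ in range(5)]
obs = torch.randn(1, 17)
action = population[0](encoder(obs))
```

In the full method, fitness is additionally estimated with the Policy-extended Value Function Approximator (PeVFA), which conditions a value function on the linear policy representation; that component is omitted from this sketch.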
Related papers
- Federated Offline Policy Optimization with Dual Regularization [12.320355780707168]
Federated Reinforcement Learning (FRL) has been deemed a promising solution for intelligent decision-making in the era of the Artificial Intelligence of Things.
Existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains.
This paper proposes a novel offline federated policy optimization algorithm that enables distributed agents to collaboratively learn a decision policy using only private and static data.
arXiv Detail & Related papers (2024-05-24T04:24:03Z) - REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with performance comparable to or stronger than PPO and DPO.
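As a rough illustration of "regressing relative rewards" (a sketch based only on the summary above; the pairing scheme, the role of the scale eta, and the exact loss used in the paper are assumptions):

```python
import torch

def rebel_style_loss(logp_new_a, logp_old_a, logp_new_b, logp_old_b,
                     reward_a, reward_b, eta=1.0):
    # Regress the scaled difference of policy log-ratios for a pair of responses
    # (a, b) onto their reward difference -- a least-squares surrogate for the
    # policy update, in place of a clipped PPO-style objective.
    ratio_diff = (logp_new_a - logp_old_a) - (logp_new_b - logp_old_b)
    target = reward_a - reward_b
    return ((ratio_diff / eta - target) ** 2).mean()
```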
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective [29.977702744504466]
We introduce a novel Advantage-Aware Policy Optimization (A2PO) method to explicitly construct advantage-aware policy constraints for offline learning.
A2PO employs a conditional variational auto-encoder to disentangle the action distributions of intertwined behavior policies.
Experiments conducted on both the single-quality and mixed-quality datasets of the D4RL benchmark demonstrate that A2PO yields results superior to its counterparts.
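A minimal sketch of the conditional-VAE ingredient (illustrative only; the architecture, the way the advantage value enters, and how the resulting constraint is imposed on the learned policy are assumptions, not the paper's specification):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdvantageConditionedVAE(nn.Module):
    # Model the dataset's action distribution conditioned on the state and an
    # advantage value (shape (batch, 1)), so behaviors of different quality are
    # separated rather than intertwined.
    def __init__(self, obs_dim, act_dim, latent_dim=8, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(obs_dim + latent_dim + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs, act, adv):
        mu, log_std = self.enc(torch.cat([obs, act, adv], dim=-1)).chunk(2, dim=-1)
        z = mu + log_std.exp() * torch.randn_like(mu)        # reparameterization
        recon = self.dec(torch.cat([obs, z, adv], dim=-1))
        return recon, mu, log_std

def cvae_loss(model, obs, act, adv, beta=0.5):
    # Standard ELBO: action reconstruction plus KL to a unit Gaussian prior.
    recon, mu, log_std = model(obs, act, adv)
    recon_loss = F.mse_loss(recon, act)
    kl = (-0.5 * (1 + 2 * log_std - mu.pow(2) - (2 * log_std).exp())).sum(-1).mean()
    return recon_loss + beta * kl
```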
arXiv Detail & Related papers (2024-03-12T02:43:41Z) - Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online
Reinforcement Learning [71.02384943570372]
Family Offline-to-Online RL (FamO2O) is a framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances.
FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark.
arXiv Detail & Related papers (2023-10-27T08:30:54Z) - Counterfactual Explanation Policies in RL [3.674863913115432]
COUNTERPOL is the first framework to analyze Reinforcement Learning policies using counterfactual explanations.
We establish a theoretical connection between Counterpol and widely used trust region-based policy optimization methods in RL.
arXiv Detail & Related papers (2023-07-25T01:14:56Z) - Offline Policy Optimization in RL with Variance Regularizaton [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms.
arXiv Detail & Related papers (2022-12-29T18:25:01Z) - Offline Reinforcement Learning with Closed-Form Policy Improvement
Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
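A sketch of what return-based rebalancing can look like in practice (function and key names and the exact weighting scheme are assumptions; the paper reports its own change as under 10 lines of code):

```python
import numpy as np

def return_based_resample(episodes, num_samples, temperature=1.0, rng=None):
    # Resample whole episodes with probability increasing in their return:
    # the dataset's support is unchanged, only the sampling distribution shifts
    # toward high-return data.
    rng = np.random.default_rng() if rng is None else rng
    returns = np.array([ep["return"] for ep in episodes], dtype=np.float64)
    weights = np.exp((returns - returns.max()) / temperature)   # softmax-style weights
    probs = weights / weights.sum()
    idx = rng.choice(len(episodes), size=num_samples, p=probs, replace=True)
    return [episodes[i] for i in idx]
```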
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - Policy Gradient for Reinforcement Learning with General Utilities [50.65940899590487]
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards.
Many supervised and unsupervised RL problems are not covered in the Linear RL framework.
We derive the policy gradient theorem for RL with general utilities.
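For reference, the general-utility policy gradient is commonly written as follows (a standard form from the general-utility RL literature, reconstructed here rather than quoted from this paper): with $\lambda^{\pi_\theta}$ the discounted state-action occupancy measure and objective $F(\lambda^{\pi_\theta})$,

$$ \nabla_\theta F(\lambda^{\pi_\theta}) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t \ge 0} \gamma^t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}_{r_\theta}(s_t, a_t)\Big], \qquad r_\theta(s,a) = \frac{\partial F(\lambda)}{\partial \lambda(s,a)}\bigg|_{\lambda = \lambda^{\pi_\theta}}, $$

i.e., the classic policy gradient with the reward replaced by the gradient of the utility with respect to the occupancy measure; the linear case $F(\lambda) = \langle r, \lambda \rangle$ recovers the standard theorem.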
arXiv Detail & Related papers (2022-10-03T14:57:46Z) - Evolutionary Action Selection for Gradient-based Policy Learning [6.282299638495976]
Evolutionary algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been combined to integrate their complementary advantages for better policy learning.
We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel combination of EA and DRL.
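A sketch of evolutionary action selection as described above (a generic evolutionary search over actions guided by the critic; the specific evolutionary operators and how the evolved action then guides the TD3 actor are assumptions):

```python
import torch

def evolve_action(critic, obs, init_action, pop_size=32, iters=10,
                  sigma=0.1, act_limit=1.0):
    # Generic evolutionary search in action space: perturb candidates around the
    # current best action and keep the one the critic values most. `obs` and
    # `init_action` are assumed to be unbatched 1-D tensors, and
    # `critic(obs_batch, act_batch)` is assumed to return Q-values of shape
    # (pop_size, 1) or (pop_size,).
    best = init_action.detach().clone()
    for _ in range(iters):
        candidates = best.unsqueeze(0) + sigma * torch.randn(pop_size, *best.shape)
        candidates = candidates.clamp(-act_limit, act_limit)
        with torch.no_grad():
            q = critic(obs.expand(pop_size, *obs.shape), candidates)
        best = candidates[q.reshape(pop_size).argmax()]
    return best
```

The evolved action could then supervise the actor, e.g. via an additional term like $\lVert \pi(s) - a_{\text{evolved}} \rVert^2$ in the actor loss (again an assumption about how the guidance is applied).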
arXiv Detail & Related papers (2022-01-12T03:31:21Z)