Keep Various Trajectories: Promoting Exploration of Ensemble Policies in
Continuous Control
- URL: http://arxiv.org/abs/2310.11138v1
- Date: Tue, 17 Oct 2023 10:40:05 GMT
- Title: Keep Various Trajectories: Promoting Exploration of Ensemble Policies in
Continuous Control
- Authors: Chao Li, Chen Gong, Qiang He, Xinwen Hou
- Abstract summary: This study introduces a new ensemble RL algorithm, termed TEEN.
TEEN enhances the sample diversity of the ensemble policy compared to using sub-policies alone.
On average, TEEN outperforms the baseline ensemble DRL algorithms by 41% in performance on the tested representative environments.
- Score: 17.64972760231609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The combination of deep reinforcement learning (DRL) with ensemble methods
has proven highly effective in addressing complex sequential
decision-making problems. This success can be primarily attributed to the
utilization of multiple models, which enhances both the robustness of the
policy and the accuracy of value function estimation. However, there has been
limited analysis of the empirical success of current ensemble RL methods thus
far. Our new analysis reveals that the sample efficiency of previous ensemble
DRL algorithms may be limited by sub-policies that are not as diverse as they
could be. Motivated by these findings, our study introduces a new ensemble RL
algorithm, termed Trajectories-awarE Ensemble exploratioN (TEEN). The primary
goal of TEEN is to maximize the
expected return while promoting more diverse trajectories. Through extensive
experiments, we demonstrate that TEEN not only enhances the sample diversity of
the ensemble policy compared to using sub-policies alone but also improves the
performance over ensemble RL algorithms. On average, TEEN outperforms the
baseline ensemble DRL algorithms by 41% in performance on the tested
representative environments.
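The abstract describes the core recipe only in prose: train an ensemble of sub-policies to maximize expected return while explicitly rewarding diversity among the behaviors (and hence trajectories) the sub-policies produce. The sketch below illustrates that idea in a SAC-style continuous-control setting. It is not the authors' TEEN implementation: the Gaussian sub-policies, the pairwise-KL diversity bonus, the toy critic, and the DIVERSITY_COEF weight are all assumptions made for the example.

```python
# Minimal sketch of diversity-regularized ensemble policy training.
# NOT the authors' TEEN algorithm: the diversity measure, critic, and
# hyperparameters below are illustrative assumptions based on the abstract.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

STATE_DIM, ACTION_DIM, N_SUB_POLICIES = 8, 2, 4
DIVERSITY_COEF = 0.1  # assumed trade-off between return and diversity


class GaussianPolicy(nn.Module):
    """Small Gaussian sub-policy: state -> Normal(mean, std) over actions."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU())
        self.mean = nn.Linear(64, ACTION_DIM)
        self.log_std = nn.Parameter(torch.zeros(ACTION_DIM))

    def forward(self, state):
        h = self.body(state)
        return Normal(self.mean(h), self.log_std.exp())


def ensemble_loss(policies, critic, states):
    """Return-maximization term plus a pairwise diversity bonus.

    critic(state, action) stands in for any learned Q-function; it only
    needs to be differentiable with respect to the action.
    """
    dists = [pi(states) for pi in policies]

    # (1) Each sub-policy maximizes the critic's value of its own actions.
    return_term = torch.stack(
        [critic(states, d.rsample()).mean() for d in dists]
    ).sum()

    # (2) Diversity bonus: push sub-policies' action distributions apart via
    # mean pairwise KL on the same batch of states (an assumption; the
    # paper's actual trajectory-level measure may differ).
    kls = [
        kl_divergence(dists[i], dists[j]).sum(-1).mean()
        for i in range(len(dists))
        for j in range(len(dists))
        if i != j
    ]
    diversity_term = torch.stack(kls).mean()

    # Gradient ascent on return + diversity == descent on the negative.
    return -(return_term + DIVERSITY_COEF * diversity_term)


if __name__ == "__main__":
    policies = nn.ModuleList([GaussianPolicy() for _ in range(N_SUB_POLICIES)])
    # Toy differentiable critic so the sketch runs end to end.
    q_net = nn.Sequential(
        nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1)
    )
    critic = lambda s, a: q_net(torch.cat([s, a], dim=-1)).squeeze(-1)

    opt = torch.optim.Adam(policies.parameters(), lr=3e-4)
    states = torch.randn(32, STATE_DIM)  # placeholder for a replay-buffer batch
    loss = ensemble_loss(policies, critic, states)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"loss: {loss.item():.3f}")
```

In a full agent this loss would sit inside an off-policy training loop with a replay buffer, critic updates, and target networks, and the diversity weight would need tuning per environment.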
Related papers
- Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning [20.491176017183044]
This paper tackles the multi-objective reinforcement learning (MORL) problem.
It introduces an innovative actor-critic algorithm named MOAC which finds a policy by iteratively making trade-offs among conflicting reward signals.
arXiv Detail & Related papers (2024-05-05T23:52:57Z)
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component of evolutionary algorithms (RL-EA) has demonstrated superior performance in recent years.
We discuss RL-EA integration methods, the RL-assisted strategies adopted by RL-EA, and their applications according to the existing literature.
In the section on RL-EA applications, we also demonstrate the strong performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z)
- Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical Multi-Step Approach for Policy Training [4.982806898121435]
We propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method.
This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration.
The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
arXiv Detail & Related papers (2022-09-29T00:42:44Z)
- The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning [46.85801978792022]
We study the multi-step off-policy learning approach to distributional RL.
We identify a novel notion of path-dependent distributional TD error.
We derive a novel algorithm, Quantile Regression-Retrace, which leads to a deep RL agent QR-DQN-Retrace.
arXiv Detail & Related papers (2022-07-15T16:19:23Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning [7.020079427649125]
We show that learning distinguishable skills for tasks with non-unique optima can be essential for further improving learning efficiency and performance.
We propose a probabilistic mixture-of-experts (PMOE) for multimodal policies, together with a novel gradient estimator for the non-differentiability problem.
arXiv Detail & Related papers (2021-04-19T08:21:56Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.