SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2007.04938v4
- Date: Fri, 11 Jun 2021 21:00:13 GMT
- Title: SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning
- Authors: Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
- Abstract summary: We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
- Score: 102.78958681141577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-policy deep reinforcement learning (RL) has been successful in a range of
challenging domains. However, standard off-policy RL algorithms can suffer from
several issues, such as instability in Q-learning and balancing exploration and
exploitation. To mitigate these issues, we present SUNRISE, a simple unified
ensemble method, which is compatible with various off-policy RL algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman
backups, which re-weight target Q-values based on uncertainty estimates from a
Q-ensemble, and (b) an inference method that selects actions using the highest
upper-confidence bounds for efficient exploration. By enforcing the diversity
between agents using Bootstrap with random initialization, we show that these
different ideas are largely orthogonal and can be fruitfully integrated,
together further improving the performance of existing off-policy RL
algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and
discrete control tasks on both low-dimensional and high-dimensional
environments. Our training code is available at
https://github.com/pokaxpoka/sunrise.
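To make the abstract's two ingredients concrete, the following is a minimal NumPy sketch of (a) an uncertainty-weighted Bellman target computed from a Q-ensemble and (b) UCB-based action selection. The sigmoid weighting form, the temperature, and the UCB coefficient lam are illustrative assumptions and may differ from the official implementation linked above; in the full method each ensemble member is additionally trained on its own bootstrap mask with random initialization to enforce diversity.

```python
# Hedged sketch of SUNRISE's two ingredients; hyperparameters and the exact
# weighting function are assumptions, not copied from the official repo.
import numpy as np

def weighted_bellman_targets(q_next_ensemble, rewards, dones, gamma=0.99, temperature=10.0):
    """(a) Ensemble-based weighted Bellman backup.

    q_next_ensemble: shape (N, B), next-state Q-values from N ensemble members
    for a batch of B transitions. Returns Bellman targets and per-sample
    weights in (0.5, 1.0] that shrink toward 0.5 as ensemble disagreement
    (the uncertainty estimate) grows.
    """
    q_mean = q_next_ensemble.mean(axis=0)      # ensemble mean of the target Q
    q_std = q_next_ensemble.std(axis=0)        # ensemble disagreement
    targets = rewards + gamma * (1.0 - dones) * q_mean
    weights = 1.0 / (1.0 + np.exp(q_std * temperature)) + 0.5  # sigmoid(-std*T) + 0.5 (assumed form)
    return targets, weights

def ucb_action(q_candidate_ensemble, lam=1.0):
    """(b) Choose the candidate action with the highest upper-confidence bound.

    q_candidate_ensemble: shape (N, A), Q-values of N ensemble members for A
    candidate actions in the current state.
    """
    mean = q_candidate_ensemble.mean(axis=0)
    std = q_candidate_ensemble.std(axis=0)
    return int(np.argmax(mean + lam * std))

# Toy usage: 5 ensemble members, batch of 4 transitions, 3 candidate actions.
rng = np.random.default_rng(0)
targets, weights = weighted_bellman_targets(
    rng.normal(size=(5, 4)), rewards=np.ones(4), dones=np.zeros(4))
action = ucb_action(rng.normal(size=(5, 3)))
```

In a full agent, the per-sample weights would multiply each ensemble member's TD error, and in the continuous-control (SAC) setting the candidate actions passed to ucb_action would typically be sampled from the ensemble of actors.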
Related papers
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) that jointly trains reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within the framework of positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both the minimax and maximin Bellman equations (sketched after this entry).
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority over other baseline RRL and Multi-Agent RL algorithms in various environments.
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
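For reference, one standard way of writing the two backups the summary above refers to uses a joint state-action value Q(s, u, v) for the maximizing and minimizing players with pure-strategy optimization; this is a generic illustration and may differ from the paper's exact operators:

\[
\begin{aligned}
(\mathcal{T}_{\max\min} Q)(s,u,v) &= r(s,u,v) + \gamma \, \max_{u'} \min_{v'} Q(s',u',v'), \\
(\mathcal{T}_{\min\max} Q)(s,u,v) &= r(s,u,v) + \gamma \, \min_{v'} \max_{u'} Q(s',u',v').
\end{aligned}
\]

Isaacs's condition ensures that the inner max-min and min-max coincide, which is why a single Q-function can serve as an approximate solution of both equations.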
- Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF [82.73541793388]
We introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation.
We provide theoretical studies of the problem landscape and its penalty-based gradient (policy) algorithms.
We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design.
arXiv Detail & Related papers (2024-02-10T04:54:15Z)
- Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of network parameter optimization (beamforming and interference cancellation) in wireless networks.
We show that a policy can be learned from previously collected data alone, without online exploration, and then deployed in the real world.
arXiv Detail & Related papers (2023-10-12T18:36:36Z)
- Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical Multi-Step Approach for Policy Training [4.982806898121435]
We propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method.
This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration.
The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
arXiv Detail & Related papers (2022-09-29T00:42:44Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies, a guide-policy and an exploration-policy, to solve tasks (see the sketch after this entry).
We show via experiments that JSRL significantly outperforms existing imitation learning and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
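As an illustration of the two-policy idea, below is a hedged Python sketch of a single jump-start rollout, assuming a fixed guide-policy that acts for the first h steps and a learned exploration-policy that finishes the episode; the curriculum that gradually shrinks h during training is omitted, and the environment is assumed to follow the classic Gym reset/step API.

```python
# Hypothetical helper illustrating a jump-start rollout; the function name and
# the environment API (classic Gym) are assumptions for this sketch.
def jump_start_rollout(env, guide_policy, exploration_policy, h, max_steps=1000):
    """Collect one episode in which guide_policy acts for the first h steps
    and exploration_policy acts for the remainder."""
    transitions = []
    obs = env.reset()
    for t in range(max_steps):
        policy = guide_policy if t < h else exploration_policy
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions
```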
- Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning [43.562783189118]
We introduce a practical algorithm for incorporating human insight to speed learning.
Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy.
In all cases, CSRL learns a good policy faster than baselines.
arXiv Detail & Related papers (2021-12-30T22:02:42Z)
- URLB: Unsupervised Reinforcement Learning Benchmark [82.36060735454647]
We introduce the Unsupervised Reinforcement Learning Benchmark (URLB).
URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards.
We provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods.
arXiv Detail & Related papers (2021-10-28T15:07:01Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.