SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2007.04938v4
- Date: Fri, 11 Jun 2021 21:00:13 GMT
- Title: SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning
- Authors: Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
- Abstract summary: We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
- Score: 102.78958681141577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-policy deep reinforcement learning (RL) has been successful in a range of
challenging domains. However, standard off-policy RL algorithms can suffer from
several issues, such as instability in Q-learning and balancing exploration and
exploitation. To mitigate these issues, we present SUNRISE, a simple unified
ensemble method, which is compatible with various off-policy RL algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman
backups, which re-weight target Q-values based on uncertainty estimates from a
Q-ensemble, and (b) an inference method that selects actions using the highest
upper-confidence bounds for efficient exploration. By enforcing the diversity
between agents using Bootstrap with random initialization, we show that these
different ideas are largely orthogonal and can be fruitfully integrated,
together further improving the performance of existing off-policy RL
algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and
discrete control tasks on both low-dimensional and high-dimensional
environments. Our training code is available at
https://github.com/pokaxpoka/sunrise.
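To make the abstract's two ingredients concrete, the following is a minimal NumPy sketch of (a) an uncertainty-weighted Bellman target computed from a Q-ensemble and (b) UCB-based action selection. The sigmoid weighting form, the temperature, and the UCB coefficient lam are illustrative assumptions and may differ from the official implementation linked above; in the full method each ensemble member is additionally trained on its own bootstrap mask with random initialization to enforce diversity.

```python
# Hedged sketch of SUNRISE's two ingredients; hyperparameters and the exact
# weighting function are assumptions, not copied from the official repo.
import numpy as np

def weighted_bellman_targets(q_next_ensemble, rewards, dones, gamma=0.99, temperature=10.0):
    """(a) Ensemble-based weighted Bellman backup.

    q_next_ensemble: shape (N, B), next-state Q-values from N ensemble members
    for a batch of B transitions. Returns Bellman targets and per-sample
    weights in (0.5, 1.0] that shrink toward 0.5 as ensemble disagreement
    (the uncertainty estimate) grows.
    """
    q_mean = q_next_ensemble.mean(axis=0)      # ensemble mean of the target Q
    q_std = q_next_ensemble.std(axis=0)        # ensemble disagreement
    targets = rewards + gamma * (1.0 - dones) * q_mean
    weights = 1.0 / (1.0 + np.exp(q_std * temperature)) + 0.5  # sigmoid(-std*T) + 0.5 (assumed form)
    return targets, weights

def ucb_action(q_candidate_ensemble, lam=1.0):
    """(b) Choose the candidate action with the highest upper-confidence bound.

    q_candidate_ensemble: shape (N, A), Q-values of N ensemble members for A
    candidate actions in the current state.
    """
    mean = q_candidate_ensemble.mean(axis=0)
    std = q_candidate_ensemble.std(axis=0)
    return int(np.argmax(mean + lam * std))

# Toy usage: 5 ensemble members, batch of 4 transitions, 3 candidate actions.
rng = np.random.default_rng(0)
targets, weights = weighted_bellman_targets(
    rng.normal(size=(5, 4)), rewards=np.ones(4), dones=np.zeros(4))
action = ucb_action(rng.normal(size=(5, 3)))
```

In a full agent, the per-sample weights would multiply each ensemble member's TD error, and in the continuous-control (SAC) setting the candidate actions passed to ucb_action would typically be sampled from the ensemble of actors.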
Related papers
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) that jointly trains reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach [2.3020018305241337]
This paper is the first to propose considering robust reinforcement learning (RRL) problems within the framework of positional differential game theory.
Namely, we prove that under Isaacs's condition, the same Q-function can be utilized as an approximate solution of both the minimax and maximin Bellman equations (sketched after this entry).
We present the Isaacs Deep Q-Network algorithms and demonstrate their superiority over other baseline RRL and Multi-Agent RL algorithms in various environments.
arXiv Detail & Related papers (2024-05-03T12:21:43Z)
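For reference, one standard way of writing the two backups the summary above refers to uses a joint state-action value Q(s, u, v) for the maximizing and minimizing players with pure-strategy optimization; this is a generic illustration and may differ from the paper's exact operators:

\[
\begin{aligned}
(\mathcal{T}_{\max\min} Q)(s,u,v) &= r(s,u,v) + \gamma \, \max_{u'} \min_{v'} Q(s',u',v'), \\
(\mathcal{T}_{\min\max} Q)(s,u,v) &= r(s,u,v) + \gamma \, \min_{v'} \max_{u'} Q(s',u',v').
\end{aligned}
\]

Isaacs's condition ensures that the inner max-min and min-max coincide, which is why a single Q-function can serve as an approximate solution of both equations.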
- Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF [82.73541793388]
We introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation.
We provide theoretical studies of the problem landscape and its penalty-based gradient (policy) algorithms.
We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design.
arXiv Detail & Related papers (2024-02-10T04:54:15Z)
- Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of network parameter optimization (beamforming and interference cancellation) in wireless networks.
We show that a policy can be learned from previously collected data alone, without online exploration, and then deployed in the real world.
arXiv Detail & Related papers (2023-10-12T18:36:36Z)
- Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical Multi-Step Approach for Policy Training [4.982806898121435]
We propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method.
This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration.
The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
arXiv Detail & Related papers (2022-09-29T00:42:44Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies, a guide-policy and an exploration-policy, to solve tasks (see the sketch after this entry).
We show via experiments that JSRL significantly outperforms existing imitation learning and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
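As an illustration of the two-policy idea, below is a hedged Python sketch of a single jump-start rollout, assuming a fixed guide-policy that acts for the first h steps and a learned exploration-policy that finishes the episode; the curriculum that gradually shrinks h during training is omitted, and the environment is assumed to follow the classic Gym reset/step API.

```python
# Hypothetical helper illustrating a jump-start rollout; the function name and
# the environment API (classic Gym) are assumptions for this sketch.
def jump_start_rollout(env, guide_policy, exploration_policy, h, max_steps=1000):
    """Collect one episode in which guide_policy acts for the first h steps
    and exploration_policy acts for the remainder."""
    transitions = []
    obs = env.reset()
    for t in range(max_steps):
        policy = guide_policy if t < h else exploration_policy
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions
```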
- Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning [43.562783189118]
We introduce a practical algorithm for incorporating human insight to speed learning.
Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy.
In all cases, CSRL learns a good policy faster than baselines.
arXiv Detail & Related papers (2021-12-30T22:02:42Z)
- URLB: Unsupervised Reinforcement Learning Benchmark [82.36060735454647]
We introduce the Unsupervised Reinforcement Learning Benchmark (URLB).
URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards.
We provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods.
arXiv Detail & Related papers (2021-10-28T15:07:01Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.