Constraint Sampling Reinforcement Learning: Incorporating Expertise For
Faster Learning
- URL: http://arxiv.org/abs/2112.15221v1
- Date: Thu, 30 Dec 2021 22:02:42 GMT
- Title: Constraint Sampling Reinforcement Learning: Incorporating Expertise For
Faster Learning
- Authors: Tong Mu, Georgios Theocharous, David Arbour, Emma Brunskill
- Abstract summary: We introduce a practical algorithm for incorporating human insight to speed learning.
Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy.
In all cases, CSRL learns a good policy faster than baselines.
- Score: 43.562783189118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online reinforcement learning (RL) algorithms are often difficult to deploy
in complex human-facing applications as they may learn slowly and have poor
early performance. To address this, we introduce a practical algorithm for
incorporating human insight to speed learning. Our algorithm, Constraint
Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as
constraints/restrictions on the RL policy. It takes in multiple potential
policy constraints to maintain robustness to misspecification of individual
constraints while leveraging helpful ones to learn quickly. Given a base RL
learning algorithm (e.g., UCRL, DQN, Rainbow), we propose an upper confidence with
elimination scheme that leverages the relationship between the constraints, and
their observed performance, to adaptively switch among them. We instantiate our
algorithm with DQN-type algorithms and UCRL as base algorithms, and evaluate
our algorithm in four environments, including three simulators based on real
data: recommendations, educational activity sequencing, and HIV treatment
sequencing. In all cases, CSRL learns a good policy faster than baselines.
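The abstract's "upper confidence with elimination" idea can be read as a bandit-style selector over the candidate constraints: run the base learner under one constraint per episode, track each constraint's observed returns, pick optimistically, and drop constraints that are provably worse. The sketch below is one illustrative reading under those assumptions, not the paper's exact algorithm; the class name, the confidence-width constant, and the elimination rule are all assumptions.

```python
import math

class ConstraintSelector:
    """Illustrative upper-confidence-with-elimination scheme over
    candidate policy constraints (a sketch, not the paper's method).
    Each episode, the base RL learner runs under one constraint; we
    observe the episode return and update that constraint's statistics."""

    def __init__(self, num_constraints, confidence=2.0):
        self.active = set(range(num_constraints))  # not-yet-eliminated constraints
        self.counts = [0] * num_constraints        # episodes run under each constraint
        self.means = [0.0] * num_constraints       # running mean return per constraint
        self.c = confidence                        # width of the confidence bonus

    def _bonus(self, i, t):
        if self.counts[i] == 0:
            return float("inf")                    # unplayed constraints are tried first
        return self.c * math.sqrt(math.log(max(t, 2)) / self.counts[i])

    def select(self, t):
        # Optimistically pick the active constraint with the highest UCB.
        return max(self.active, key=lambda i: self.means[i] + self._bonus(i, t))

    def update(self, i, episode_return, t):
        self.counts[i] += 1
        self.means[i] += (episode_return - self.means[i]) / self.counts[i]
        # Eliminate any constraint whose UCB falls below the best LCB:
        # with high probability it cannot be the best-performing constraint.
        best_lcb = max(self.means[j] - self._bonus(j, t) for j in self.active)
        self.active = {j for j in self.active
                       if self.means[j] + self._bonus(j, t) >= best_lcb}
```

In this reading, a helpful constraint accumulates high observed returns and dominates selection, while a misspecified constraint is sampled rarely or eliminated, which is the robustness-to-misspecification behavior the abstract describes.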
Related papers
- Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z)
- Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical Multi-Step Approach for Policy Training [4.982806898121435]
We propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method.
This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration.
The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
arXiv Detail & Related papers (2022-09-29T00:42:44Z)
- When does return-conditioned supervised learning work for offline reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning.
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
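The two-policy scheme described above can be sketched as a "jump-start" roll-in: a guide policy acts for the first steps of each episode, then the learning (exploration) policy takes over, with the guide's roll-in shortened as performance improves. The code below is an illustrative sketch of that idea, not the authors' implementation; the function names, the environment interface, and the one-step curriculum rule are assumptions.

```python
def jsrl_episode(env_step, env_reset, guide_policy, explore_policy,
                 horizon, jump_start_h):
    """One episode of a jump-start scheme: the guide policy acts for the
    first `jump_start_h` steps, then the exploration policy takes over.
    `env_step(s, a)` is assumed to return (next_state, reward, done)."""
    s = env_reset()
    total = 0.0
    for t in range(horizon):
        policy = guide_policy if t < jump_start_h else explore_policy
        a = policy(s)
        s, r, done = env_step(s, a)
        total += r
        if done:
            break
    return total

def shrink_jump_start(jump_start_h, episode_return, threshold):
    # Curriculum: once the mixed policy performs well enough, hand more
    # of the episode to the exploration policy by shortening the roll-in.
    return max(0, jump_start_h - 1) if episode_return >= threshold else jump_start_h
```

Starting each episode from states the guide policy reaches lets the learner explore near useful behavior first, which is one way prior data or demonstrations can initialize RL as the entry describes.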
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Deep Reinforcement Learning with Adjustments [10.244120641608447]
We propose a new Q-learning algorithm for continuous action space, which can bridge the control and RL algorithms.
Our method can learn complex policies to achieve long-term goals and at the same time it can be easily adjusted to address short-term requirements.
arXiv Detail & Related papers (2021-09-28T03:35:09Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Deep RL With Information Constrained Policies: Generalization in Continuous Control [21.46148507577606]
We show the benefits that a natural constraint on information flow might confer onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
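SUNRISE's two ingredients can be illustrated on a toy tabular Q-ensemble: (a) a Bellman backup whose targets are down-weighted where the ensemble disagrees, and (b) action selection by the highest upper confidence bound (ensemble mean plus a scaled ensemble standard deviation). The paper applies these to deep off-policy learners; the tabular class below is only a sketch of the two mechanisms, and the specific weight form sigmoid(-std * T) + 0.5 and all hyperparameter values are assumptions.

```python
import math
import random

class SunriseStyleEnsemble:
    """Tabular sketch of SUNRISE's two ingredients (illustrative only;
    the paper uses deep off-policy learners, not tabular Q-tables)."""

    def __init__(self, n_models, n_states, n_actions,
                 lr=0.5, gamma=0.9, temperature=10.0, ucb_lambda=1.0):
        # Small random init so ensemble members can disagree.
        self.q = [[[random.gauss(0.0, 0.01) for _ in range(n_actions)]
                   for _ in range(n_states)] for _ in range(n_models)]
        self.lr, self.gamma = lr, gamma
        self.temperature = temperature  # sharpness of the confidence weight
        self.ucb_lambda = ucb_lambda    # exploration-bonus scale
        self.n_actions = n_actions

    def _stats(self, s, a):
        vals = [m[s][a] for m in self.q]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        return mean, math.sqrt(var)

    def select_action(self, s):
        # Ingredient (b): pick the action maximizing the upper confidence
        # bound mean_Q + lambda * ensemble_std, for directed exploration.
        def ucb(a):
            mean, std = self._stats(s, a)
            return mean + self.ucb_lambda * std
        return max(range(self.n_actions), key=ucb)

    def update(self, s, a, r, s2, done):
        # Ingredient (a): weighted Bellman backup. Targets where the
        # ensemble disagrees (high std) get a smaller confidence weight,
        # damping the propagation of uncertain target values.
        for m in self.q:
            if done:
                target, std = r, 0.0
            else:
                a2 = max(range(self.n_actions), key=lambda b: m[s2][b])
                _, std = self._stats(s2, a2)
                target = r + self.gamma * m[s2][a2]
            weight = 1.0 / (1.0 + math.exp(std * self.temperature)) + 0.5
            m[s][a] += self.lr * weight * (target - m[s][a])
```

With zero disagreement the weight is 1.0 (a plain backup) and it decays toward 0.5 as the ensemble's uncertainty about the target grows.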
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all of the above) and is not responsible for any consequences of its use.