Reinforcement Learning Under Algorithmic Triage
- URL: http://arxiv.org/abs/2109.11328v1
- Date: Thu, 23 Sep 2021 12:21:26 GMT
- Title: Reinforcement Learning Under Algorithmic Triage
- Authors: Eleni Straitouri, Adish Singla, Vahid Balazadeh Meresht, Manuel
Gomez-Rodriguez
- Abstract summary: We develop a two-stage actor-critic method to learn reinforcement learning models under triage.
The first stage performs offline, off-policy training using human data gathered in an environment where the human has operated on their own.
The second stage performs on-policy training to account for the impact that switching may have on the human policy.
- Score: 33.80293624975863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Methods to learn under algorithmic triage have predominantly
focused on supervised learning settings where each decision, or prediction,
is independent of the others. Under algorithmic triage, a supervised
learning model predicts a
fraction of the instances and humans predict the remaining ones. In this work,
we take a first step towards developing reinforcement learning models that are
optimized to operate under algorithmic triage. To this end, we look at the
problem through the framework of options and develop a two-stage actor-critic
method to learn reinforcement learning models under triage. The first stage
performs offline, off-policy training using human data gathered in an
environment where the human has operated on their own. The second stage
performs on-policy training to account for the impact that switching may have
on the human policy, which may be difficult to anticipate from the above human
data. Extensive simulation experiments in a synthetic car driving task show
that the machine models and the triage policies trained using our two-stage
method effectively complement human policies and outperform those provided by
several competitive baselines.
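The two-stage recipe above is concrete enough to sketch. What follows is a minimal illustration, not the authors' implementation: it assumes a discrete-action, Gym-like environment, replaces the learned option/triage policy with a simple value-threshold switching rule, and uses an advantage-weighted update as a stand-in for the paper's off-policy stage.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4  # illustrative sizes

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                      nn.Linear(64, N_ACTIONS))
critic = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()),
                       lr=3e-4)

def stage1_offline(s, a, r, s2, gamma=0.99):
    """Stage 1: off-policy update from a batch of logged human transitions
    (s, a, r, s2); terminal masking omitted for brevity."""
    with torch.no_grad():
        target = r + gamma * critic(s2).squeeze(-1)   # TD(0) target
    v = critic(s).squeeze(-1)
    critic_loss = ((v - target) ** 2).mean()
    # Advantage-weighted log-likelihood: a simple off-policy actor update
    # that favours human actions which outperformed the value baseline.
    adv = (target - v).detach()
    logp = torch.log_softmax(actor(s), dim=-1)
    actor_loss = -(adv * logp.gather(1, a.unsqueeze(1)).squeeze(1)).mean()
    opt.zero_grad()
    (critic_loss + actor_loss).backward()
    opt.step()

def machine_takes_over(s, threshold=0.0):
    """Assumed switching rule: hand control to the machine only where its
    value estimate is high. The paper learns this triage policy instead."""
    return critic(s).item() > threshold

def stage2_on_policy(env, human_policy, episodes=10):
    """Stage 2: roll out with switching active, so further training sees
    how the human behaves when control is occasionally taken away."""
    for _ in range(episodes):
        s, done = env.reset(), False          # assumed Gym-like API
        while not done:
            st = torch.as_tensor(s, dtype=torch.float32)
            if machine_takes_over(st):
                a = torch.softmax(actor(st), -1).multinomial(1).item()
            else:
                a = human_policy(s)
            s, r, done = env.step(a)
            # ...log (s, a, r) and apply on-policy updates here...
```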
Related papers
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
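As a rough illustration of the offline-pretrain-then-online-finetune recipe (not RoboFuME's actual system), the sketch below applies a CQL-style conservative penalty during offline training and anneals it for online fine-tuning; the network sizes and the schedule are assumptions.

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 5))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def q_update(s, a, r, s2, alpha, gamma=0.99):
    """One Q-learning step. alpha scales a conservative penalty that keeps
    the offline phase pessimistic about unseen actions; annealing it toward
    zero is one way to hand off smoothly to online fine-tuning."""
    with torch.no_grad():
        target = r + gamma * q_net(s2).max(dim=-1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_loss = ((q - target) ** 2).mean()
    # CQL-style regularizer: push Q down on all actions, up on data actions.
    penalty = (torch.logsumexp(q_net(s), dim=-1) - q).mean()
    opt.zero_grad()
    (td_loss + alpha * penalty).backward()
    opt.step()

# Offline pre-training on the prior dataset, then online with alpha decayed:
# for s, a, r, s2 in offline_batches: q_update(s, a, r, s2, alpha=1.0)
# for s, a, r, s2 in online_batches:  q_update(s, a, r, s2, alpha=0.1)
```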
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
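The class-incremental setting PILOT targets can be shown with a generic baseline (this is not PILOT's API): freeze a pre-trained backbone and grow a linear classifier as new classes arrive. Prompt methods such as L2P replace the growing head with learned prompts; everything below is an illustrative stand-in.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 256))  # stand-in
for p in backbone.parameters():
    p.requires_grad_(False)        # pre-trained features stay frozen

head = None                        # classifier grows as new classes arrive

def add_task(n_new):
    """Extend the classifier with rows for the new task's classes while
    keeping old-class weights intact (a common CIL baseline)."""
    global head
    old = head
    total = n_new if old is None else old.out_features + n_new
    head = nn.Linear(256, total)
    if old is not None:
        with torch.no_grad():
            head.weight[: old.out_features] = old.weight
            head.bias[: old.out_features] = old.bias

def train_task(batches, epochs=1):
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in batches:       # y holds global class ids
            loss = nn.functional.cross_entropy(head(backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
```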
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and
Heuristic Rule-based Methods for Object Manipulation [118.27432851053335]
This paper presents an overview and comparative analysis of our systems designed for two tracks of the SAPIEN ManiSkill Challenge 2021, including the No Interaction track.
The No Interaction track targets learning policies from pre-collected demonstration trajectories.
In this track, we design a Heuristic Rule-based Method (HRM) that achieves high-quality object manipulation by decomposing the task into a series of sub-tasks.
For each sub-task, simple rule-based control strategies are adopted to predict actions that can be applied to robotic arms.
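A toy version of that decomposition, with made-up state fields, gains, and thresholds:

```python
import numpy as np

def move_above(state):
    """Proportional controller steering the end effector over the object."""
    return 2.0 * (state["obj_xy"] - state["ee_xy"])

def descend_and_grasp(state):
    """Fixed downward step; a real rule would also close the gripper."""
    return np.array([0.0, -0.05])

# Ordered (finished?, controller) rules; predicates are illustrative.
SUBTASKS = [
    (lambda s: np.linalg.norm(s["obj_xy"] - s["ee_xy"]) < 0.01, move_above),
    (lambda s: s["grasped"], descend_and_grasp),
]

def hrm_policy(state):
    """Run the rule controller of the first unfinished sub-task."""
    for finished, controller in SUBTASKS:
        if not finished(state):
            return controller(state)
    return np.zeros(2)             # every sub-task done: hold position
```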
arXiv Detail & Related papers (2022-06-13T16:20:42Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy.
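A compressed sketch of this kind of pipeline, assuming a small VAE as the latent variable model and a policy that acts on latent codes (all sizes illustrative):

```python
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, N_ACTIONS = 64, 8, 4

enc = nn.Linear(OBS_DIM, 2 * LATENT_DIM)     # outputs mean and log-variance
dec = nn.Linear(LATENT_DIM, OBS_DIM)
policy = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.Tanh(),
                       nn.Linear(32, N_ACTIONS))

def vae_loss(x):
    """ELBO for the generative model, trainable from observations alone."""
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
    recon = ((dec(z) - x) ** 2).sum(-1).mean()
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
    return recon + kl

def act(x):
    """The policy conditions on the latent code, not the raw observation."""
    with torch.no_grad():
        mu, _ = enc(x).chunk(2, dim=-1)
        return torch.softmax(policy(mu), dim=-1).multinomial(1)
```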
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- What Matters in Learning from Offline Human Demonstrations for Robot
Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
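Behavioral cloning is the simplest of the offline algorithms a study like this compares; the few-line version below, with placeholder dimensions, makes the setting concrete.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 10, 7          # placeholder robot-arm dimensions

policy = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                       nn.Linear(64, ACT_DIM))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def bc_step(obs, act):
    """Regress demonstrated actions from observations (MSE loss, since
    manipulation actions are continuous)."""
    loss = ((policy(obs) - act) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```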
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- A Survey of Human-in-the-loop for Machine Learning [7.056132067948671]
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience.
This survey provides a high-level summary of human-in-the-loop learning and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.
arXiv Detail & Related papers (2021-08-02T14:42:28Z)
- Differentiable Learning Under Triage [25.41072393963499]
Under algorithmic triage, a predictive model does not predict all instances but defers some of them to human experts.
We show that models trained for full automation may be suboptimal under triage.
We introduce a practical gradient-based algorithm that is guaranteed to find a sequence of triage policies and predictive models of increasing performance.
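An illustrative alternating scheme in the spirit of that result (a simplification of the paper's actual gradient-based algorithm; the per-instance human loss is assumed observable):

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 1)            # toy predictive model
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

def triage_step(x, y, human_loss, budget=0.3):
    """x, y: a batch; human_loss: per-instance loss of the human expert
    (assumed given here); budget: fraction the human may handle."""
    model_loss = (model(x).squeeze(-1) - y) ** 2          # per instance
    # Defer the instances where the human most outperforms the model.
    k = int(budget * len(y))
    gap = model_loss.detach() - human_loss
    defer = torch.zeros_like(gap, dtype=torch.bool)
    if k > 0:
        defer[gap.topk(k).indices] = True
    # Train the model only on the instances it will actually predict.
    opt.zero_grad()
    model_loss[~defer].mean().backward()
    opt.step()
    return defer
```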
arXiv Detail & Related papers (2021-03-16T08:07:31Z)
- Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning
Systems [0.8223798883838329]
This research investigates how to integrate human interaction modalities into the reinforcement learning loop.
Results show that a reward signal learned from human interaction accelerates the learning rate of reinforcement learning algorithms.
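One minimal way to realize that idea, assuming human feedback arrives as scalar ratings of state-action pairs (the feedback format and blending weight are assumptions):

```python
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(6, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def fit_feedback(sa, rating):
    """Regress scalar human ratings of state-action pairs."""
    loss = ((reward_model(sa).squeeze(-1) - rating) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

def shaped_reward(sa, env_r, beta=0.5):
    """Blend the learned human-derived signal with the task reward; the
    extra signal is what speeds up the underlying RL algorithm."""
    with torch.no_grad():
        return env_r + beta * reward_model(sa).squeeze(-1)
```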
arXiv Detail & Related papers (2020-08-30T17:28:18Z)
- Data-efficient visuomotor policy training using reinforcement learning
and generative models [27.994338318811952]
We present a data-efficient framework for solving visuomotor sequential decision-making problems.
We exploit the combination of reinforcement learning and latent variable generative models.
arXiv Detail & Related papers (2020-07-26T14:19:00Z)
- On the interaction between supervision and self-play in emergent
communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
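The reported ordering is easy to express as a two-phase schedule; the sketch below is schematic, with a placeholder game API and a REINFORCE-style self-play update.

```python
import torch
import torch.nn as nn

agent = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(agent.parameters(), lr=1e-3)

def supervised_phase(human_obs, human_msgs):
    """Phase 1: imitate human communication with cross-entropy."""
    loss = nn.functional.cross_entropy(agent(human_obs), human_msgs)
    opt.zero_grad()
    loss.backward()
    opt.step()

def selfplay_phase(game, episodes):
    """Phase 2: refine with self-play reward (REINFORCE-style stand-in)."""
    for _ in range(episodes):
        obs = game.reset()                    # assumed game API
        logits = agent(obs)
        a = torch.softmax(logits, dim=-1).multinomial(1).item()
        r = game.step(a)                      # shared payoff for both copies
        opt.zero_grad()
        (-r * torch.log_softmax(logits, dim=-1)[a]).backward()
        opt.step()
```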
arXiv Detail & Related papers (2020-02-04T02:35:19Z)