Reinforcement Learning Under Algorithmic Triage
- URL: http://arxiv.org/abs/2109.11328v1
- Date: Thu, 23 Sep 2021 12:21:26 GMT
- Title: Reinforcement Learning Under Algorithmic Triage
- Authors: Eleni Straitouri, Adish Singla, Vahid Balazadeh Meresht, Manuel
Gomez-Rodriguez
- Abstract summary: We develop a two-stage actor-critic method to learn reinforcement learning models under triage.
The first stage performs offline, off-policy training using human data gathered in an environment where the human has operated on their own.
The second stage performs on-policy training to account for the impact that switching may have on the human policy.
- Score: 33.80293624975863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Methods to learn under algorithmic triage have predominantly
focused on supervised learning settings where each decision, or prediction,
is independent of the others. Under algorithmic triage, a supervised
learning model predicts a
fraction of the instances and humans predict the remaining ones. In this work,
we take a first step towards developing reinforcement learning models that are
optimized to operate under algorithmic triage. To this end, we look at the
problem through the framework of options and develop a two-stage actor-critic
method to learn reinforcement learning models under triage. The first stage
performs offline, off-policy training using human data gathered in an
environment where the human has operated on their own. The second stage
performs on-policy training to account for the impact that switching may have
on the human policy, which may be difficult to anticipate from the above human
data. Extensive simulation experiments in a synthetic car driving task show
that the machine models and the triage policies trained using our two-stage
method effectively complement human policies and outperform those provided by
several competitive baselines.
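The two-stage recipe above is concrete enough to sketch. What follows is a minimal illustration, not the authors' implementation: it assumes a discrete-action, Gym-like environment, replaces the learned option/triage policy with a simple value-threshold switching rule, and uses an advantage-weighted update as a stand-in for the paper's off-policy stage.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4  # illustrative sizes

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                      nn.Linear(64, N_ACTIONS))
critic = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                       nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()),
                       lr=3e-4)

def stage1_offline(s, a, r, s2, gamma=0.99):
    """Stage 1: off-policy update from a batch of logged human transitions
    (s, a, r, s2); terminal masking omitted for brevity."""
    with torch.no_grad():
        target = r + gamma * critic(s2).squeeze(-1)   # TD(0) target
    v = critic(s).squeeze(-1)
    critic_loss = ((v - target) ** 2).mean()
    # Advantage-weighted log-likelihood: a simple off-policy actor update
    # that favours human actions which outperformed the value baseline.
    adv = (target - v).detach()
    logp = torch.log_softmax(actor(s), dim=-1)
    actor_loss = -(adv * logp.gather(1, a.unsqueeze(1)).squeeze(1)).mean()
    opt.zero_grad()
    (critic_loss + actor_loss).backward()
    opt.step()

def machine_takes_over(s, threshold=0.0):
    """Assumed switching rule: hand control to the machine only where its
    value estimate is high. The paper learns this triage policy instead."""
    return critic(s).item() > threshold

def stage2_on_policy(env, human_policy, episodes=10):
    """Stage 2: roll out with switching active, so further training sees
    how the human behaves when control is occasionally taken away."""
    for _ in range(episodes):
        s, done = env.reset(), False          # assumed Gym-like API
        while not done:
            st = torch.as_tensor(s, dtype=torch.float32)
            if machine_takes_over(st):
                a = torch.softmax(actor(st), -1).multinomial(1).item()
            else:
                a = human_policy(s)
            s, r, done = env.step(a)
            # ...log (s, a, r) and apply on-policy updates here...
```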
Related papers
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our insights are to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
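As a rough illustration of the offline-pretrain-then-online-finetune recipe (not RoboFuME's actual system), the sketch below applies a CQL-style conservative penalty during offline training and anneals it for online fine-tuning; the network sizes and the schedule are assumptions.

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(10, 128), nn.ReLU(), nn.Linear(128, 5))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def q_update(s, a, r, s2, alpha, gamma=0.99):
    """One Q-learning step. alpha scales a conservative penalty that keeps
    the offline phase pessimistic about unseen actions; annealing it toward
    zero is one way to hand off smoothly to online fine-tuning."""
    with torch.no_grad():
        target = r + gamma * q_net(s2).max(dim=-1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    td_loss = ((q - target) ** 2).mean()
    # CQL-style regularizer: push Q down on all actions, up on data actions.
    penalty = (torch.logsumexp(q_net(s), dim=-1) - q).mean()
    opt.zero_grad()
    (td_loss + alpha * penalty).backward()
    opt.step()

# Offline pre-training on the prior dataset, then online with alpha decayed:
# for s, a, r, s2 in offline_batches: q_update(s, a, r, s2, alpha=1.0)
# for s, a, r, s2 in online_batches:  q_update(s, a, r, s2, alpha=0.1)
```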
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
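The class-incremental setting PILOT targets can be shown with a generic baseline (this is not PILOT's API): freeze a pre-trained backbone and grow a linear classifier as new classes arrive. Prompt methods such as L2P replace the growing head with learned prompts; everything below is an illustrative stand-in.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 256))  # stand-in
for p in backbone.parameters():
    p.requires_grad_(False)        # pre-trained features stay frozen

head = None                        # classifier grows as new classes arrive

def add_task(n_new):
    """Extend the classifier with rows for the new task's classes while
    keeping old-class weights intact (a common CIL baseline)."""
    global head
    old = head
    total = n_new if old is None else old.out_features + n_new
    head = nn.Linear(256, total)
    if old is not None:
        with torch.no_grad():
            head.weight[: old.out_features] = old.weight
            head.bias[: old.out_features] = old.bias

def train_task(batches, epochs=1):
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in batches:       # y holds global class ids
            loss = nn.functional.cross_entropy(head(backbone(x)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
```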
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and
Heuristic Rule-based Methods for Object Manipulation [118.27432851053335]
This paper presents an overview and comparative analysis of our systems designed for two tracks of the SAPIEN ManiSkill Challenge 2021, including the No Interaction track.
The No Interaction track targets learning policies from pre-collected demonstration trajectories.
In this track, we design a Heuristic Rule-based Method (HRM) that achieves high-quality object manipulation by decomposing the task into a series of sub-tasks.
For each sub-task, simple rule-based control strategies are adopted to predict actions that can be applied to robotic arms.
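A toy version of that decomposition, with made-up state fields, gains, and thresholds:

```python
import numpy as np

def move_above(state):
    """Proportional controller steering the end effector over the object."""
    return 2.0 * (state["obj_xy"] - state["ee_xy"])

def descend_and_grasp(state):
    """Fixed downward step; a real rule would also close the gripper."""
    return np.array([0.0, -0.05])

# Ordered (finished?, controller) rules; predicates are illustrative.
SUBTASKS = [
    (lambda s: np.linalg.norm(s["obj_xy"] - s["ee_xy"]) < 0.01, move_above),
    (lambda s: s["grasped"], descend_and_grasp),
]

def hrm_policy(state):
    """Run the rule controller of the first unfinished sub-task."""
    for finished, controller in SUBTASKS:
        if not finished(state):
            return controller(state)
    return np.zeros(2)             # every sub-task done: hold position
```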
arXiv Detail & Related papers (2022-06-13T16:20:42Z)
- Training and Evaluation of Deep Policies using Reinforcement Learning
and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy.
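A compressed sketch of this kind of pipeline, assuming a small VAE as the latent variable model and a policy that acts on latent codes (all sizes illustrative):

```python
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, N_ACTIONS = 64, 8, 4

enc = nn.Linear(OBS_DIM, 2 * LATENT_DIM)     # outputs mean and log-variance
dec = nn.Linear(LATENT_DIM, OBS_DIM)
policy = nn.Sequential(nn.Linear(LATENT_DIM, 32), nn.Tanh(),
                       nn.Linear(32, N_ACTIONS))

def vae_loss(x):
    """ELBO for the generative model, trainable from observations alone."""
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
    recon = ((dec(z) - x) ** 2).sum(-1).mean()
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
    return recon + kl

def act(x):
    """The policy conditions on the latent code, not the raw observation."""
    with torch.no_grad():
        mu, _ = enc(x).chunk(2, dim=-1)
        return torch.softmax(policy(mu), dim=-1).multinomial(1)
```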
arXiv Detail & Related papers (2022-04-18T22:02:32Z)
- What Matters in Learning from Offline Human Demonstrations for Robot
Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
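Behavioral cloning is the simplest of the offline algorithms a study like this compares; the few-line version below, with placeholder dimensions, makes the setting concrete.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 10, 7          # placeholder robot-arm dimensions

policy = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                       nn.Linear(64, ACT_DIM))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def bc_step(obs, act):
    """Regress demonstrated actions from observations (MSE loss, since
    manipulation actions are continuous)."""
    loss = ((policy(obs) - act) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```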
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
- A Survey of Human-in-the-loop for Machine Learning [7.056132067948671]
Human-in-the-loop aims to train an accurate prediction model with minimum cost by integrating human knowledge and experience.
This survey provides a high-level summary of human-in-the-loop learning and motivates interested readers to consider approaches for designing effective human-in-the-loop solutions.
arXiv Detail & Related papers (2021-08-02T14:42:28Z)
- Differentiable Learning Under Triage [25.41072393963499]
Under algorithmic triage, a predictive model does not predict all instances but defers some of them to human experts.
We show that models trained for full automation may be suboptimal under triage.
We introduce a practical gradient-based algorithm that is guaranteed to find a sequence of triage policies and predictive models of increasing performance.
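An illustrative alternating scheme in the spirit of that result (a simplification of the paper's actual gradient-based algorithm; the per-instance human loss is assumed observable):

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 1)            # toy predictive model
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

def triage_step(x, y, human_loss, budget=0.3):
    """x, y: a batch; human_loss: per-instance loss of the human expert
    (assumed given here); budget: fraction the human may handle."""
    model_loss = (model(x).squeeze(-1) - y) ** 2          # per instance
    # Defer the instances where the human most outperforms the model.
    k = int(budget * len(y))
    gap = model_loss.detach() - human_loss
    defer = torch.zeros_like(gap, dtype=torch.bool)
    if k > 0:
        defer[gap.topk(k).indices] = True
    # Train the model only on the instances it will actually predict.
    opt.zero_grad()
    model_loss[~defer].mean().backward()
    opt.step()
    return defer
```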
arXiv Detail & Related papers (2021-03-16T08:07:31Z)
- Human-in-the-Loop Methods for Data-Driven and Reinforcement Learning
Systems [0.8223798883838329]
This research investigates how to integrate human interaction modalities into the reinforcement learning loop.
Results show that a reward signal learned from human interaction accelerates the learning rate of reinforcement learning algorithms.
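One minimal way to realize that idea, assuming human feedback arrives as scalar ratings of state-action pairs (the feedback format and blending weight are assumptions):

```python
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(6, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def fit_feedback(sa, rating):
    """Regress scalar human ratings of state-action pairs."""
    loss = ((reward_model(sa).squeeze(-1) - rating) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

def shaped_reward(sa, env_r, beta=0.5):
    """Blend the learned human-derived signal with the task reward; the
    extra signal is what speeds up the underlying RL algorithm."""
    with torch.no_grad():
        return env_r + beta * reward_model(sa).squeeze(-1)
```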
arXiv Detail & Related papers (2020-08-30T17:28:18Z)
- Data-efficient visuomotor policy training using reinforcement learning
and generative models [27.994338318811952]
We present a data-efficient framework for solving visuomotor sequential decision-making problems.
We exploit the combination of reinforcement learning and latent variable generative models.
arXiv Detail & Related papers (2020-07-26T14:19:00Z)
- On the interaction between supervision and self-play in emergent
communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
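The reported ordering is easy to express as a two-phase schedule; the sketch below is schematic, with a placeholder game API and a REINFORCE-style self-play update.

```python
import torch
import torch.nn as nn

agent = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(agent.parameters(), lr=1e-3)

def supervised_phase(human_obs, human_msgs):
    """Phase 1: imitate human communication with cross-entropy."""
    loss = nn.functional.cross_entropy(agent(human_obs), human_msgs)
    opt.zero_grad()
    loss.backward()
    opt.step()

def selfplay_phase(game, episodes):
    """Phase 2: refine with self-play reward (REINFORCE-style stand-in)."""
    for _ in range(episodes):
        obs = game.reset()                    # assumed game API
        logits = agent(obs)
        a = torch.softmax(logits, dim=-1).multinomial(1).item()
        r = game.step(a)                      # shared payoff for both copies
        opt.zero_grad()
        (-r * torch.log_softmax(logits, dim=-1)[a]).backward()
        opt.step()
```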
arXiv Detail & Related papers (2020-02-04T02:35:19Z)