DERAIL: Diagnostic Environments for Reward And Imitation Learning
- URL: http://arxiv.org/abs/2012.01365v1
- Date: Wed, 2 Dec 2020 18:07:09 GMT
- Title: DERAIL: Diagnostic Environments for Reward And Imitation Learning
- Authors: Pedro Freire, Adam Gleave, Sam Toyer, Stuart Russell
- Abstract summary: We develop a suite of diagnostic tasks that test individual facets of algorithm performance in isolation.
Results confirm that algorithm performance is highly sensitive to implementation details.
A case study shows how the suite can pinpoint design flaws and rapidly evaluate candidate solutions.
- Score: 9.099589602551573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of many real-world tasks is complex and difficult to
procedurally specify. This makes it necessary to use reward or imitation
learning algorithms to infer a reward or policy directly from human data.
Existing benchmarks for these algorithms focus on realism, testing in complex
environments. Unfortunately, these benchmarks are slow, unreliable and cannot
isolate failures. As a complementary approach, we develop a suite of simple
diagnostic tasks that test individual facets of algorithm performance in
isolation. We evaluate a range of common reward and imitation learning
algorithms on our tasks. Our results confirm that algorithm performance is
highly sensitive to implementation details. Moreover, in a case-study into a
popular preference-based reward learning implementation, we illustrate how the
suite can pinpoint design flaws and rapidly evaluate candidate solutions. The
environments are available at https://github.com/HumanCompatibleAI/seals .
Related papers
- A Human-Centered Approach for Improving Supervised Learning [0.44378250612683995]
This paper shows how we can strike a balance between performance, time, and resource constraints.
Another goal of this research is to make Ensembles more explainable and intelligible using the Human-Centered approach.
arXiv Detail & Related papers (2024-10-14T10:27:14Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate (EI), a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
- Automated Decision-based Adversarial Attacks [48.01183253407982]
We consider the practical and challenging decision-based black-box adversarial setting.
Under this setting, the attacker can only acquire the final classification labels by querying the target model.
We propose to automatically discover decision-based adversarial attack algorithms.
arXiv Detail & Related papers (2021-05-09T13:15:10Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) heuristics are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
- Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
arXiv Detail & Related papers (2021-01-12T18:31:22Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Learning to Actively Learn: A Robust Approach [22.75298609290053]
This work proposes a procedure for designing algorithms for adaptive data collection tasks like active learning and pure-exploration multi-armed bandits.
Our adaptive algorithm is learned via adversarial training over equivalence classes of problems derived from information theoretic lower bounds.
We perform synthetic experiments to justify the stability and effectiveness of the training procedure, and then evaluate the method on tasks derived from real data.
arXiv Detail & Related papers (2020-10-29T06:48:22Z)
- Fast and stable MAP-Elites in noisy domains using deep grids [1.827510863075184]
Deep-Grid MAP-Elites is a variant of the MAP-Elites algorithm that uses an archive of similar previously encountered solutions to approximate the performance of a solution.
We show that this simple approach is significantly more resilient to noise on the behavioural descriptors, while achieving competitive performances in terms of fitness optimisation.
arXiv Detail & Related papers (2020-06-25T08:47:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.