imitation: Clean Imitation Learning Implementations
- URL: http://arxiv.org/abs/2211.11972v1
- Date: Tue, 22 Nov 2022 03:11:29 GMT
- Title: imitation: Clean Imitation Learning Implementations
- Authors: Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven
H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart
Russell
- Abstract summary: imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch.
We include three inverse reinforcement learning (IRL) algorithms, three imitation learning algorithms and a preference comparison algorithm.
- Score: 7.7064239657103375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: imitation provides open-source implementations of imitation and reward
learning algorithms in PyTorch. We include three inverse reinforcement learning
(IRL) algorithms, three imitation learning algorithms and a preference
comparison algorithm. The implementations have been benchmarked against
previous results, and automated tests cover 98% of the code. Moreover, the
algorithms are implemented in a modular fashion, making it simple to develop
novel algorithms in the framework. Our source code, including documentation and
examples, is available at https://github.com/HumanCompatibleAI/imitation
Related papers
- Beryllium: Neural Search for Algorithm Implementations [14.11934122454653]
We design a new language named p-language to specify the algorithms and a static analyzer for the p-language to automatically extract information from the algorithm descriptions.
We embedded the output of p-language (p-code) and source code in a common vector space using self-supervised machine learning methods to match algorithm with code without any manual annotation.
Beryllium significantly outperformed the state-of-the-art code search tools in both C and Java.
(2023-05-25T03:49:36Z)
- Tree-Based Adaptive Model Learning [62.997667081978825]
We extend the Kearns-Vazirani learning algorithm to handle systems that change over time.
We present a new learning algorithm that can reuse and update previously learned behavior, implement it in the LearnLib library, and evaluate it on large examples.
(2022-08-31T21:24:22Z)
- The CLRS Algorithmic Reasoning Benchmark [28.789225199559834]
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms.
We propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook.
Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms.
(2022-05-31T09:56:44Z)
- A modular software framework for the design and implementation of ptychography algorithms [55.41644538483948]
We present SciCom, a new ptychography software framework aiming at simulating ptychography datasets and testing state-of-the-art reconstruction algorithms.
Despite its simplicity, the software leverages accelerated processing through the PyTorch interface.
Results are shown on both synthetic and real datasets.
(2022-05-06T16:32:37Z)
- Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers [12.09273192079783]
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification.
We design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors.
(2021-11-09T02:33:36Z)
- A Pragmatic Look at Deep Imitation Learning [0.3626013617212666]
We re-implement 6 different adversarial imitation learning algorithms.
We evaluate them on a widely-used expert trajectory dataset.
GAIL consistently performs well across a range of sample sizes.
(2021-08-04T06:33:10Z)
- The Bayesian Learning Rule [14.141964578853262]
We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule.
The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models.
(2021-07-09T17:28:55Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
(2021-01-08T18:55:07Z)
- A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
(2020-09-15T17:28:42Z)
- Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? [72.00712736992618]
We show that a simple baseline, learning a supervised or self-supervised representation on the meta-training set, outperforms state-of-the-art few-shot learning methods.
An additional boost can be achieved through the use of self-distillation.
We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms.
(2020-03-25T17:58:42Z)
- Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
(2020-03-03T18:46:34Z)