Of Moments and Matching: Trade-offs and Treatments in Imitation Learning
- URL: http://arxiv.org/abs/2103.03236v1
- Date: Thu, 4 Mar 2021 18:57:11 GMT
- Title: Of Moments and Matching: Trade-offs and Treatments in Imitation Learning
- Authors: Gokul Swamy, Sanjiban Choudhury, Zhiwei Steven Wu, J. Andrew Bagnell
- Abstract summary: We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching.
By considering adversarially chosen divergences between learner and expert behavior, we are able to derive bounds on policy performance.
We derive two novel algorithm templates, AdVIL and AdRIL, with strong guarantees, simple implementation, and competitive empirical performance.
- Score: 26.121994149869767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a unifying view of a large family of previous imitation learning
algorithms through the lens of moment matching. At its core, our classification
scheme is based on whether the learner attempts to match (1) reward or (2)
action-value moments of the expert's behavior, with each option leading to
differing algorithmic approaches. By considering adversarially chosen
divergences between learner and expert behavior, we are able to derive bounds
on policy performance that apply for all algorithms in each of these classes,
the first to our knowledge. We also introduce the notion of recoverability,
implicit in many previous analyses of imitation learning, which allows us to
cleanly delineate how well each algorithmic family is able to mitigate
compounding errors. We derive two novel algorithm templates, AdVIL and AdRIL,
with strong guarantees, simple implementation, and competitive empirical
performance.
Related papers
- Two-stage Learning-to-Defer for Multi-Task Learning [3.4289478404209826]
We introduce a Learning-to-Defer approach for multi-task learning that encompasses both classification and regression tasks.
Our two-stage approach utilizes a rejector that defers decisions to the most accurate agent among a pre-trained joint-regressor models and one or more external experts.
arXiv Detail & Related papers (2024-10-21T07:44:57Z) - Bounds on the Generalization Error in Active Learning [0.0]
We establish empirical risk principles for active learning by deriving a family of upper bounds on the generalization error.
We systematically link diverse active learning scenarios, characterized by their loss functions and hypothesis classes to their corresponding upper bounds.
Our results show that regularization techniques used to constraint the complexity of various hypothesis classes are sufficient conditions to ensure the validity of the bounds.
arXiv Detail & Related papers (2024-09-10T08:08:09Z) - Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation [4.239829789304117]
We use the PAC-Bayesian theory for the setting of learning-to-optimize.
We present the first framework to learn optimization algorithms with provable generalization guarantees.
Our learned algorithms provably outperform related ones derived from a (deterministic) worst-case analysis.
arXiv Detail & Related papers (2024-04-04T08:24:57Z) - Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms [30.389055604165222]
We study the key framework of learning with abstention in the multi-class classification setting.
In this setting, the learner can choose to abstain from making a prediction with some pre-defined cost.
We introduce several new families of surrogate losses for which we prove strong non-asymptotic and hypothesis set-specific consistency guarantees.
arXiv Detail & Related papers (2023-10-23T10:16:27Z) - Multivariate Systemic Risk Measures and Computation by Deep Learning
Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primals, optima for the dual representation and corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z) - Synergies between Disentanglement and Sparsity: Generalization and
Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z) - Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z) - Identifying Co-Adaptation of Algorithmic and Implementational
Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of
Inference-based Algorithms [15.338931971492288]
We focus on a series of inference-based actor-critic algorithms to decouple their algorithmic innovations and implementation decisions.
We identify substantial performance drops whenever implementation details are mismatched for algorithmic choices.
Results show which implementation details are co-adapted and co-evolved with algorithms.
arXiv Detail & Related papers (2021-03-31T17:55:20Z) - Instance-Dependent Complexity of Contextual Bandits and Reinforcement
Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z) - A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z) - Tighter Generalization Bounds for Iterative Differentially Private
Learning Algorithms [95.73230376153872]
This paper studies the relationship between generalization and privacy preservation in iterative learning algorithms by two sequential steps.
We prove that $(varepsilon, delta)$-differential privacy implies an on-average generalization bound for multi-Database learning algorithms.
We then investigate how the iterative nature shared by most learning algorithms influence privacy preservation and further generalization.
arXiv Detail & Related papers (2020-07-18T09:12:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.