Data augmentation for efficient learning from parametric experts
- URL: http://arxiv.org/abs/2205.11448v1
- Date: Mon, 23 May 2022 16:37:16 GMT
- Title: Data augmentation for efficient learning from parametric experts
- Authors: Alexandre Galashov, Josh Merel, Nicolas Heess
- Abstract summary: We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degrees-of-freedom control problems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a simple, yet powerful data-augmentation technique to enable
data-efficient learning from parametric experts for reinforcement and imitation
learning. We focus on what we call the policy cloning setting, in which we use
online or offline queries of an expert or expert policy to inform the behavior
of a student policy. This setting arises naturally in a number of problems, for
instance as variants of behavior cloning, or as a component of other algorithms
such as DAgger, policy distillation, or KL-regularized RL. Our approach,
augmented policy cloning (APC), uses synthetic states to induce
feedback-sensitivity in a region around sampled trajectories, thus dramatically
reducing the environment interactions required for successful cloning of the
expert. We achieve highly data-efficient transfer of behavior from an expert to
a student policy for high-degrees-of-freedom control problems. We demonstrate
the benefit of our method in the context of several existing and widely used
algorithms that include policy cloning as a constituent part. Moreover, we
highlight the benefits of our approach in two practically relevant settings (a)
expert compression, i.e. transfer to a student with fewer parameters; and (b)
transfer from privileged experts, i.e. where the expert has a different
observation space than the student, usually including access to privileged
information.
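
The augmentation at the heart of APC is simple to sketch. Since the abstract does not spell out implementation details, the Python fragment below is an illustrative reconstruction of the general idea only: clone the expert not just on visited states but also on random perturbations of them, so the student learns locally corrective behavior around the demonstrated trajectories. The Gaussian perturbation, the noise scale `sigma`, the mean-squared-error loss, and the deterministic-policy assumption are all illustrative choices, not details from the paper.

```python
import torch

def augmented_cloning_loss(student, expert, states, sigma=0.01, n_aug=4):
    """Behavior-cloning loss on sampled states plus synthetic neighbors.

    A minimal sketch of the APC idea: query the expert at perturbed
    copies of the visited states so the student becomes feedback-
    sensitive in a region around the sampled trajectories. `sigma`
    and `n_aug` are illustrative hyperparameters, not paper values.
    """
    # Replicate each state n_aug times and perturb with Gaussian noise
    # (assumption: the abstract does not commit to a noise model).
    synthetic = states.repeat_interleave(n_aug, dim=0)
    synthetic = synthetic + sigma * torch.randn_like(synthetic)
    batch = torch.cat([states, synthetic], dim=0)

    # Query the frozen parametric expert for target actions at both the
    # real and the synthetic states; no extra environment steps needed.
    with torch.no_grad():
        target_actions = expert(batch)

    # Regress the student onto the expert (deterministic-policy case;
    # stochastic policies would instead match distributions, e.g. via KL).
    return torch.nn.functional.mse_loss(student(batch), target_actions)
```

Because the expert is parametric, the synthetic queries cost only forward passes rather than environment steps, which is consistent with the abstract's claim of dramatically reduced environment interactions.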
Related papers
- Preserving Expert-Level Privacy in Offline Reinforcement Learning
We propose a consensus-based expert-level differentially private offline RL training approach compatible with any existing offline RL algorithm.
We prove rigorous differential privacy guarantees, while maintaining strong empirical performance.
arXiv Detail & Related papers (2024-11-18T21:26:53Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Coherent Soft Imitation Learning
Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward.
This work derives an imitation method that captures the strengths of both BC and IRL.
arXiv Detail & Related papers (2023-05-25T21:54:22Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Curriculum Offline Imitation Learning
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy to imitate adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
arXiv Detail & Related papers (2021-10-27T01:56:23Z)
- IQ-Learn: Inverse soft-Q Learning for Imitation
Imitation learning from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics.
Behavioral cloning is widely used due to its simplicity of implementation and stable convergence.
We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function.
arXiv Detail & Related papers (2021-06-23T03:43:10Z)
- Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning
We consider a transfer Reinforcement Learning problem in continuous state and action spaces under unobserved contextual information.
Our goal is to use the context-aware expert data to learn an optimal context-unaware policy for the learner using only a few new data samples.
arXiv Detail & Related papers (2021-06-07T17:49:22Z)
- Continuous Action Reinforcement Learning from a Mixture of Interpretable Experts
We propose a policy scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.