ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation
Learning
- URL: http://arxiv.org/abs/2211.04005v1
- Date: Tue, 8 Nov 2022 04:54:54 GMT
- Title: ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation
Learning
- Authors: Eddy Hudson and Ishan Durugkar and Garrett Warnell and Peter Stone
- Abstract summary: We introduce a modified version of behavioral cloning (BC) that exhibits mode-seeking behavior by incorporating elements of GAN (generative adversarial network) training.
We evaluate ABC on toy domains and a domain based on Hopper from the DeepMind Control suite, and show that it outperforms standard BC by being mode-seeking in nature.
- Score: 48.033516430071494
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given a dataset of expert agent interactions with an environment of interest,
a viable method to extract an effective agent policy is to estimate the maximum
likelihood policy indicated by this data. This approach is commonly referred to
as behavioral cloning (BC). In this work, we describe a key disadvantage of BC
that arises due to the maximum likelihood objective function; namely that BC is
mean-seeking with respect to the state-conditional expert action distribution
when the learner's policy is represented with a Gaussian. To address this
issue, we introduce a modified version of BC, Adversarial Behavioral Cloning
(ABC), that exhibits mode-seeking behavior by incorporating elements of GAN
(generative adversarial network) training. We evaluate ABC on toy domains and a
domain based on Hopper from the DeepMind Control suite, and show that it
outperforms standard BC by being mode-seeking in nature.
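To make the mean-seeking failure concrete: if the expert's action distribution at some state has two well-separated modes, fitting a single Gaussian by maximum likelihood places the policy mean between them, which may itself be a poor action. The sketch below, in PyTorch, contrasts the standard BC negative log-likelihood with a discriminator-based adversarial loss of the kind ABC builds on; the network sizes, GAN variant, and training details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy pi(a|s); the architecture is illustrative."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, states):
        h = self.body(states)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

def bc_loss(policy, states, expert_actions):
    # Standard BC: maximum likelihood, which is mean-seeking for a Gaussian
    # policy when the expert action distribution is multimodal.
    return -policy.dist(states).log_prob(expert_actions).sum(-1).mean()

class Discriminator(nn.Module):
    """Scores (state, action) pairs as expert-like vs. policy-like."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=-1))

def adversarial_losses(policy, disc, states, expert_actions):
    # Reparameterized samples keep the policy loss differentiable.
    policy_actions = policy.dist(states).rsample()
    bce = nn.BCEWithLogitsLoss()
    ones = torch.ones(states.shape[0], 1)
    zeros = torch.zeros(states.shape[0], 1)
    d_loss = (bce(disc(states, expert_actions), ones) +
              bce(disc(states, policy_actions.detach()), zeros))
    # Policy ("generator") loss: produce actions the discriminator accepts as
    # expert-like; this kind of objective pushes the policy toward a mode of
    # the expert distribution rather than its mean.
    g_loss = bce(disc(states, policy_actions), ones)
    return d_loss, g_loss
```

In practice the two losses are alternated per minibatch, as in ordinary GAN training; ABC's exact objective and schedule are given in the paper.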
Related papers
- ADR-BC: Adversarial Density Weighted Regression Behavior Cloning [29.095342729527733]
Imitation Learning (IL) methods first shape a reward or Q function and then use this shaped function within a reinforcement learning framework to optimize the empirical policy.
We propose ADR-BC, which aims to enhance behavior cloning through augmented density-based action support.
As a one-step behavior cloning framework, ADR-BC avoids the cumulative bias associated with multi-step RL frameworks.
arXiv Detail & Related papers (2024-05-28T06:59:16Z)
- Coherent Soft Imitation Learning [17.345411907902733]
Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward.
This work derives an imitation method that captures the strengths of both BC and IRL.
arXiv Detail & Related papers (2023-05-25T21:54:22Z)
- Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection [56.87650511573298]
We propose a general framework called Learnable Behavioral Control (LBC) to address the limitation.
Our agents have achieved a 10077.52% mean human-normalized score and surpassed 24 human world records within 1B training frames.
arXiv Detail & Related papers (2023-05-09T08:00:23Z)
- TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets [118.22975463000928]
We consider an offline reinforcement learning (RL) setting where the agent needs to learn from a dataset collected by rolling out multiple behavior policies.
Two challenges arise in this setting; the first is that the optimal trade-off between optimizing the RL signal and the behavior cloning (BC) signal changes across states, due to the variation in action coverage induced by the different behavior policies.
In this paper, we address both challenges by using an adaptively weighted reverse Kullback-Leibler (KL) divergence as the BC regularizer, based on the TD3 algorithm.
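For intuition, here is a minimal sketch of a policy update with such a regularizer, assuming Gaussian policy and behavior models and treating the per-state weighting as given; the paper's adaptive weighting rule and estimator are not reproduced here.

```python
import torch.distributions as D

def td3_policy_loss_with_reverse_kl(q_net, policy_dist, behavior_dist,
                                    states, kl_weight):
    """policy_dist / behavior_dist: callables mapping states to Normal
    distributions. kl_weight: per-state weights of shape [batch, 1]; how they
    adapt to action coverage is an assumption left out of this sketch."""
    pi = policy_dist(states)
    actions = pi.rsample()                      # differentiable policy actions
    rl_term = -q_net(states, actions).mean()    # standard TD3 policy objective
    # Reverse KL(pi || behavior), closed form for diagonal Gaussians.
    kl = D.kl_divergence(pi, behavior_dist(states)).sum(-1, keepdim=True)
    return rl_term + (kl_weight * kl).mean()
```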
arXiv Detail & Related papers (2022-12-05T09:36:23Z)
- Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning [7.462336024223669]
A key challenge is overcoming the overestimation bias for actions not present in the data.
One simple method to reduce this bias is to introduce a policy constraint via behavioural cloning (BC).
We demonstrate that, by continuing to train a policy offline while reducing the influence of the BC component, we can produce refined policies.
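A minimal sketch of that idea, with a linear decay of the BC coefficient chosen purely for illustration (the paper's schedule and loss form are not reproduced here):

```python
def annealed_bc_policy_loss(q_net, policy, states, dataset_actions,
                            step, total_steps, beta_start=1.0):
    # Decay the BC coefficient so the policy constraint is progressively
    # relaxed over offline training, ahead of online fine-tuning.
    beta = beta_start * max(0.0, 1.0 - step / total_steps)
    pi_actions = policy(states)
    rl_term = -q_net(states, pi_actions).mean()               # TD3 objective
    bc_term = ((pi_actions - dataset_actions) ** 2).mean()    # BC constraint
    return rl_term + beta * bc_term
```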
arXiv Detail & Related papers (2022-11-21T19:10:27Z)
- Collapse by Conditioning: Training Class-conditional GANs with Limited Data [109.30895503994687]
We propose a training strategy for conditional GANs (cGANs) that effectively prevents the observed mode-collapse by leveraging unconditional learning.
Our training strategy starts with an unconditional GAN and gradually injects conditional information into the generator and the objective function.
The proposed method for training cGANs with limited data results not only in stable training but also in generating high-quality images.
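One way to picture the gradual injection of conditional information is sketched below, under the assumption of a discriminator with both an unconditional and a class-conditional head and a blending weight lam that ramps from 0 to 1 over training; the paper's actual architecture and schedule may differ.

```python
import torch
import torch.nn as nn

def transitional_discriminator_loss(uncond_real, uncond_fake,
                                    cond_real, cond_fake, lam):
    """All arguments except lam are discriminator logits of equal shape;
    lam in [0, 1] moves training from purely unconditional (lam=0) to
    fully class-conditional (lam=1)."""
    bce = nn.BCEWithLogitsLoss()
    ones, zeros = torch.ones_like(uncond_real), torch.zeros_like(uncond_fake)
    uncond = bce(uncond_real, ones) + bce(uncond_fake, zeros)
    cond = bce(cond_real, ones) + bce(cond_fake, zeros)
    return (1.0 - lam) * uncond + lam * cond
```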
arXiv Detail & Related papers (2022-01-17T18:59:23Z)
- Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
arXiv Detail & Related papers (2021-10-27T01:56:23Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
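The last entry, Implicit Q-Learning (IQL), avoids evaluating out-of-dataset actions by fitting a state-value function with expectile regression against Q-values of dataset actions only. A minimal sketch of that loss follows; the expectile value is illustrative, and the Q-update and policy-extraction steps are omitted.

```python
import torch

def expectile_value_loss(q_values, v_values, tau=0.7):
    # Asymmetric (expectile) regression of V(s) toward Q(s, a) for actions
    # that appear in the dataset, so no out-of-dataset action is queried.
    diff = q_values - v_values
    weight = torch.where(diff > 0,
                         torch.full_like(diff, tau),
                         torch.full_like(diff, 1.0 - tau))
    return (weight * diff ** 2).mean()
```

In the full method, the Q-function then regresses toward r + gamma * V(s') and the policy is extracted with advantage-weighted regression.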
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.