Fighting Copycat Agents in Behavioral Cloning from Observation Histories
- URL: http://arxiv.org/abs/2010.14876v1
- Date: Wed, 28 Oct 2020 10:52:10 GMT
- Title: Fighting Copycat Agents in Behavioral Cloning from Observation Histories
- Authors: Chuan Wen, Jierui Lin, Trevor Darrell, Dinesh Jayaraman, Yang Gao
- Abstract summary: Imitation learning trains policies to map from input observations to the actions that an expert would choose.
We propose an adversarial approach to learn a feature representation that removes excess information about the previous expert action, a nuisance correlate, while retaining what is needed to predict the next action.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning trains policies to map from input observations to the
actions that an expert would choose. In this setting, distribution shift
frequently exacerbates the effect of misattributing expert actions to nuisance
correlates among the observed variables. We observe that a common instance of
this causal confusion occurs in partially observed settings when expert actions
are strongly correlated over time: the imitator learns to cheat by predicting
the expert's previous action, rather than the next action. To combat this
"copycat problem", we propose an adversarial approach to learn a feature
representation that removes excess information about the previous expert action
nuisance correlate, while retaining the information necessary to predict the
next action. In our experiments, our approach improves performance
significantly across a variety of partially observed imitation learning tasks.
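The adversarial idea described in the abstract can be sketched with a gradient-reversal layer, a standard trick for this kind of adversarial feature learning. The toy linear encoder, variable names, and data below are illustrative assumptions, not the paper's actual architecture or training scheme:

```python
import numpy as np

def grl_forward(z):
    # gradient reversal layer: identity in the forward pass
    return z

def grl_backward(grad, lam=1.0):
    # backward pass: the incoming gradient is scaled by -lam, so the
    # encoder is pushed to REMOVE the information the adversary relies on
    return -lam * grad

# one illustrative update on a toy linear model (hypothetical setup)
rng = np.random.default_rng(0)
x = rng.normal(size=3)            # features of the observation history
W = rng.normal(size=(2, 3))       # encoder: z = W @ x
v_pol = rng.normal(size=2)        # policy head (predicts next action a_t)
v_adv = rng.normal(size=2)        # adversary head (predicts previous action a_prev)
a_t, a_prev = 1.0, 0.8            # expert's next and previous actions (toy values)
lr, lam = 0.1, 1.0

z = W @ x
e_pol = v_pol @ z - a_t                     # policy prediction error
e_adv = v_adv @ grl_forward(z) - a_prev     # adversary prediction error

# encoder gradient: the policy gradient flows through normally, while the
# adversary gradient arrives reversed -- the encoder minimizes next-action
# error while maximizing previous-action error, stripping the nuisance cue
g_z = 2 * e_pol * v_pol + grl_backward(2 * e_adv * v_adv, lam)
W = W - lr * np.outer(g_z, x)

# both heads simply descend their own squared losses
v_pol = v_pol - lr * 2 * e_pol * z
v_adv = v_adv - lr * 2 * e_adv * z
```

At convergence of this min-max game, the adversary head should be unable to recover the previous action from the learned features beyond what the next action itself implies.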
Related papers
- Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations [62.48505112245388]
We take an in-depth look at the causal awareness of modern representations of agent interactions.
We show that recent representations are already partially resilient to perturbations of non-causal agents.
We propose a metric learning approach that regularizes latent representations with causal annotations.
arXiv Detail & Related papers (2023-12-07T18:57:03Z)
- Sequence Model Imitation Learning with Unobserved Contexts [39.4969161422156]
We consider imitation learning problems where the expert has access to a per-episode context hidden from the learner.
We show that on-policy approaches are able to use history to identify the context while off-policy approaches actually perform worse when given access to history.
arXiv Detail & Related papers (2022-08-03T17:27:44Z)
- Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representation from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, from which we derive a theoretical guarantee that the causality-inspired learning achieves reduced sample complexity and better generalization.
arXiv Detail & Related papers (2022-02-17T00:38:35Z)
- SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z)
- ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z)
- Feedback in Imitation Learning: Confusion on Causality and Covariate Shift [12.93527098342393]
We argue that conditioning policies on previous actions leads to a dramatic divergence between "held out" error and performance of the learner in situ.
We analyze existing benchmarks used to test imitation learning approaches.
We find, in a surprising contrast with previous literature, that naive behavioral cloning provides excellent results.
arXiv Detail & Related papers (2021-02-04T20:18:56Z)
- Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
- On Evaluating Weakly Supervised Action Segmentation Methods [79.42955857919497]
We focus on two aspects of the use and evaluation of weakly supervised action segmentation approaches.
We train each method on the Breakfast dataset 5 times and provide average and standard deviation of the results.
Our experiments show that the standard deviation over these repetitions is between 1 and 2.5% and significantly affects the comparison between different approaches.
arXiv Detail & Related papers (2020-05-19T20:30:31Z)
- Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation [34.505472771669744]
We argue that the influence of past events on a user's current action should vary over time and across contexts.
We propose a Contextualized Temporal Attention Mechanism that learns to weigh the influence of historical actions based not only on what the action was, but also on when and how it took place.
arXiv Detail & Related papers (2020-01-29T20:27:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.