Initial State Interventions for Deconfounded Imitation Learning
- URL: http://arxiv.org/abs/2307.15980v3
- Date: Fri, 11 Aug 2023 04:32:04 GMT
- Title: Initial State Interventions for Deconfounded Imitation Learning
- Authors: Samuel Pfrommer, Yatong Bai, Hyunin Lee, Somayeh Sojoudi
- Abstract summary: We consider the problem of masking observed confounders in a disentangled representation of the observation space.
Our novel masking algorithm leverages the usual ability to intervene in the initial system state.
Under certain assumptions, we theoretically prove that this algorithm is conservative in the sense that it does not incorrectly mask observations that causally influence the expert.
- Score: 11.605936648692543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning suffers from causal confusion. This phenomenon occurs when
learned policies attend to features that do not causally influence the expert
actions but are instead spuriously correlated. Causally confused agents produce
low open-loop supervised loss but poor closed-loop performance upon deployment.
We consider the problem of masking observed confounders in a disentangled
representation of the observation space. Our novel masking algorithm leverages
the usual ability to intervene in the initial system state, avoiding any
requirement involving expert querying, expert reward functions, or causal graph
specification. Under certain assumptions, we theoretically prove that this
algorithm is conservative in the sense that it does not incorrectly mask
observations that causally influence the expert; furthermore, intervening on
the initial state serves to strictly reduce excess conservatism. The masking
algorithm is applied to behavior cloning for two illustrative control systems:
CartPole and Reacher.
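The core idea in the abstract — mask an observation dimension unless intervening on it in the initial state can change the expert's action — can be illustrated with a toy sketch. This is not the paper's actual algorithm (which operates on a disentangled representation and comes with conservativeness guarantees); it is a minimal illustration under strong assumptions: a deterministic, queryable expert stand-in, a fully disentangled state, and the ability to set individual initial-state dimensions. All names (`estimate_mask`, `intervene`, etc.) are hypothetical.

```python
import numpy as np

def estimate_mask(expert_policy, sample_initial_state, intervene,
                  n_dims, n_samples=200, tol=1e-8):
    """Toy sketch: keep dimension i only if some initial-state
    intervention on it changes the expert's action; otherwise mask it
    as a (potential) spurious correlate."""
    mask = np.zeros(n_dims, dtype=bool)
    rng = np.random.default_rng(0)
    for i in range(n_dims):
        for _ in range(n_samples):
            s = sample_initial_state(rng)
            a0 = expert_policy(s)
            s_int = intervene(s, i, rng)      # randomize dimension i only
            a1 = expert_policy(s_int)
            if abs(a0 - a1) > tol:            # action changed => causal
                mask[i] = True
                break
    return mask

# Toy expert that attends only to dimension 0 of a 3-dim state.
expert = lambda s: float(np.sign(s[0]))
sample = lambda rng: rng.normal(size=3)

def intervene(s, i, rng):
    s2 = s.copy()
    s2[i] = rng.normal()
    return s2

print(estimate_mask(expert, sample, intervene, 3))
# dimensions 1 and 2 are masked: intervening on them never moves the expert
```

Note the conservative direction mirrors the paper's claim: a dimension is only masked when no sampled intervention changes the expert's action, so truly causal features are kept (up to finite-sample error, which the interventions serve to reduce).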
Related papers
- Rethinking State Disentanglement in Causal Reinforcement Learning [78.12976579620165]
Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability.
We revisit this research line and find that incorporating RL-specific context can reduce unnecessary assumptions in previous identifiability analyses for latent states.
We propose a novel approach for general partially observable Markov Decision Processes (POMDPs) by replacing the complicated structural constraints in previous methods with two simple constraints for transition and reward preservation.
arXiv Detail & Related papers (2024-08-24T06:49:13Z)
- Guiding the generation of counterfactual explanations through temporal background knowledge for Predictive Process Monitoring [13.610101763172452]
We adapt state-of-the-art techniques for counterfactual generation in the domain of XAI to consider a series of temporal constraints at runtime.
We showcase that the inclusion of temporal background knowledge allows the generation of counterfactuals more conformant to the temporal background knowledge.
arXiv Detail & Related papers (2024-03-18T10:34:40Z)
- Neglected Hessian component explains mysteries in Sharpness regularization [19.882170571967368]
We show that differences can be explained by the structure of the Hessian of the loss.
We find that regularizing feature exploitation but not feature exploration yields performance similar to gradient penalties.
arXiv Detail & Related papers (2024-01-19T16:52:53Z)
- Offline Imitation Learning by Controlling the Effective Planning Horizon [5.844892266835562]
We investigate the effect of controlling the effective planning horizon as opposed to imposing an explicit regularizer.
We show that the corrected algorithm improves on popular imitation learning benchmarks by controlling the effective planning horizon rather than through explicit regularization.
arXiv Detail & Related papers (2024-01-18T05:17:30Z)
- Efficient Reinforcement Learning with Impaired Observability: Learning to Act with Delayed and Missing State Observations [92.25604137490168]
This paper presents a theoretical investigation of efficient reinforcement learning in control systems with impaired observability.
We present algorithms and establish near-optimal regret upper and lower bounds, of the form $\tilde{\mathcal{O}}(\sqrt{\mathrm{poly}(H)\,SAK})$, for RL in the delayed and missing observation settings.
arXiv Detail & Related papers (2023-06-02T02:46:39Z)
- Causal Discovery from Subsampled Time Series with Proxy Variables [19.699813624529813]
In this paper, we propose a constraint-based algorithm that can identify the entire causal structure from subsampled time series.
Our algorithm is nonparametric and can achieve full causal identification.
arXiv Detail & Related papers (2023-05-09T08:58:02Z)
- Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
- Nested Counterfactual Identification from Arbitrary Surrogate Experiments [95.48089725859298]
We study the identification of nested counterfactuals from an arbitrary combination of observations and experiments.
Specifically, we prove the counterfactual unnesting theorem (CUT), which allows one to map arbitrary nested counterfactuals to unnested ones.
arXiv Detail & Related papers (2021-07-07T12:51:04Z)
- Fighting Copycat Agents in Behavioral Cloning from Observation Histories [85.404120663644]
Imitation learning trains policies to map from input observations to the actions that an expert would choose.
We propose an adversarial approach to learn a feature representation that removes excess information about the previous expert action, a nuisance correlate.
arXiv Detail & Related papers (2020-10-28T10:52:10Z)
- Excursion Search for Constrained Bayesian Optimization under a Limited Budget of Failures [62.41541049302712]
We propose a novel decision maker grounded in control theory that controls the amount of risk we allow in the search as a function of a given budget of failures.
Our algorithm uses the failure budget more efficiently than state-of-the-art methods in a variety of optimization experiments, and generally achieves lower regret.
arXiv Detail & Related papers (2020-05-15T09:54:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.