Causal Imitation Learning Under Measurement Error and Distribution Shift
- URL: http://arxiv.org/abs/2601.22206v1
- Date: Thu, 29 Jan 2026 18:06:53 GMT
- Title: Causal Imitation Learning Under Measurement Error and Distribution Shift
- Authors: Shi Bo, AmirEmad Ghassami
- Abstract summary: We study offline imitation learning (IL) when part of the decision-relevant state is observed only through noisy measurements. We propose a general framework for IL under measurement error, based on explicitly modeling the causal relationships among the variables.
- Score: 6.038778620145853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study offline imitation learning (IL) when part of the decision-relevant state is observed only through noisy measurements and the distribution may change between training and deployment. Such settings induce spurious state-action correlations, so standard behavioral cloning (BC) -- whether conditioning on raw measurements or ignoring them -- can converge to systematically biased policies under distribution shift. We propose a general framework for IL under measurement error, based on explicitly modeling the causal relationships among the variables, yielding a target that retains a causal interpretation and is robust to distribution shift. Building on ideas from proximal causal inference, we introduce \texttt{CausIL}, which treats noisy state observations as proxy variables, and we provide identification conditions under which the target policy is recoverable from demonstrations without rewards or interactive expert queries. We develop estimators for both discrete and continuous state spaces; for continuous settings, we use an adversarial procedure over RKHS function classes to learn the required parameters. We evaluate \texttt{CausIL} on semi-simulated longitudinal data from the PhysioNet/Computing in Cardiology Challenge 2019 cohort and demonstrate improved robustness to distribution shift compared to BC baselines.
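The abstract's core claim -- that BC on noisy measurements is systematically biased, while a causally grounded target is recoverable -- can be illustrated with a minimal, hypothetical discrete sketch. Note that CausIL identifies the policy from proxy variables alone; the sketch below instead assumes the measurement matrix M = P(W | S) is known, which is a simplifying assumption the paper does not make, used here only to show why the naive BC estimate is biased and how a linear correction recovers the true policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete setting: 2 true states; the expert plays action = state.
# W is a noisy measurement of S that flips with probability eps.
n, eps = 200_000, 0.3
S = rng.integers(0, 2, n)
flip = rng.random(n) < eps
W = np.where(flip, 1 - S, S)
A = S  # deterministic expert policy: pi*(a | s) = 1{a = s}

# Naive BC conditions on the proxy: P(A=1 | W=1) concentrates near 1 - eps, not 1.
bc = A[W == 1].mean()

# Joint P(A, W), the measurement matrix M[w, s] = P(W=w | S=s), and P(S).
P_AW = np.array([[np.mean((A == a) & (W == w)) for w in (0, 1)] for a in (0, 1)])
M = np.array([[1 - eps, eps],
              [eps, 1 - eps]])
P_S = np.array([np.mean(S == 0), np.mean(S == 1)])

# P(A=a, W=w) = sum_s P(A=a | S=s) * M[w, s] * P(s): a linear system in s.
# Solve for Q[a, s] = P(A=a | S=s) * P(s), then normalize each column by P(s).
Q = P_AW @ np.linalg.inv(M.T)
P_AS = Q / P_S
corrected = P_AS[1, 1]  # recovered P(A=1 | S=1), close to the true value 1
```

The naive estimate `bc` lands near 0.7 (= 1 - eps) rather than the expert's true P(A=1 | S=1) = 1, while the corrected estimate is approximately unbiased. The paper's contribution is precisely to avoid needing M, using a second proxy and proximal identification conditions instead.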
Related papers
- On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation [51.56484100374058]
We formalise such information-conditioned interaction patterns as behavioural dependency. This induces a probe-relative notion of behavioural equivalence and a within-policy behavioural distance. Results identify structural conditions under which probe-conditioned behavioural separation is not preserved under common policy transformations.
arXiv Detail & Related papers (2026-02-24T22:55:21Z) - Adversary-Free Counterfactual Prediction via Information-Regularized Representations [8.760019957506719]
We study counterfactual prediction under decoder bias and propose a mathematically grounded, information-theoretic approach. We derive a tractable variational objective that upper-bounds the information term and couples it with a supervised assignment, yielding a stable, provably motivated training criterion. We evaluate the method on controlled numerical simulations and a real-world clinical dataset, comparing against recent state-of-the-art balancing, reweighting, and adversarial baselines.
arXiv Detail & Related papers (2025-10-17T09:49:04Z) - Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective. The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning. The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z) - Flexible Nonparametric Inference for Causal Effects under the Front-Door Model [2.6900047294457683]
We develop novel one-step and targeted minimum loss-based estimators for both the average treatment effect and the average treatment effect on the treated under front-door assumptions. Our estimators are built on multiple parameterizations of the observed data distribution, including approaches that avoid mediator density entirely. We show how these constraints can be leveraged to improve the efficiency of causal effect estimators.
arXiv Detail & Related papers (2023-12-15T22:04:53Z) - Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework [8.572441599469597]
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes.
The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies.
We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
arXiv Detail & Related papers (2023-09-23T06:35:44Z) - Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
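The GenDICE entry above centers on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions. A minimal illustrative sketch of that correction: for a tiny discrete Markov chain we can compute the ratio tau(s) = d*(s) / d0(s) in closed form (whereas GenDICE learns it by minimizing a divergence) and use it to reweight samples drawn off-stationarity. All quantities below are hypothetical toy values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 2-state Markov chain: estimate a stationary expectation
# E_{d*}[r(S)] from samples drawn under a different distribution d0.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # row-stochastic transition matrix

# Stationary distribution d* solves d* P = d*: leading eigenvector of P^T.
evals, evecs = np.linalg.eig(P.T)
d_star = np.real(evecs[:, np.argmax(np.real(evals))])
d_star = d_star / d_star.sum()

d0 = np.array([0.8, 0.2])             # off-stationary sampling distribution
s = rng.choice(2, size=100_000, p=d0)
r = np.array([1.0, 5.0])              # per-state reward

naive = r[s].mean()                   # biased: averages under d0, not d*
tau = d_star / d0                     # stationary-vs-empirical density ratio
corrected = (tau[s] * r[s]).mean()    # ratio-corrected estimate
true_value = d_star @ r               # ground truth for this toy chain
```

Here d* = (2/3, 1/3), so the true stationary value is 7/3, the naive sample average stays near 1.8, and the ratio-corrected estimate recovers the stationary expectation. The hard part in practice, and GenDICE's subject, is estimating tau when d* is not available in closed form.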
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.