Confounded Causal Imitation Learning with Instrumental Variables
- URL: http://arxiv.org/abs/2507.17309v1
- Date: Wed, 23 Jul 2025 08:23:34 GMT
- Title: Confounded Causal Imitation Learning with Instrumental Variables
- Authors: Yan Zeng, Shenglan Nie, Feng Xie, Libo Huang, Peng Wu, Zhi Geng,
- Abstract summary: Imitation learning from demonstrations usually suffers from the confounding effects of unmeasured variables. We develop a two-stage imitation learning framework for valid IV identification and policy optimization.
- Score: 16.070797736247425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning from demonstrations usually suffers from the confounding effects of unmeasured variables (i.e., unmeasured confounders) on the states and actions. Ignoring them yields a biased estimate of the policy. To close this confounding gap, in this paper, we leverage the strength of instrumental variables (IV) and propose a Confounded Causal Imitation Learning (C2L) model. This model accommodates confounders that influence actions across multiple timesteps, rather than being restricted to immediate temporal dependencies. We develop a two-stage imitation learning framework for valid IV identification and policy optimization. In the first stage, we construct a testing criterion based on a defined pseudo-variable, with which we identify a valid IV for C2L models; this criterion yields necessary and sufficient conditions for IV validity. In the second stage, using the identified IV, we propose two candidate policy learning approaches: one simulator-based, the other offline. Extensive experiments verify the effectiveness of both the IV identification and the policy learning.
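The two-stage use of an instrument described in the abstract can be illustrated in its simplest linear form. The sketch below is a toy two-stage least squares (2SLS) construction, not the paper's C2L model: the instrument `z`, confounded state `s`, unmeasured confounder `u`, and expert action `a` are all illustrative assumptions.

```python
import numpy as np

# Minimal 2SLS sketch: an unmeasured confounder u biases the naive
# regression of action on state; a valid instrument z (affects s, but
# affects a only through s) removes that bias in two stages.
rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)                         # unmeasured confounder
z = rng.normal(size=n)                         # instrument
s = 2.0 * z + u + rng.normal(size=n)           # state, confounded by u
a = 1.5 * s + 3.0 * u + rng.normal(size=n)     # expert action, confounded by u

# Naive regression of a on s absorbs the confounding path through u.
naive = (s @ a) / (s @ s)

# Stage 1: project the state onto the instrument.
s_hat = z * ((z @ s) / (z @ z))
# Stage 2: regress the action on the projected state; the bias drops out.
two_stage = (s_hat @ a) / (s_hat @ s_hat)
```

With this data-generating process the naive coefficient lands near 2.0 while the two-stage estimate recovers the true value 1.5, which is the core reason IVs help under unmeasured confounding.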
Related papers
- Flow IV: Counterfactual Inference In Nonseparable Outcome Models Using Instrumental Variables [2.3213238782019316]
We show that under standard IV assumptions, along with the assumptions that latent noises in treatment and outcome are strictly monotonic and jointly Gaussian, the treatment-outcome relationship becomes uniquely identifiable from observed data. This enables counterfactual inference even in nonseparable models. We implement our approach by training a normalizing flow to maximize the likelihood of the observed data, demonstrating accurate recovery of the underlying outcome function.
arXiv Detail & Related papers (2025-08-02T11:24:03Z) - Disentangled Representation Learning for Causal Inference with Instruments [31.67220687652054]
Existing IV based estimators need a known IV or other strong assumptions, such as the existence of two or more IVs in the system. In this paper, we consider a relaxed requirement, which assumes there is an IV proxy in the system without knowing which variable is the proxy. We propose a Variational AutoEncoder (VAE) based disentangled representation learning method to learn an IV representation from a dataset with latent confounders.
arXiv Detail & Related papers (2024-12-05T22:18:48Z) - Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling [51.38330727868982]
We show how action chunking impacts the divergence between a learner and a demonstrator. We propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. Our method boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z) - Learning Decision Policies with Instrumental Variables through Double Machine Learning [16.842233444365764]
A common issue in learning decision-making policies in data-rich settings is spurious correlations in the offline dataset.
We propose DML-IV, a non-linear IV regression method that reduces the bias in two-stage IV regressions.
It outperforms state-of-the-art IV regression methods on IV regression benchmarks and learns high-performing policies in the presence of instruments.
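The bias-reduction device behind DML-style two-stage estimators is cross-fitting: each stage-1 model is fit on one fold and applied on the held-out fold. The linear toy below is an illustrative assumption (DML-IV itself uses flexible nonlinear learners), with made-up variables `z`, `x`, `y`, `u`.

```python
import numpy as np

# Cross-fitted two-stage IV sketch: stage 1 is fit on one half of the
# data and used to form instrument projections on the other half,
# so stage-2 errors are not correlated with stage-1 overfitting.
rng = np.random.default_rng(2)
n = 4000
u = rng.normal(size=n)                        # unmeasured confounder
z = rng.normal(size=n)                        # instrument
x = z + u + rng.normal(size=n)                # treatment, confounded
y = 2.0 * x + 2.0 * u + rng.normal(size=n)    # outcome, confounded

halves = (np.arange(n) < n // 2), (np.arange(n) >= n // 2)
estimates = []
for fit, use in (halves, halves[::-1]):
    # Stage 1 fitted on one fold...
    beta1 = (z[fit] @ x[fit]) / (z[fit] @ z[fit])
    # ...applied on the held-out fold for the stage-2 IV estimate.
    x_hat = beta1 * z[use]
    estimates.append((x_hat @ y[use]) / (x_hat @ x[use]))
theta = float(np.mean(estimates))             # recovers the true effect 2.0
```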
arXiv Detail & Related papers (2024-05-14T10:55:04Z) - Regularized DeepIV with Model Selection [72.17508967124081]
Regularized DeepIV (RDIV) regression can converge to the least-norm IV solution.
Our method matches the current state-of-the-art convergence rate.
arXiv Detail & Related papers (2024-03-07T05:38:56Z) - Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
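The interval construction in split conformal prediction is short enough to sketch directly. The reward model here is a stand-in (a hypothetical calibration set of predicted vs. observed returns), not the paper's OPE estimator; only the quantile step reflects the conformal guarantee.

```python
import numpy as np

# Split conformal sketch: nonconformity scores on a calibration set plus
# a finite-sample-corrected quantile give an interval with marginal
# coverage at least 1 - alpha.
rng = np.random.default_rng(1)
alpha = 0.1                               # target coverage is 1 - alpha

# Stand-in calibration data: predicted vs. observed returns.
pred = rng.normal(size=1000)
true = pred + rng.normal(scale=0.5, size=1000)
scores = np.abs(true - pred)              # nonconformity scores

# Corrected quantile level ceil((n + 1)(1 - alpha)) / n.
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

new_pred = 0.3                            # point estimate for a new policy
interval = (new_pred - q, new_pred + q)   # prescribed-coverage interval
```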
arXiv Detail & Related papers (2023-04-05T16:45:11Z) - Confounder Balancing for Instrumental Variable Regression with Latent Variable [29.288045682505615]
This paper studies the confounding effects from the unmeasured confounders and the imbalance of observed confounders in IV regression.
We propose a Confounder Balanced IV Regression (CB-IV) algorithm to remove the bias from the unmeasured confounders and the imbalance of observed confounders.
arXiv Detail & Related papers (2022-11-18T03:13:53Z) - On the instrumental variable estimation with many weak and invalid instruments [1.837552179215311]
We discuss the fundamental issue of computation in instrumental variable (IV) models with unknown IV validity.
With the assumption of the "sparsest properties", which is equivalent to a sparse penalty structure, we investigate and prove the advantages of a surrogate-step identification method.
We propose a surrogate-step selection estimation method that aligns with the sparse identification condition.
arXiv Detail & Related papers (2022-07-07T01:31:34Z) - Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency [61.03922379081648]
We propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization.
Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extension approaches of adversarial imitation.
arXiv Detail & Related papers (2021-12-11T19:36:19Z) - Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning [107.70165026669308]
In offline reinforcement learning (RL), an optimal policy is learned solely from observational data collected a priori.
We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form.
We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of the conditional moment restriction.
arXiv Detail & Related papers (2021-02-19T13:01:40Z) - Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.