Robust Direct Learning for Causal Data Fusion
- URL: http://arxiv.org/abs/2211.00249v1
- Date: Tue, 1 Nov 2022 03:33:22 GMT
- Title: Robust Direct Learning for Causal Data Fusion
- Authors: Xinyu Li, Yilin Li, Qing Cui, Longfei Li, Jun Zhou
- Abstract summary: We provide a framework for integrating multi-source data that separates the treatment effect from other nuisance functions.
We also propose a causal information-aware weighting function motivated by theoretical insights from the semiparametric efficiency theory.
- Score: 14.462235940634969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of big data, the explosive growth of multi-source heterogeneous
data offers many exciting challenges and opportunities for improving the
inference of conditional average treatment effects. In this paper, we
investigate homogeneous and heterogeneous causal data fusion problems under a
general setting that allows for the presence of source-specific covariates. We
provide a direct learning framework for integrating multi-source data that
separates the treatment effect from other nuisance functions, and achieves
double robustness against certain misspecification. To improve estimation
precision and stability, we propose a causal information-aware weighting
function motivated by theoretical insights from the semiparametric efficiency
theory; it is highly interpretable, assigning larger weights to samples
containing more causal information. We introduce a two-step algorithm, the weighted
multi-source direct learner, based on constructing a pseudo-outcome and
regressing it on covariates under a weighted least square criterion; it offers
us a powerful tool for causal data fusion, enjoying the advantages of easy
implementation, double robustness and model flexibility. In simulation studies,
we demonstrate the effectiveness of our proposed methods in both homogeneous
and heterogeneous causal data fusion scenarios.
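The abstract's two-step procedure (construct a pseudo-outcome, then regress it on covariates under a weighted least square criterion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the AIPW-style pseudo-outcome, the function names, and the linear CATE model are choices made for the example, and the nuisance estimates would in practice come from fitted models.

```python
import numpy as np

def pseudo_outcome(y, a, e_hat, mu0_hat, mu1_hat):
    """AIPW-style pseudo-outcome whose conditional mean equals the CATE
    when either the propensity e(x) or the outcome models mu_a(x) are
    correctly specified (double robustness). a is a 0/1 treatment array."""
    mu_a = np.where(a == 1, mu1_hat, mu0_hat)
    return (a - e_hat) / (e_hat * (1.0 - e_hat)) * (y - mu_a) + mu1_hat - mu0_hat

def weighted_direct_learner(X, psi, w):
    """Step 2: weighted least squares of the pseudo-outcome psi on
    covariates (with an intercept). Samples pooled from all sources are
    passed together; w can encode an information-aware weighting."""
    Xd = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], psi * sw, rcond=None)
    return beta  # coefficients of a linear CATE model tau(x) = beta[0] + x @ beta[1:]
```

Fusing several sources amounts to concatenating their `(X, psi, w)` arrays before the regression; the causal information-aware weighting function of the paper would be estimated from the data rather than supplied by hand.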
Related papers
- Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources.
We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations.
We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
arXiv Detail & Related papers (2024-05-15T13:33:09Z)
- Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process [8.207427766052044]
The proposed approach is demonstrated on and analyzed through two mathematical (representative parabola problem, 2D Ackley function) and two materials science (design of FeCrAl and SmCoFe alloys) case studies.
It is observed that compared to using single-source and source unaware ML models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems, interpretability regarding the sources, and enhanced modeling capabilities by taking advantage of the correlations and relationships among different sources.
arXiv Detail & Related papers (2024-02-06T16:54:59Z)
- The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z)
- Distributionally Robust Learning [11.916893752969429]
This book develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data.
A tractable DRO relaxation is derived for each problem, establishing a connection between bounds and regularization.
Beyond theory, we include numerical experiments and case studies using synthetic and real data.
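As one concrete, deliberately simplified instance of the distributionally robust idea summarized above, group DRO minimizes the worst-case average loss over data groups. The function below is an illustrative sketch of that objective, not the book's framework; the group structure is an assumption made for the example.

```python
def worst_group_loss(losses_by_group):
    """Group-DRO objective: the largest mean loss over groups. This equals
    the worst-case expected loss over the ambiguity set consisting of all
    mixtures of the per-group empirical distributions."""
    return max(sum(g) / len(g) for g in losses_by_group)
```

Training would then pick model parameters minimizing this worst-group value rather than the pooled average loss.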
arXiv Detail & Related papers (2021-08-20T04:14:18Z)
- BayesIMP: Uncertainty Quantification for Causal Data Fusion [52.184885680729224]
We study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable.
We introduce a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-06-07T10:14:18Z)
- Robust Bayesian Inference for Discrete Outcomes with the Total Variation Distance [5.139874302398955]
Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination.
Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD).
We empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real world data.
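For reference, the Total Variation Distance between two discrete distributions, which the approach above uses as its discrepancy, is half the L1 distance between their probability mass functions. A minimal sketch (the function name is ours):

```python
def total_variation(p, q):
    """TVD(p, q) = (1/2) * sum_k |p_k - q_k| for pmfs p and q on a common
    support; it lies in [0, 1] and is 0 iff the distributions coincide."""
    return 0.5 * sum(abs(pk - qk) for pk, qk in zip(p, q))
```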
arXiv Detail & Related papers (2020-10-26T09:53:06Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)