Collaborative causal inference on distributed data
- URL: http://arxiv.org/abs/2208.07898v5
- Date: Fri, 12 Jan 2024 04:38:19 GMT
- Title: Collaborative causal inference on distributed data
- Authors: Yuji Kawamata, Ryoki Motai, Yukihiko Okada, Akira Imakura, Tetsuya Sakurai
- Abstract summary: We propose a data collaboration quasi-experiment (DC-QE) that resolves the lack of both subjects and covariates, reducing random errors and biases in the estimation.
Our method involves constructing dimensionality-reduced intermediate representations from the private data of local parties, sharing intermediate representations instead of private data for privacy preservation, estimating propensity scores from the shared intermediate representations, and finally, estimating the treatment effects from propensity scores.
- Score: 7.293479909193382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the development of technologies for causal inference with
privacy preservation of distributed data has gained considerable attention.
Many existing methods for distributed data focus on resolving the lack of
subjects (samples) and can only reduce random errors in estimating treatment
effects. In this study, we propose a data collaboration quasi-experiment
(DC-QE) that resolves the lack of both subjects and covariates, reducing random
errors and biases in the estimation. Our method involves constructing
dimensionality-reduced intermediate representations from the private data of
local parties, sharing intermediate representations instead of private data for
privacy preservation, estimating propensity scores from the shared intermediate
representations, and finally, estimating the treatment effects from propensity
scores. Through numerical experiments on both artificial and real-world data,
we confirm that our method leads to better estimation results than individual
analyses. While dimensionality reduction loses some information in the private
data and causes performance degradation, we observe that sharing intermediate
representations with many parties to resolve the lack of subjects and
covariates sufficiently improves performance to overcome the degradation caused
by dimensionality reduction. Although external validity is not necessarily
guaranteed, our results suggest that DC-QE is a promising method. With the
widespread use of our method, intermediate representations can be published as
open data to help researchers find causalities and accumulate a knowledge base.
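Read as a pipeline, the method has four steps: local dimensionality reduction of private covariates, sharing of the resulting intermediate representations, propensity-score estimation on the pooled representations, and inverse-probability-weighted (IPW) estimation of the treatment effect. The Python sketch below is an illustration under simplifying assumptions, not the authors' implementation: both parties apply the same linear projection, so the collaboration step DC-QE uses to align representations built with different reduction functions is omitted, and the simulated data, function names (`make_intermediate_representation`, `estimate_ate_ipw`), and projection are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_intermediate_representation(X_private, projection):
    """Run locally by each party: map private covariates to a low-dimensional
    intermediate representation and share only the result. (In DC-QE each party
    may use its own reduction; the alignment step is omitted in this sketch.)"""
    return X_private @ projection

def estimate_ate_ipw(Z_shared, treatment, outcome):
    """Central analysis on the pooled representations: fit a propensity-score
    model, then return a normalized IPW estimate of the average treatment effect."""
    ps = LogisticRegression(max_iter=1000).fit(Z_shared, treatment).predict_proba(Z_shared)[:, 1]
    ps = np.clip(ps, 1e-3, 1 - 1e-3)  # guard against extreme weights
    w1, w0 = treatment / ps, (1 - treatment) / (1 - ps)
    return np.sum(w1 * outcome) / np.sum(w1) - np.sum(w0 * outcome) / np.sum(w0)

# Two parties hold disjoint subjects with the same covariates (simulated data).
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(200, 20)), rng.normal(size=(150, 20))
X = np.vstack([X1, X2])
treatment = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))    # confounded assignment
outcome = 2.0 * treatment + X[:, 0] + rng.normal(size=len(X))  # true effect = 2

projection = rng.normal(size=(20, 5))  # shared reduction, for simplicity
Z = np.vstack([make_intermediate_representation(X1, projection),
               make_intermediate_representation(X2, projection)])
print("IPW estimate of the ATE:", estimate_ate_ipw(Z, treatment, outcome))
```

Pooling the two parties' representations gives the propensity-score model more subjects than either party has alone, which is the source of the variance reduction the abstract describes; adjusting on a reduced representation rather than the raw covariates is the information loss it trades off against.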
Related papers
- Source-Free Domain-Invariant Performance Prediction [68.39031800809553]
We propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data.
Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability.
Our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
arXiv Detail & Related papers (2024-08-05T03:18:58Z)
- Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources.
We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations.
We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
arXiv Detail & Related papers (2024-05-15T13:33:09Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Multiply Robust Federated Estimation of Targeted Average Treatment Effects [0.0]
We propose a novel approach to derive valid causal inferences for a target population using multi-site data.
Our methodology incorporates transfer learning to estimate ensemble weights to combine information from source sites.
arXiv Detail & Related papers (2023-09-22T03:15:08Z)
- CEDAR: Communication Efficient Distributed Analysis for Regressions [9.50726756006467]
There is growing interest in distributed learning over multiple EHR databases without sharing patient-level data.
We propose a novel communication-efficient method that aggregates the local optimal estimates by turning the problem into a missing data problem.
We provide theoretical investigation for the properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses.
arXiv Detail & Related papers (2022-07-01T09:53:44Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Federated Estimation of Causal Effects from Observational Data [19.657789891394504]
We present a novel framework for causal inference with federated data sources.
We assess and integrate local causal effects from different private data sources without centralizing them.
arXiv Detail & Related papers (2021-05-31T08:06:00Z)
- Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
The local exchange of estimates in decentralized learning can allow private data to be inferred.
Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible (see the sketch after this list).
arXiv Detail & Related papers (2020-10-23T10:35:35Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
- A Critical Overview of Privacy-Preserving Approaches for Collaborative Forecasting [0.0]
Cooperation between different data owners may lead to an improvement in forecast quality.
Due to competitive business factors and personal data protection concerns, these data owners might be unwilling to share their data.
This paper analyses the state-of-the-art and unveils several shortcomings of existing methods in guaranteeing data privacy.
arXiv Detail & Related papers (2020-04-20T20:21:04Z)
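The graph-homomorphic perturbation entry above hinges on noise that individual agents can use to mask their estimates without disturbing the quantity the network actually learns from. The sketch below illustrates that idea under a strong simplification (noise vectors constructed to sum exactly to zero across agents); it is not the paper's actual nullspace construction or its privacy analysis, and all names are hypothetical.

```python
import numpy as np

def zero_sum_perturbations(num_agents, dim, scale=1.0, seed=None):
    """Draw one noise vector per agent so that the vectors sum exactly to zero:
    each share looks random on its own, but the average over agents is unchanged."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=scale, size=(num_agents, dim))
    return noise - noise.mean(axis=0, keepdims=True)  # project onto the zero-sum subspace

# Agents share perturbed local estimates; the network average is preserved exactly,
# whereas independent noise of the same scale would also perturb the average.
rng = np.random.default_rng(1)
local_estimates = rng.normal(size=(5, 3))  # 5 agents, 3-dimensional model
perturbed = local_estimates + zero_sum_perturbations(5, 3, scale=10.0, seed=2)
print(np.allclose(perturbed.mean(axis=0), local_estimates.mean(axis=0)))  # True
```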