Multi-Modal Causal Inference with Deep Structural Equation Models
- URL: http://arxiv.org/abs/2203.09672v2
- Date: Mon, 21 Mar 2022 00:49:30 GMT
- Title: Multi-Modal Causal Inference with Deep Structural Equation Models
- Authors: Shachi Deshpande, Zheng Li, Volodymyr Kuleshov (Department of Computer Science, Cornell Tech)
- Abstract summary: We develop techniques that leverage unstructured data within causal inference to correct for confounders that may otherwise not be accounted for.
We empirically demonstrate on tasks in genomics and healthcare that unstructured data can be used to correct for diverse sources of confounding.
- Score: 3.5271614282612314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accounting for the effects of confounders is one of the central challenges in
causal inference. Unstructured multi-modal data (images, time series, text)
contains valuable information about diverse types of confounders, yet it is
typically left unused by most existing methods. This paper seeks to develop
techniques that leverage this unstructured data within causal inference to
correct for additional confounders that may otherwise not be accounted for. We
formalize this task and we propose algorithms based on deep structural
equations that treat multi-modal unstructured data as proxy variables. We
empirically demonstrate on tasks in genomics and healthcare that unstructured
data can be used to correct for diverse sources of confounding, potentially
enabling the use of large amounts of data that were previously not used in
causal inference.
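The proxy-variable idea in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's deep structural equation model: it assumes a linear data-generating process, a single hidden confounder z, and ten noisy scalar proxies standing in for unstructured data. All names (simulate, residualize, proxy_adjusted_estimate) and all coefficients are hypothetical choices made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def residualize(values, covariate):
    """Residuals of a simple least-squares fit of values on covariate."""
    mv, mc = mean(values), mean(covariate)
    slope = (
        sum((c - mc) * (v - mv) for c, v in zip(covariate, values))
        / sum((c - mc) ** 2 for c in covariate)
    )
    return [(v - mv) - slope * (c - mc) for c, v in zip(covariate, values)]


def simulate(n=20000, n_proxies=10, true_effect=1.0, seed=0):
    """Toy linear model (an assumption, not the paper's setup)."""
    rng = random.Random(seed)
    treatment, outcome, proxy = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # hidden confounder, never observed directly
        t = 1.0 if z + rng.gauss(0, 1) > 0 else 0.0  # z drives treatment
        y = true_effect * t + 2.0 * z + rng.gauss(0, 1)  # z drives outcome
        # Noisy proxies of z, averaged into one observed summary feature.
        x = mean([z + rng.gauss(0, 0.5) for _ in range(n_proxies)])
        treatment.append(t)
        outcome.append(y)
        proxy.append(x)
    return treatment, outcome, proxy


def naive_estimate(t, y):
    """Difference in mean outcomes, ignoring confounding."""
    treated = [yi for ti, yi in zip(t, y) if ti == 1.0]
    control = [yi for ti, yi in zip(t, y) if ti == 0.0]
    return mean(treated) - mean(control)


def proxy_adjusted_estimate(t, y, x):
    """Coefficient on t in a least-squares fit of y on t and x.

    Computed by partialling x out of both t and y
    (Frisch-Waugh-Lovell), then regressing residual on residual.
    """
    rt = residualize(t, x)
    ry = residualize(y, x)
    return sum(a * b for a, b in zip(rt, ry)) / sum(a * a for a in rt)


t, y, x = simulate()
naive = naive_estimate(t, y)
adjusted = proxy_adjusted_estimate(t, y, x)
print(f"naive estimate:          {naive:.2f}")
print(f"proxy-adjusted estimate: {adjusted:.2f}")
```

In this toy setting the true effect is 1.0. The naive estimate absorbs the confounder's influence, while adjusting for the proxy summary removes most of that bias. Because the proxies are noisy, some residual confounding remains, which is one motivation for richer models of the proxy data such as those the paper proposes.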
Related papers
- Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (Turbo).
Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1.
Turbo achieves state-of-the-art performance (+7.2% vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z)
- A Unifying Framework for Robust and Efficient Inference with Unstructured Data [2.07180164747172]
This paper presents a general framework for conducting efficient and robust inference on parameters derived from unstructured data.
We formalize this approach with MARS (Missing At Random Structured Data), a unifying framework that integrates and extends existing methods for debiased inference.
We develop robust and efficient estimators for both descriptive and causal estimands and address challenges such as inference using aggregated and transformed predictions from unstructured data.
arXiv Detail & Related papers (2025-05-01T04:11:25Z)
- Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC.
We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss.
Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z)
- Scalable Representation Learning for Multimodal Tabular Transactions [14.18267117657451]
We present an innovative and scalable solution to these challenges.
We propose a parameter efficient decoder that interleaves transaction and text modalities.
We validate the efficacy of our solution on a large-scale dataset of synthetic payments transactions.
arXiv Detail & Related papers (2024-10-10T12:18:42Z)
- Standardizing Structural Causal Models [80.21199731817698]
We propose internally-standardized structural causal models (iSCMs) for benchmarking algorithms.
By construction, iSCMs are not $\operatorname{Var}$-sortable, and as we show experimentally, not $R^2$-sortable either for commonly-used graph families.
arXiv Detail & Related papers (2024-06-17T14:52:21Z)
- Towards Causal Relationship in Indefinite Data: Baseline Model and New Datasets [23.035761299444953]
"Indefinite Data" is characterized by multi-structure data and multi-value representations.
We release two high-quality datasets - Causalogue and Causaction.
We propose a probabilistic framework as a baseline, incorporating three designed highlights for this gap.
arXiv Detail & Related papers (2024-01-16T09:15:43Z)
- Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference [5.522612010562183]
Modular-DCM is the first algorithm that, given the causal structure, uses adversarial training to learn the network weights.
We show our algorithm's convergence on the COVIDx dataset and its utility with a causal invariant prediction problem on CelebA-HQ.
arXiv Detail & Related papers (2024-01-02T20:31:15Z)
- Solving Data Quality Problems with Desbordante: a Demo [35.75243108496634]
Desbordante is an open-source data profiler that aims to close this gap.
It is built with emphasis on industrial application: it is efficient, scalable, resilient to crashes, and provides explanations.
In this demonstration, we show several scenarios that allow end users to solve different data quality problems.
arXiv Detail & Related papers (2023-07-27T15:26:26Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- End-to-End Training of CNN Ensembles for Person Re-Identification [0.0]
We propose an end-to-end ensemble method for person re-identification (ReID) to address the problem of overfitting in discriminative models.
Our proposed ensemble learning framework produces several diverse and accurate base learners in a single DenseNet.
Experiments on several benchmark datasets demonstrate that our method achieves state-of-the-art results.
arXiv Detail & Related papers (2020-10-03T12:40:13Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
- DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.