DoubleMLDeep: Estimation of Causal Effects with Multimodal Data
- URL: http://arxiv.org/abs/2402.01785v1
- Date: Thu, 1 Feb 2024 21:34:34 GMT
- Title: DoubleMLDeep: Estimation of Causal Effects with Multimodal Data
- Authors: Sven Klaassen, Jan Teichert-Kluge, Philipp Bach, Victor Chernozhukov,
Martin Spindler, Suhas Vijaykumar
- Abstract summary: This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation.
We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model.
An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation.
- Score: 7.014959855847738
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper explores the use of unstructured, multimodal data, namely text and
images, in causal inference and treatment effect estimation. We propose a
neural network architecture that is adapted to the double machine learning
(DML) framework, specifically the partially linear model. An additional
contribution of our paper is a new method to generate a semi-synthetic dataset
which can be used to evaluate the performance of causal effect estimation in
the presence of text and images as confounders. The proposed methods and
architectures are evaluated on the semi-synthetic dataset and compared to
standard approaches, highlighting the potential benefit of using text and
images directly in causal studies. Our findings have implications for
researchers and practitioners in economics, marketing, finance, medicine and
data science in general who are interested in estimating causal quantities
using non-traditional data.
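As a rough illustration of the partially linear model named in the abstract, the sketch below estimates theta in Y = theta*D + g(X) + e by cross-fitted partialling-out. It is not the paper's neural network architecture. The ridge learners, the tabular features standing in for text and image confounders, the simulated data, and the names fit_ridge and dml_plr are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with intercept; returns a predict function.
    Stand-in for the nuisance learners (an assumption, not the paper's networks)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    beta = np.linalg.solve(A, Xb.T @ y)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + e."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit E[Y|X] and E[D|X] on the training folds only
        l_hat = fit_ridge(X[train_mask], y[train_mask])
        m_hat = fit_ridge(X[train_mask], d[train_mask])
        # Residualize the held-out fold
        y_res[test_idx] = y[test_idx] - l_hat(X[test_idx])
        d_res[test_idx] = d[test_idx] - m_hat(X[test_idx])
    # Regress outcome residuals on treatment residuals
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Standard error based on the orthogonal score (y_res - theta*d_res)*d_res
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se

# Simulated data with a known effect of 0.5 (hypothetical, for illustration)
rng = np.random.default_rng(42)
n, p = 2000, 5
X = rng.normal(size=(n, p))
d = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.0]) + rng.normal(size=n)
y = 0.5 * d + X @ np.array([0.8, 0.2, -0.4, 0.1, 0.0]) + rng.normal(size=n)

theta_hat, se_hat = dml_plr(y, d, X)
print(f"theta_hat = {theta_hat:.3f}, se = {se_hat:.3f}")
```

In a multimodal setting, the two ridge fits would be replaced by models that map text and images to predictions of the outcome and the treatment. The residual-on-residual step stays the same.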
Related papers
- Causal Representation Learning with Generative Artificial Intelligence: Application to Texts as Treatments [0.0]
We show how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts.
We propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments.
We show that the knowledge of this true internal representation helps disentangle the treatment features of interest.
arXiv Detail & Related papers (2024-10-01T17:46:21Z)
- From Text to Treatment Effects: A Meta-Learning Approach to Handling Text-Based Confounding [7.5348062792]
This paper examines the performance of meta-learners when confounding variables are expressed in text.
We show that learners using pre-trained text representations of confounders achieve improved CATE estimates.
Due to the entangled nature of the text embeddings, these models do not fully match the performance of meta-learners with perfect confounder knowledge.
arXiv Detail & Related papers (2024-09-23T19:46:19Z)
- Images in Discrete Choice Modeling: Addressing Data Isomorphism in Multi-Modality Inputs [77.54052164713394]
This paper explores the intersection of Discrete Choice Modeling (DCM) and machine learning.
We investigate the consequences of embedding high-dimensional image data that shares isomorphic information with traditional tabular inputs within a DCM framework.
arXiv Detail & Related papers (2023-12-22T14:33:54Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- CARLA-GeAR: a Dataset Generator for a Systematic Evaluation of Adversarial Robustness of Vision Models [61.68061613161187]
This paper presents CARLA-GeAR, a tool for the automatic generation of synthetic datasets for evaluating the robustness of neural models against physical adversarial patches.
The tool is built on the CARLA simulator, using its Python API, and allows the generation of datasets for several vision tasks in the context of autonomous driving.
The paper presents an experimental study to evaluate the performance of some defense methods against such attacks, showing how the datasets generated with CARLA-GeAR might be used in future work as a benchmark for adversarial defense in the real world.
arXiv Detail & Related papers (2022-06-09T09:17:38Z)
- Is More Data All You Need? A Causal Exploration [4.756600446882457]
Causal analysis is often used in medicine and economics to gain insights about the effects of actions and policies.
In this paper we explore the effect of dataset interventions on the output of image classification models.
arXiv Detail & Related papers (2022-06-06T08:02:54Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
- Generating Synthetic Text Data to Evaluate Causal Inference Methods [23.330942019150786]
We develop a framework for adapting existing generation models to produce synthetic text datasets with known causal effects.
We use this framework to perform an empirical comparison of four recently-proposed methods for estimating causal effects from text data.
arXiv Detail & Related papers (2021-02-10T18:53:11Z)
- Semi-Structured Deep Piecewise Exponential Models [2.7728956081909346]
We propose a versatile framework for survival analysis that combines advanced concepts from statistics with deep learning.
A proof of concept is provided by using the framework to predict Alzheimer's disease progression.
arXiv Detail & Related papers (2020-11-11T14:41:19Z)
- CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems [121.78477833009671]
We investigate the performance of different summarization models under a cross-dataset setting.
A comprehensive study of 11 representative summarization systems on 5 datasets from different domains reveals the effect of model architectures and generation ways.
arXiv Detail & Related papers (2020-10-11T02:19:15Z)