On the causality-preservation capabilities of generative modelling
- URL: http://arxiv.org/abs/2301.01109v1
- Date: Tue, 3 Jan 2023 14:09:15 GMT
- Title: On the causality-preservation capabilities of generative modelling
- Authors: Yves-Cédric Bauwelinckx, Jan Dhaene, Tim Verdonck, Milan van den Heuvel
- Abstract summary: We study the causal preservation capabilities of GANs and whether the produced synthetic data can reliably be used to answer causal questions.
This is done by performing causal analyses on the synthetic data, produced by a GAN, with increasingly more lenient assumptions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modeling lies at the core of both the financial and the insurance industry
for a wide variety of tasks. The rise and development of machine learning and
deep learning models have created many opportunities to improve our modeling
toolbox. Breakthroughs in these fields often come with the requirement of large
amounts of data. Such large datasets are often not publicly available in
finance and insurance, mainly due to privacy and ethics concerns. This lack of
data is currently one of the main hurdles in developing better models. One
possible way to alleviate this issue is generative modeling. Generative
models are capable of simulating fake but realistic-looking data, also referred
to as synthetic data, that can be shared more freely. Generative Adversarial
Networks (GANs) are one such class of models, increasing our capacity to fit
very high-dimensional distributions of data. While research on GANs is an active
topic in fields like computer vision, they have found limited adoption within
the human sciences, like economics and insurance. The reason is that, in
these fields, most questions are inherently about identification of causal
effects, while to this day neural networks, which are at the center of the GAN
framework, focus mostly on high-dimensional correlations. In this paper we
study the causal preservation capabilities of GANs and whether the produced
synthetic data can reliably be used to answer causal questions. This is done by
performing causal analyses on the synthetic data, produced by a GAN, with
increasingly more lenient assumptions. We consider the cross-sectional case,
the time series case and the case with a complete structural model. It is shown
that in the simple cross-sectional scenario where correlation equals causation
the GAN preserves causality, but that challenges arise for more advanced
analyses.
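The cross-sectional check described in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' experimental setup: the data-generating process (`y = beta * x + noise` with exogenous `x`), the function names, and the sample size are all assumptions, and a bivariate Gaussian fitted to the real data stands in for the GAN. The idea is to estimate the same regression slope on real and on synthetic data and compare the two.

```python
import math
import random


def simulate_real(n, beta, rng):
    """Hypothetical 'real' data: x is exogenous, so the OLS slope is the causal effect."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def moments(xs, ys):
    """Means, variances and covariance of the two columns (population versions)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, sxx, syy, sxy


def ols_slope(xs, ys):
    """Slope of the least-squares regression of y on x."""
    _, _, sxx, _, sxy = moments(xs, ys)
    return sxy / sxx


def sample_synthetic(xs, ys, n, rng):
    """Stand-in generator (NOT a GAN): a bivariate Gaussian matched to the data's moments."""
    mx, my, sxx, syy, sxy = moments(xs, ys)
    sx, sy = math.sqrt(sxx), math.sqrt(syy)
    rho = sxy / (sx * sy)
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))
    syn_x, syn_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        syn_x.append(mx + sx * z1)
        syn_y.append(my + sy * (rho * z1 + resid * z2))
    return syn_x, syn_y


def compare_slopes(seed=0, n=20000, beta=2.0):
    """Return the slope estimated on real data and on synthetic data."""
    rng = random.Random(seed)
    real_x, real_y = simulate_real(n, beta, rng)
    syn_x, syn_y = sample_synthetic(real_x, real_y, n, rng)
    return ols_slope(real_x, real_y), ols_slope(syn_x, syn_y)


real_slope, synthetic_slope = compare_slopes()
print(f"slope on real data:      {real_slope:.3f}")
print(f"slope on synthetic data: {synthetic_slope:.3f}")
```

In this toy setting the slope is fully determined by second moments, so any generator that reproduces the correlation structure also reproduces the causal estimate. That is the "correlation equals causation" scenario; it says nothing about the time series or structural-model cases, where the abstract reports that challenges arise.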
Related papers
- Comprehensive Exploration of Synthetic Data Generation: A Survey [4.485401662312072]
This work surveys 417 Synthetic Data Generation models over the last decade.
The findings reveal increased model performance and complexity, with neural network-based approaches prevailing.
Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete.
arXiv Detail & Related papers (2024-01-04T20:23:51Z)
- Identifiable Latent Polynomial Causal Models Through the Lens of Change [82.14087963690561]
Causal representation learning aims to unveil latent high-level causal representations from observed low-level data.
One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability.
arXiv Detail & Related papers (2023-10-24T07:46:10Z)
- From Identifiable Causal Representations to Controllable Counterfactual Generation: A Survey on Causal Generative Modeling [17.074858228123706]
We focus on fundamental theory, methodology, drawbacks, datasets, and metrics.
We cover applications of causal generative models in fairness, privacy, out-of-distribution generalization, precision medicine, and biological sciences.
arXiv Detail & Related papers (2023-10-17T05:45:32Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
- CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis [0.4999814847776097]
Generative adversarial networks (GANs) have drawn considerable attention in recent years for their proven capability in generating synthetic data.
The validity of the synthetic data and the underlying privacy concerns represent major challenges which are not sufficiently addressed.
arXiv Detail & Related papers (2023-07-01T16:52:18Z)
- De-Biasing Generative Models using Counterfactual Methods [0.0]
We propose a new decoder-based framework named the Causal Counterfactual Generative Model (CCGM).
Our proposed method combines a causal latent space VAE model with specific modifications to emphasize causal fidelity.
We explore how better disentanglement of causal learning and encoding/decoding generates higher causal intervention quality.
arXiv Detail & Related papers (2022-07-04T16:53:20Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Causal Inference with Deep Causal Graphs [0.0]
Parametric causal modelling techniques rarely provide functionality for counterfactual estimation.
Deep Causal Graphs is an abstract specification of the required functionality for a neural network to model causal distributions.
We demonstrate its expressive power in modelling complex interactions and showcase applications to machine learning explainability and fairness.
arXiv Detail & Related papers (2020-06-15T13:03:33Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
- CHEER: Rich Model Helps Poor Model via Knowledge Infusion [69.23072792708263]
We develop a knowledge infusion framework named CHEER that can succinctly summarize such a rich model into transferable representations.
Our empirical results showed that CHEER outperformed baselines by 5.60% to 46.80% in terms of the macro-F1 score on multiple physiological datasets.
arXiv Detail & Related papers (2020-05-21T21:44:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.