Towards Causal Relationship in Indefinite Data: Baseline Model and New
Datasets
- URL: http://arxiv.org/abs/2401.08221v1
- Date: Tue, 16 Jan 2024 09:15:43 GMT
- Title: Towards Causal Relationship in Indefinite Data: Baseline Model and New
Datasets
- Authors: Hang Chen, Xinyu Yang, Keqing Du
- Abstract summary: "Indefinite Data" is characterized by multi-structure data and multi-value representations.
We release two high-quality datasets - Causalogue and Causaction.
We propose a probabilistic framework as a baseline, incorporating three designed highlights for this gap.
- Score: 23.035761299444953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integrating deep learning with causal discovery has revealed that
learning causal structures and representations in dialogue and video is full of
challenges. We define these data forms as "Indefinite Data", characterized by
multi-structure data and multi-value representations. Unlike existing adaptable
data forms, Indefinite Data still faces gaps in datasets and methods. To
address the dataset gap, we release two high-quality datasets - Causalogue and
Causaction, containing text dialogue samples and video action samples with
causal annotations respectively. Moreover, the method gap arises from the
coexistence of multi-structure data and multi-value representations, breaking
the assumptions of all current methods and rendering them infeasible on
Indefinite Data. To this end, we propose a probabilistic framework as a
baseline, incorporating three designed highlights for this gap: 1) establishing
Causation Condition of representations using the independence of noise terms
under non-fixed causal structures, 2) treating causal strength as a latent
variable and measuring the reconstruction loss in the correlation space, and 3)
estimating the effects of latent confounders. These highlights make the
probabilistic model capable of overcoming challenges brought by the coexistence
of multi-structure data and multi-value representations and pave the way for
the extension of latent confounders. Comprehensive experiments evaluate
baseline results for causal structures, causal representations, and confounding
disentanglement.
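The abstract lists the baseline's three highlights without formulas. As a loose illustration (not the authors' released baseline), the NumPy sketch below reads highlights 1 and 2 under a simplifying linear-SEM assumption: noise terms recovered from a candidate causal-strength matrix should be mutually independent (approximated here by near-zero correlation), and reconstruction quality is scored in correlation space rather than on the raw multi-value representations. All names, shapes, and the toy data are hypothetical.

```python
# Minimal sketch, NOT the paper's released code: a linear-SEM reading of
# highlight 1 (Causation Condition via independence of noise terms under a
# candidate, non-fixed causal structure) and highlight 2 (reconstruction loss
# measured in correlation space). Names and toy data are assumptions.
import numpy as np

def noise_terms(X, A):
    """Residual noise E = X - A X for a candidate causal-strength matrix A.
    X: (n, d) representations of the n segments (e.g. utterances) of one sample.
    A: (n, n) candidate causal strengths (A[i, j] != 0 means segment j -> i)."""
    return X - A @ X

def max_offdiag_noise_correlation(E):
    """Proxy for the Causation Condition: noise terms of distinct segments should
    be (roughly) independent; here we only check the largest off-diagonal |corr|."""
    C = np.corrcoef(E)
    return float(np.max(np.abs(C - np.diag(np.diag(C)))))

def correlation_space_loss(X, X_hat):
    """Reconstruction loss between correlation matrices instead of raw values."""
    return float(np.linalg.norm(np.corrcoef(X) - np.corrcoef(X_hat), ord="fro"))

# Toy usage: random vectors stand in for multi-value utterance representations.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))                        # 4 utterances, 16-dim each
A = 0.3 * np.tril(rng.normal(size=(4, 4)), k=-1)    # strictly lower-triangular DAG
E = noise_terms(X, A)
print("max |corr| between noise terms:", round(max_offdiag_noise_correlation(E), 3))
X_hat = X + 0.1 * rng.normal(size=X.shape)          # stand-in for a reconstruction
print("correlation-space loss:", round(correlation_space_loss(X, X_hat), 3))
```

In the paper's baseline the causal structure is non-fixed and the causal strength is treated as a latent variable inside a probabilistic model; the sketch only makes the two scoring ideas concrete on fixed toy inputs.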
Related papers
- DAGnosis: Localized Identification of Data Inconsistencies using
Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the probability distribution and independencies of the training set's features as a structure.
Our method, called DAGnosis, leverages these structural interactions to draw valuable and insightful data-centric conclusions (a toy structure-based sketch appears after this list).
arXiv Detail & Related papers (2024-02-26T11:29:16Z) - A Review and Roadmap of Deep Causal Model from Different Causal
Structures and Representations [23.87336875544181]
We redefine causal data into three categories: definite data, semi-definite data, and indefinite data.
Definite data pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning.
Indefinite data is an emergent research area that we infer from the progression of data forms.
arXiv Detail & Related papers (2023-11-02T01:31:42Z) - SSL Framework for Causal Inconsistency between Structures and
Representations [23.035761299444953]
Cross-pollination of deep learning and causal discovery has catalyzed a burgeoning field of research seeking to elucidate causal relationships within non-statistical data forms like images, videos, and text.
We theoretically develop intervention strategies suitable for indefinite data and derive a causal consistency condition (CCC).
CCC could potentially play an influential role in various fields.
arXiv Detail & Related papers (2023-10-28T08:29:49Z) - Identifiable Latent Polynomial Causal Models Through the Lens of Change [82.14087963690561]
Causal representation learning aims to unveil latent high-level causal representations from observed low-level data.
One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability.
arXiv Detail & Related papers (2023-10-24T07:46:10Z) - Inducing Causal Structure for Abstractive Text Summarization [76.1000380429553]
We introduce a Structural Causal Model (SCM) to induce the underlying causal structure of the summarization data.
We propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to learn the causal representations that can mimic the causal factors.
Experimental results on two widely used text summarization datasets demonstrate the advantages of our approach.
arXiv Detail & Related papers (2023-08-24T16:06:36Z) - Towards Causal Representation Learning and Deconfounding from Indefinite
Data [17.793702165499298]
Non-statistical data (e.g., images, text, etc.) encounters significant conflicts in terms of properties and methods with traditional causal data.
We redefine causal data from two novel perspectives and then propose three data paradigms.
We implement the above designs as a dynamic variational inference model, tailored to learn causal representation from indefinite data.
arXiv Detail & Related papers (2023-05-04T08:20:37Z) - DOT-VAE: Disentangling One Factor at a Time [1.6114012813668934]
We propose a novel framework that augments the latent space of a Variational Autoencoder with a disentangled space and is trained using a Wake-Sleep-inspired two-step algorithm for unsupervised disentanglement.
Our network learns to disentangle interpretable, independent factors from the data one at a time, encoding each in a different dimension of the disentangled latent space, while making no prior assumptions about the number of factors or their joint distribution.
arXiv Detail & Related papers (2022-10-19T22:53:02Z) - Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z) - Uncovering Main Causalities for Long-tailed Information Extraction [14.39860866665021]
Long-tailed distributions caused by the selection bias of a dataset may lead to incorrect correlations.
This motivates us to propose counterfactual IE (CFIE), a novel framework that aims to uncover the main causalities behind data.
arXiv Detail & Related papers (2021-09-11T08:08:24Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z) - A Critical View of the Structural Causal Model [89.43277111586258]
We show that one can identify the cause and the effect without considering their interaction at all.
We propose a new adversarial training method that mimics the disentangled structure of the causal model.
Our multidimensional method outperforms the literature methods on both synthetic and real world datasets.
arXiv Detail & Related papers (2020-02-23T22:52:28Z)
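As a toy illustration related to the DAGnosis entry above (not the DAGnosis implementation), the sketch below shows the general idea of a localized, structure-aware inconsistency check: each feature is modeled from its DAG parents on the training set, and a deployment sample's features are flagged when their standardized residuals are large. The hand-specified DAG, the linear-Gaussian parent models, and the 3-sigma threshold are all assumptions of this sketch.

```python
# Toy illustration only -- NOT the DAGnosis implementation. It mimics the idea of
# localized, structure-based inconsistency checks: fit each feature from its DAG
# parents on training data, then flag features of a deployment sample whose
# standardized residuals are unusually large. DAG, models, and threshold are
# assumptions made for this sketch.
import numpy as np

def fit_parent_models(X_train, parents):
    """Least-squares model of each feature given its DAG parents (plus intercept)."""
    n = len(X_train)
    models = {}
    for j, pa in parents.items():
        P = np.column_stack([X_train[:, pa], np.ones(n)])  # handles pa == [] too
        w, *_ = np.linalg.lstsq(P, X_train[:, j], rcond=None)
        resid = X_train[:, j] - P @ w
        models[j] = (w, resid.std() + 1e-8)
    return models

def localize_inconsistencies(x, parents, models, z_thresh=3.0):
    """Return (feature index, z-score) pairs whose residual exceeds z_thresh."""
    flagged = []
    for j, pa in parents.items():
        w, sigma = models[j]
        p = np.append(x[pa], 1.0)
        z = abs(x[j] - p @ w) / sigma
        if z > z_thresh:
            flagged.append((j, float(z)))
    return flagged

# Toy DAG x0 -> x1 -> x2, written as a parent list per feature index.
parents = {0: [], 1: [0], 2: [1]}
rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=1000)
x2 = -x1 + 0.1 * rng.normal(size=1000)
models = fit_parent_models(np.column_stack([x0, x1, x2]), parents)

sample = np.array([0.5, 4.0, 3.0])   # x1 and x2 break the training-time structure
print(localize_inconsistencies(sample, parents, models))   # flags features 1 and 2
```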
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.