Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation
- URL: http://arxiv.org/abs/2209.08579v1
- Date: Sun, 18 Sep 2022 15:04:55 GMT
- Authors: Yang Ni
- Abstract summary: We propose a novel causal model for categorical data based on a new classification model, termed classification with optimal label permutation (COLP). A simple learning algorithm via comparing likelihood functions of causal and anti-causal models suffices to learn the causal direction.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal discovery for quantitative data has been extensively studied but less
is known for categorical data. We propose a novel causal model for categorical
data based on a new classification model, termed classification with optimal
label permutation (COLP). By design, COLP is a parsimonious classifier, which
gives rise to a provably identifiable causal model. A simple learning algorithm
via comparing likelihood functions of causal and anti-causal models suffices to
learn the causal direction. Through experiments with synthetic and real data,
we demonstrate the favorable performance of the proposed COLP-based causal
model compared to state-of-the-art methods. We also make available an
accompanying R package COLP, which contains the proposed causal discovery
algorithm and a benchmark dataset of categorical cause-effect pairs.
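The likelihood-comparison step can be illustrated with a minimal Python sketch. It is not the COLP algorithm and not the API of the COLP R package. As a hypothetical, simplified stand-in for a parsimonious classifier, it assumes that every conditional distribution P(effect | cause = c) uses one shared probability vector, whose entries are assigned to the effect's categories by a cause-specific permutation. COLP's actual parametrization may differ. Under this assumption, the best permutation for each cause level sorts that level's counts in decreasing order, so both directions have closed-form maximized log-likelihoods, and the direction with the larger value is chosen. With unrestricted categorical conditionals, the factorizations P(X)P(Y|X) and P(Y)P(X|Y) reach the same maximized likelihood. Any directional signal therefore comes from the restriction placed on the conditional model. All function names are illustrative only.

```python
import math
from collections import Counter


def multinomial_loglik(counts):
    """Maximized log-likelihood of i.i.d. categorical draws (MLE p_k = c_k / n)."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


def contingency(causes, effects):
    """Count table: one row per cause level, one column per effect level."""
    cause_levels = sorted(set(causes))
    effect_levels = sorted(set(effects))
    pairs = Counter(zip(causes, effects))
    return [[pairs[(c, e)] for e in effect_levels] for c in cause_levels]


def permuted_conditional_loglik(table):
    """Maximized log-likelihood of the hypothetical 'shared vector up to
    relabeling' conditional model (an assumption, not COLP's exact form).

    Sorting each row in decreasing order is the optimal relabeling for that row
    (rearrangement inequality). The shared probability vector is then the
    normalized column sums of the sorted rows.
    """
    sorted_rows = [sorted(row, reverse=True) for row in table]
    column_sums = [sum(col) for col in zip(*sorted_rows)]
    return multinomial_loglik(column_sums)


def direction_loglik(causes, effects):
    """Log-likelihood of the factorization P(cause) * P(effect | cause)."""
    marginal = multinomial_loglik(list(Counter(causes).values()))
    conditional = permuted_conditional_loglik(contingency(causes, effects))
    return marginal + conditional


def infer_direction(x, y):
    """Pick the direction whose maximized log-likelihood is larger."""
    forward = direction_loglik(x, y)   # model X -> Y
    backward = direction_loglik(y, x)  # model Y -> X
    if math.isclose(forward, backward):
        return "undecided"
    return "X->Y" if forward > backward else "Y->X"


if __name__ == "__main__":
    # Toy data: rows X = 0, 1, 2 have Y counts (8, 2), (2, 8), (8, 2).
    x = [0] * 10 + [1] * 10 + [2] * 10
    y = [0] * 8 + [1] * 2 + [0] * 2 + [1] * 8 + [0] * 8 + [1] * 2
    print(infer_direction(x, y))
```

On this toy data, infer_direction(x, y) returns "X->Y". The forward model fits all three rows with the shared vector (0.8, 0.2). The reverse model must reuse a single vector for the rows (8, 2, 8) and (2, 8, 2), which fits them less well.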
Related papers
- Sample, estimate, aggregate: A recipe for causal discovery foundation models (2024-02-02)
We train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables.
Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets.
Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift.
- Shortcuts for causal discovery of nonlinear models by score matching (2023-10-22)
We define and characterize a score-sortability pattern of nonlinear additive noise models.
We show the score-sortability of the most common synthetic benchmarks in the literature.
Our findings highlight the lack of diversity in the data as an important limitation in the evaluation of nonlinear causal discovery approaches.
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting (2023-03-06)
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning adaptive weights for the Reweighted Score function (ReScore).
- Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery (2023-03-02)
We conduct the first study on spurious correlations for open-domain response generation models, based on a corpus, CGDIALOG, curated in our work.
Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference of response generation models.
- Parametric Classification for Generalized Category Discovery: A Baseline Study (2022-11-21)
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks, and shows strong robustness to unknown class numbers.
- Amortized Inference for Causal Structure Learning (2022-05-25)
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
- Score matching enables causal discovery of nonlinear additive noise models (2022-03-08)
We show how to design a new generation of scalable causal discovery methods.
We propose a new efficient method for approximating the score's Jacobian, enabling recovery of the causal graph.
- Ordinal Causal Discovery (2022-01-19)
This paper proposes an identifiable ordinal causal discovery method that exploits the ordinal information contained in many real-world applications to uniquely identify the causal structure.
We show that the proposed ordinal causal discovery method has favorable and robust performance compared to state-of-the-art alternative methods in both ordinal categorical and non-categorical data.
- Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper (2021-07-11)
Causal discovery from observational data is an important tool in many branches of science.
In the large-sample limit, sound and complete causal discovery algorithms have been previously introduced.
However, only finite training data is available in practice, which limits the power of the statistical tests used by these algorithms.
- Harmonization with Flow-based Causal Inference (2021-06-12)
This paper presents a normalizing-flow-based method to perform counterfactual inference upon a structural causal model (SCM) to harmonize medical data.
Evaluations on multiple large, real-world medical datasets show that this method leads to better cross-domain generalization compared to state-of-the-art algorithms.
- Good Classifiers are Abundant in the Interpolating Regime (2020-06-22)
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.