DAGnosis: Localized Identification of Data Inconsistencies using
Structures
- URL: http://arxiv.org/abs/2402.17599v2
- Date: Wed, 28 Feb 2024 10:46:07 GMT
- Title: DAGnosis: Localized Identification of Data Inconsistencies using
Structures
- Authors: Nicolas Huynh, Jeroen Berrevoets, Nabeel Seedat, Jonathan Crabbé,
Zhaozhi Qian, Mihaela van der Schaar
- Abstract summary: Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's features probability distribution and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
- Score: 73.39285449012255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identification and appropriate handling of inconsistencies in data at
deployment time is crucial to reliably use machine learning models. While
recent data-centric methods are able to identify such inconsistencies with
respect to the training set, they suffer from two key limitations: (1)
suboptimality in settings where features exhibit statistical independencies,
due to their usage of compressive representations and (2) lack of localization
to pin-point why a sample might be flagged as inconsistent, which is important
to guide future data collection. We solve these two fundamental limitations
using directed acyclic graphs (DAGs) to encode the training set's features
probability distribution and independencies as a structure. Our method, called
DAGnosis, leverages these structural interactions to bring valuable and
insightful data-centric conclusions. DAGnosis unlocks the localization of the
causes of inconsistencies on a DAG, an aspect overlooked by previous
approaches. Moreover, we show empirically that leveraging these interactions
(1) leads to more accurate conclusions in detecting inconsistencies, as well as
(2) provides more detailed insights into why some samples are flagged.
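The idea of localizing an inconsistency on a DAG can be illustrated with a small sketch. It is not the authors' implementation: it assumes the DAG is already known, models each feature as a linear function of its parents, and uses a held-out residual quantile as the acceptance band. The helper names `fit_node_models` and `flag_nodes`, the parameter `alpha`, and the toy chain X0 -> X1 -> X2 are all hypothetical choices made for illustration.

```python
import numpy as np


def _design(rows, pa):
    """Design matrix with an intercept column followed by the parent columns."""
    cols = [np.ones(rows.shape[0])] + [rows[:, p] for p in pa]
    return np.column_stack(cols)


def fit_node_models(X, parents, alpha=0.1):
    """Fit one linear model per feature, conditioned on its DAG parents.

    Assumption: linear regression plus a held-out absolute-residual quantile
    stands in for whatever per-node predictor a real system would use.
    """
    half = X.shape[0] // 2
    train, calib = X[:half], X[half:]
    models = {}
    for j, pa in parents.items():
        coef, *_ = np.linalg.lstsq(_design(train, pa), train[:, j], rcond=None)
        resid = np.abs(calib[:, j] - _design(calib, pa) @ coef)
        models[j] = (coef, np.quantile(resid, 1 - alpha))
    return models


def flag_nodes(x, parents, models):
    """Return the features whose value is implausible given their parents."""
    flagged = []
    for j, pa in parents.items():
        coef, q = models[j]
        a = np.concatenate([[1.0], np.array([x[p] for p in pa], dtype=float)])
        if abs(x[j] - a @ coef) > q:
            flagged.append(j)
    return flagged


# Toy training set generated from the chain X0 -> X1 -> X2 (an assumption).
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=2000)
x2 = -x1 + 0.1 * rng.normal(size=2000)
X = np.column_stack([x0, x1, x2])

parents = {0: (), 1: (0,), 2: (1,)}
models = fit_node_models(X, parents)

# X0 and X1 agree with the training structure; X2 does not follow from X1.
print(flag_nodes(np.array([1.0, 2.0, 5.0]), parents, models))
```

Because each feature is checked only against its own parents, the corrupted value of X2 is reported at node 2 and is not blamed on X0 or X1, which is the kind of localization the abstract describes.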
Related papers
- Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data.
These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
arXiv Detail & Related papers (2024-02-20T18:53:53Z)
- General Identifiability and Achievability for Causal Representation Learning [33.80247458590611]
The paper establishes identifiability and achievability results using two hard uncoupled interventions per node in the latent causal graph.
For identifiability, the paper establishes that perfect recovery of the latent causal model and variables is guaranteed under uncoupled interventions.
The analysis, additionally, recovers the identifiability result for two hard coupled interventions, that is when metadata about the pair of environments that have the same node intervened is known.
arXiv Detail & Related papers (2023-10-24T01:47:44Z)
- Conditional Feature Importance for Mixed Data [1.6114012813668934]
We develop a conditional predictive impact (CPI) framework with knockoff sampling.
We show that our proposed workflow controls type I error, achieves high power and is in line with results given by other conditional FI measures.
Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
arXiv Detail & Related papers (2022-10-06T16:52:38Z)
- Context-Aware Drift Detection [0.0]
Two-sample tests of homogeneity form the foundation upon which existing approaches to drift detection build.
We develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects.
arXiv Detail & Related papers (2022-03-16T14:23:02Z)
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
arXiv Detail & Related papers (2022-02-17T18:58:31Z)
- Federated Causal Discovery [74.37739054932733]
This paper develops a gradient-based learning framework named DAG-Shared Federated Causal Discovery (DS-FCD).
It can learn the causal graph without directly touching local data and naturally handle the data heterogeneity.
Extensive experiments on both synthetic and real-world datasets verify the efficacy of the proposed method.
arXiv Detail & Related papers (2021-12-07T08:04:12Z)
- BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z)
- On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
- MissDeepCausal: Causal Inference from Incomplete Data Using Deep Latent Variable Models [14.173184309520453]
State-of-the-art methods for causal inference don't consider missing values.
Missing data require an adapted unconfoundedness hypothesis.
Latent confounders whose distribution is learned through variational autoencoders adapted to missing values are considered.
arXiv Detail & Related papers (2020-02-25T12:58:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.