Covered Information Disentanglement: Model Transparency via Unbiased
Permutation Importance
- URL: http://arxiv.org/abs/2111.09744v2
- Date: Sun, 21 Nov 2021 21:22:35 GMT
- Title: Covered Information Disentanglement: Model Transparency via Unbiased
Permutation Importance
- Authors: João Pereira and Erik S.G. Stroes and Aeilko H. Zwinderman and
Evgeni Levin
- Abstract summary: We show how to compute Covered Information Disentanglement (CID) efficiently when coupled with Markov random fields.
We demonstrate its efficacy in adjusting permutation importance first on a controlled toy dataset and discuss its effect on real-world medical data.
- Score: 2.064612766965483
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Model transparency is a prerequisite in many domains and an increasingly
popular area in machine learning research. In the medical domain, for instance,
unveiling the mechanisms behind a disease often has higher priority than the
diagnostic itself since it might dictate or guide potential treatments and
research directions. One of the most popular approaches to explain model global
predictions is the permutation importance where the performance on permuted
data is benchmarked against the baseline. However, this method and other
related approaches will undervalue the importance of a feature in the presence
of covariates since these cover part of its provided information. To address
this issue, we propose Covered Information Disentanglement (CID), a method that
considers all feature information overlap to correct the values provided by
permutation importance. We further show how to compute CID efficiently when
coupled with Markov random fields. We demonstrate its efficacy in adjusting
permutation importance first on a controlled toy dataset and discuss its effect
on real-world medical data.
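The abstract contrasts CID against the classic permutation-importance baseline, where a feature's score is the drop in performance after that feature's column is shuffled. The sketch below illustrates that baseline only (not CID itself); the function and model interface are illustrative assumptions, with `model` taken to expose a scikit-learn style `predict`.

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=5, seed=None):
    """Classic permutation importance: drop in score when one
    feature column is shuffled, averaged over repeats.

    Note this is the method the paper says *undervalues* features
    that share information with covariates; CID corrects for that.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores.append(metric(y, model.predict(X_perm)))
        # importance = baseline score minus mean permuted score
        importances[j] = baseline - np.mean(scores)
    return importances
```

If two features are strongly correlated, permuting either one alone changes the model's inputs less than its true contribution warrants, which is exactly the covariate-overlap bias CID is designed to disentangle.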
Related papers
- Cross-Dataset Generalization For Retinal Lesions Segmentation [2.1160877779256126]
This study characterizes several known datasets and compares different techniques that have been proposed to enhance the generalisation performance of a model.
Our results provide insights into how to combine coarsely labelled data with a finely-grained dataset in order to improve lesion segmentation.
arXiv Detail & Related papers (2024-05-14T05:52:01Z) - Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z) - Domain-invariant Clinical Representation Learning by Bridging Data
Distribution Shift across EMR Datasets [16.317118701435742]
An effective prognostic model is expected to assist doctors in making the right diagnosis and designing a personalized treatment plan.
In the early stage of a disease, limited data collection and clinical experience, along with privacy and ethics concerns, may result in restricted data availability for reference.
This article introduces a domain-invariant representation learning method to build a transition model from source dataset to target dataset.
arXiv Detail & Related papers (2023-10-11T18:32:21Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic
Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased
and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - Drug Discovery under Covariate Shift with Domain-Informed Prior
Distributions over Functions [30.305418761024143]
Real-world drug discovery tasks are often characterized by a scarcity of labeled data and a significant degree of covariate shift.
We present a principled way to encode explicit prior knowledge of the data-generating process into a prior distribution.
We demonstrate that using Q-SAVI to contextualize prior knowledge, such as chemical space, into the modeling process affords substantial gains in accuracy and calibration.
arXiv Detail & Related papers (2023-07-14T05:01:10Z) - Practical Challenges in Differentially-Private Federated Survival
Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the process of training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, the noise added for differential privacy makes it harder for the models to converge.
We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z) - Encoding Domain Information with Sparse Priors for Inferring Explainable
Latent Variables [2.8935588665357077]
We propose spex-LVM, a factorial latent variable model with sparse priors to encourage the inference of explainable factors.
spex-LVM utilizes existing knowledge of curated biomedical pathways to automatically assign annotated attributes to latent factors.
Evaluations on simulated and real single-cell RNA-seq datasets demonstrate that our model robustly identifies relevant structure in an inherently explainable manner.
arXiv Detail & Related papers (2021-07-08T10:19:32Z) - Explaining COVID-19 and Thoracic Pathology Model Predictions by
Identifying Informative Input Features [47.45835732009979]
Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays.
Feature attribution methods identify the importance of input features for the output prediction.
We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics, and human-independent feature importance metrics on NIH Chest X-ray8 and BrixIA datasets.
arXiv Detail & Related papers (2021-04-01T11:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.