Hierarchical Variable Importance with Statistical Control for Medical Data-Based Prediction
- URL: http://arxiv.org/abs/2508.08724v1
- Date: Tue, 12 Aug 2025 08:10:54 GMT
- Title: Hierarchical Variable Importance with Statistical Control for Medical Data-Based Prediction
- Authors: Joseph Paillard, Antoine Collas, Denis A. Engemann, Bertrand Thirion,
- Abstract summary: We introduce Hierarchical-CPI, a model-agnostic variable importance measure that frames the inference problem as the discovery of groups of variables.<n>By exploring subgroups along a hierarchical tree, it remains computationally tractable, yet also enjoys explicit family-wise error rate control.<n>Its effectiveness is demonstrated in two neuroimaging datasets.
- Score: 35.94354098982828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in machine learning have greatly expanded the repertoire of predictive methods for medical imaging. However, the interpretability of complex models remains a challenge, which limits their utility in medical applications. Recently, model-agnostic methods have been proposed to measure conditional variable importance and accommodate complex non-linear models. However, they often lack power when dealing with highly correlated data, a common problem in medical imaging. We introduce Hierarchical-CPI, a model-agnostic variable importance measure that frames the inference problem as the discovery of groups of variables that are jointly predictive of the outcome. By exploring subgroups along a hierarchical tree, it remains computationally tractable, yet also enjoys explicit family-wise error rate control. Moreover, we address the issue of vanishing conditional importance under high correlation with a tree-based importance allocation mechanism. We benchmarked Hierarchical-CPI against state-of-the-art variable importance methods. Its effectiveness is demonstrated in two neuroimaging datasets: classifying dementia diagnoses from MRI data (ADNI dataset) and analyzing the Berger effect on EEG data (TDBRAIN dataset), identifying biologically plausible variables.
Related papers
- CALM: A Causal Analysis Language Model for Tabular Data in Complex Systems with Local Scores, Conditional Independence Tests, and Relation Attributes [15.298086464296235]
Causal discovery from observational data is fundamental to scientific fields like biology.<n>Existing methods, including constraint-based and score-based approaches, face significant limitations.<n>We introduce CALM, a novel causal analysis language model specifically designed for tabular data.
arXiv Detail & Related papers (2025-10-10T20:19:20Z) - Identifying biological perturbation targets through causal differential networks [23.568795598997376]
We propose a causality-inspired approach to identify variables responsible for changes to a biological system.<n>First, we infer noisy causal graphs from the observational and interventional data.<n>We then learn to map the differences between these graphs, along with additional statistical features, to sets of variables that were intervened upon.
arXiv Detail & Related papers (2024-10-04T12:48:21Z) - Large-Scale Targeted Cause Discovery with Data-Driven Learning [66.86881771339145]
We propose a novel machine learning approach for inferring causal variables of a target variable from observations.<n>By employing a local-inference strategy, our approach scales with linear complexity in the number of variables, efficiently scaling up to thousands of variables.<n> Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks.
arXiv Detail & Related papers (2024-08-29T02:21:11Z) - Texture Feature Analysis for Classification of Early-Stage Prostate Cancer in mpMRI [0.0]
We analyze the contributions made by first-order statistical features, Haralick texture features, and local binary patterns to the classification.
We identify a small set of features that determine the classification outcome, which may aid the development of explainable AI approaches.
arXiv Detail & Related papers (2024-06-21T18:12:58Z) - A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds [49.34500499203579]
We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics.
We generate high-quality synthetic fMRI data based on user-supplied demographics.
arXiv Detail & Related papers (2024-05-13T17:49:20Z) - Unmasking Dementia Detection by Masking Input Gradients: A JSM Approach
to Model Interpretability and Precision [1.5501208213584152]
We introduce an interpretable, multimodal model for Alzheimer's disease (AD) classification over its multi-stage progression, incorporating Jacobian Saliency Map (JSM) as a modality-agnostic tool.
Our evaluation including ablation study manifests the efficacy of using JSM for model debug and interpretation, while significantly enhancing model accuracy as well.
arXiv Detail & Related papers (2024-02-25T06:53:35Z) - Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses PI's limitations in such cases.
Grouping variables statistically via clustering or some prior knowledge gains some power back.
We show that the approach extended with stacking controls the type-I error even with highly-correlated groups.
arXiv Detail & Related papers (2023-12-18T00:21:47Z) - A Causal Framework for Decomposing Spurious Variations [68.12191782657437]
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
arXiv Detail & Related papers (2023-06-08T09:40:28Z) - Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z) - Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z) - Handling Non-ignorably Missing Features in Electronic Health Records
Data Using Importance-Weighted Autoencoders [8.518166245293703]
We propose a novel extension of VAEs called Importance-Weighted Autoencoders (IWAEs) to flexibly handle Missing Not At Random patterns in the Physionet data.
Our proposed method models the missingness mechanism using an embedded neural network, eliminating the need to specify the exact form of the missingness mechanism a priori.
arXiv Detail & Related papers (2021-01-18T22:53:29Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.