Aggregate Models, Not Explanations: Improving Feature Importance Estimation
- URL: http://arxiv.org/abs/2602.11760v1
- Date: Thu, 12 Feb 2026 09:36:03 GMT
- Title: Aggregate Models, Not Explanations: Improving Feature Importance Estimation
- Authors: Joseph Paillard, Angel Reyero Lobo, Denis A. Engemann, Bertrand Thirion
- Abstract summary: We show that ensembling at the model level provides more accurate variable-importance estimates. We validate these findings on classical benchmarks and a large-scale proteomic study from the UK Biobank.
- Score: 29.82699646128964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature-importance methods show promise in transforming machine learning models from predictive engines into tools for scientific discovery. However, due to data sampling and algorithmic stochasticity, expressive models can be unstable, leading to inaccurate variable importance estimates and undermining their utility in critical biomedical applications. Although ensembling offers a solution, deciding whether to explain a single ensemble model or aggregate individual model explanations is difficult due to the nonlinearity of importance measures and remains largely understudied. Our theoretical analysis, developed under assumptions accommodating complex state-of-the-art ML models, reveals that this choice is primarily driven by the model's excess risk. In contrast to prior literature, we show that ensembling at the model level provides more accurate variable-importance estimates, particularly for expressive models, by reducing this leading error term. We validate these findings on classical benchmarks and a large-scale proteomic study from the UK Biobank.
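To make the two aggregation orders concrete, here is a minimal sketch using scikit-learn's permutation importance (the importance measure and estimators are illustrative assumptions; the paper's analysis covers general nonlinear importance measures): variant (a) explains each model and averages the explanations, variant (b) explains a single model-level ensemble.

```python
# Minimal sketch (illustrative, not the paper's exact estimator): two ways to
# combine an ensemble with a feature-importance measure.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Several stochastic fits of the same expressive model class.
models = [GradientBoostingRegressor(subsample=0.8, random_state=s).fit(X_tr, y_tr)
          for s in range(5)]

# (a) Aggregate explanations: explain each model, then average the importances.
per_model = [permutation_importance(m, X_te, y_te, random_state=0).importances_mean
             for m in models]
agg_explanations = np.mean(per_model, axis=0)

# (b) Aggregate models: average the predictors first, then explain the ensemble.
ensemble = VotingRegressor([(f"m{i}", m) for i, m in enumerate(models)])
ensemble.fit(X_tr, y_tr)  # VotingRegressor refits clones of the members
agg_models = permutation_importance(ensemble, X_te, y_te,
                                    random_state=0).importances_mean
```

The paper's finding is that variant (b), by reducing the ensemble's excess risk before the nonlinear importance measure is applied, yields the more accurate estimates, particularly for expressive models.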
Related papers
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSIs) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
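As a hedged sketch of that recipe (the encoder, pooling, and regressor below are placeholder assumptions, not the paper's exact pipeline): patch embeddings from a frozen foundation model are pooled per slide and fed to a regressor on continuous HRD scores.

```python
# Hedged sketch: frozen patch encoder -> slide-level pooling -> HRD regression.
import numpy as np
from sklearn.linear_model import Ridge

def embed_slide(patches, encoder):
    """Encode each patch with a frozen encoder, then mean-pool per slide."""
    feats = np.stack([encoder(p) for p in patches])  # (n_patches, d)
    return feats.mean(axis=0)                        # mean pooling (an assumption)

rng = np.random.default_rng(0)
encoder = lambda patch: rng.standard_normal(512)     # stand-in for a foundation model

slides = [[object()] * 20 for _ in range(100)]       # placeholder patch lists
X = np.stack([embed_slide(s, encoder) for s in slides])
y = rng.uniform(0.0, 1.0, size=100)                  # placeholder HRD scores

reg = Ridge(alpha=1.0).fit(X, y)                     # slide-level regressor
```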
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
- PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis [8.785345412061792]
We introduce a comprehensive framework for modeling single-cell transcriptomic responses to perturbations. Our approach includes a modular and user-friendly model development and evaluation platform. We highlight the limitations of widely used models, such as mode collapse.
arXiv Detail & Related papers (2024-08-20T07:40:20Z)
- Exploration of the Rashomon Set Assists Trustworthy Explanations for Medical Data [4.499833362998488]
This paper introduces a novel process to explore models in the Rashomon set, extending the conventional modeling approach.
We propose the $\texttt{Rashomon\_DETECT}$ algorithm to detect models with different behavior.
To quantify differences in variable effects among models, we introduce the Profile Disparity Index (PDI) based on measures from functional data analysis.
arXiv Detail & Related papers (2023-08-22T13:53:43Z)
- ER: Equivariance Regularizer for Knowledge Graph Completion [107.51609402963072]
We propose a new regularizer, namely the Equivariance Regularizer (ER).
ER can enhance the generalization ability of the model by employing the semantic equivariance between the head and tail entities.
The experimental results indicate a clear and substantial improvement over the state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-24T08:18:05Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints [5.783415024516947]
This paper investigates a series of intrinsically interpretable machine learning models.
We evaluate the predictive quality of five GAMs compared with six traditional ML models.
arXiv Detail & Related papers (2022-04-19T20:37:31Z)
- Model Uncertainty and Correctability for Directed Graphical Models [3.326320568999945]
We develop information-theoretic, robust uncertainty quantification methods and non-parametric stress tests for directed graphical models.
We provide a mathematically rigorous approach to correctability that guarantees a systematic selection for improvement of components of a graphical model.
We demonstrate our methods in two physico-chemical examples, namely quantum scale-informed chemical kinetics and materials screening to improve the efficiency of fuel cells.
arXiv Detail & Related papers (2021-07-17T04:30:37Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more valuable to understand the properties of a model and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-based model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data.
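A hedged sketch of such a two-stage design (the components below are generic stand-ins, not the paper's architecture): a swallow-level model emits per-swallow probabilities, whose study-level statistics feed a feature-based classifier.

```python
# Hedged sketch of a swallow-level -> study-level pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def swallow_model(swallow):
    """Stand-in for the deep swallow-level model: per-swallow class probabilities."""
    rng = np.random.default_rng(abs(hash(swallow)) % 2**32)
    return rng.dirichlet(np.ones(4))                 # e.g., 4 swallow categories

def study_features(swallows):
    """Aggregate per-swallow probabilities into fixed-length study features."""
    probs = np.stack([swallow_model(s) for s in swallows])
    return np.concatenate([probs.mean(axis=0), probs.max(axis=0)])

studies = [tuple(range(i, i + 10)) for i in range(50)]   # placeholder studies
X = np.stack([study_features(s) for s in studies])
y = np.random.default_rng(1).integers(0, 3, size=50)     # placeholder diagnoses

clf = RandomForestClassifier(random_state=0).fit(X, y)   # study-level stage
```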
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
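A minimal sketch of one plausible diversity-enforcing term (an assumption; the paper's exact loss is not reproduced here): penalize pairwise cosine similarity among several latent perturbations so the resulting counterfactuals differ from one another.

```python
# Hedged sketch of a diversity-enforcing loss over K latent perturbations.
import torch

def diversity_loss(deltas):
    """Penalize pairwise cosine similarity among perturbations of shape (K, d)."""
    z = torch.nn.functional.normalize(deltas, dim=1)
    sim = z @ z.T                                    # (K, K) cosine similarities
    off_diag = sim - torch.diag(torch.diag(sim))     # zero out the diagonal
    k = deltas.shape[0]
    return off_diag.abs().sum() / (k * (k - 1))

deltas = torch.randn(4, 16, requires_grad=True)      # K=4 perturbations, 16-d latent
loss = diversity_loss(deltas)                        # added to the counterfactual objective
loss.backward()
```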
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark [6.815730801645785]
Many studies have compared machine learning (ML) and discrete choice models (DCMs) in predicting travel demand. These studies often lack generalizability, as they compare models deterministically without considering contextual variations. This benchmark study compares two large-scale data sources.
arXiv Detail & Related papers (2021-02-01T19:45:47Z)