An empirical study of the effect of background data size on the
stability of SHapley Additive exPlanations (SHAP) for deep learning models
- URL: http://arxiv.org/abs/2204.11351v3
- Date: Sun, 9 Apr 2023 05:48:14 GMT
- Title: An empirical study of the effect of background data size on the
stability of SHapley Additive exPlanations (SHAP) for deep learning models
- Authors: Han Yuan, Mingxuan Liu, Lican Kang, Chenkui Miao, Ying Wu
- Abstract summary: We show that SHAP values and variable rankings fluctuate when using different background datasets acquired from random sampling.
Our results suggest that users should take into account how background data affects SHAP results, with improved SHAP stability as the background sample size increases.
- Score: 14.65535880059975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, the interpretation of why a machine learning (ML) model makes
certain inferences is as crucial as the accuracy of such inferences. Some ML
models, like the decision tree, possess inherent interpretability that can be
directly comprehended by humans. Others, like artificial neural networks (ANN),
rely on external methods to uncover the deduction mechanism. SHapley Additive
exPlanations (SHAP) is one such external method, which requires a background
dataset when interpreting ANNs. Generally, a background dataset consists of
instances randomly sampled from the training dataset; however, the effect of
the sampling size on SHAP remains unexplored. In our empirical study on the
MIMIC-III dataset, we show that the two core explanations, SHAP values and
variable rankings, fluctuate when different background datasets are acquired
from random sampling, indicating that users cannot unquestioningly trust the
one-shot interpretation from SHAP. Fortunately, such fluctuation decreases as
the background dataset grows. We also observe a U-shape in the stability
assessment of SHAP variable rankings, demonstrating that SHAP ranks the most
and least important variables more reliably than moderately important ones.
Overall, our results suggest that users should take into account how background
data affects SHAP results: stability improves as the background sample size
increases.
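To make the setup concrete, here is a minimal sketch (not the authors' released
code) of the experiment the abstract describes: the same test instances are
explained with SHAP's DeepExplainer under background sets of increasing size,
the random background draw is repeated several times, and the spread of the
resulting SHAP values is compared. The toy network and synthetic features stand
in for the paper's ANN and MIMIC-III data; all names and sizes are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn
import shap

torch.manual_seed(0)
X_train = torch.randn(1000, 20)  # stand-in for training features (e.g., MIMIC-III)
X_test = torch.randn(50, 20)     # instances to explain
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))

def shap_values_for_draw(size, seed):
    """SHAP values for X_test given one random background draw of `size` rows."""
    g = torch.Generator().manual_seed(seed)
    background = X_train[torch.randperm(len(X_train), generator=g)[:size]]
    explainer = shap.DeepExplainer(model, background)
    # Return shape varies across shap versions; np.asarray normalizes both cases.
    return np.asarray(explainer.shap_values(X_test))

for size in (10, 100, 500):
    runs = np.stack([shap_values_for_draw(size, seed) for seed in range(5)])
    # Fluctuation across background re-draws, expected to shrink as size grows.
    print(size, float(runs.std(axis=0).mean()))
```

Under this sketch, the mean per-feature standard deviation across re-draws
should decrease as the background size grows, mirroring the stability trend
reported in the abstract.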
Related papers
- In Shift and In Variance: Assessing the Robustness of HAR Deep Learning Models against Variability [4.330123738563178]
Human Activity Recognition (HAR) using wearable inertial measurement unit (IMU) sensors can revolutionize healthcare by enabling continual health monitoring, disease prediction, and routine recognition.
Despite the high accuracy of Deep Learning (DL) HAR models, their robustness to real-world variabilities remains untested.
We isolate subject, device, position, and orientation variability to determine their effect on DL HAR models and assess the robustness of these models in real-world conditions.
arXiv Detail & Related papers (2025-03-14T14:53:56Z)
- Federated Learning with Sample-level Client Drift Mitigation [15.248811557566128]
Federated Learning suffers from severe performance degradation due to data heterogeneity among clients.
We propose FedBSS, which mitigates the heterogeneity issue at the sample level.
We also achieve strong results under feature-distribution-shift and noisy-label settings.
arXiv Detail & Related papers (2025-01-20T09:44:07Z)
- A recursive Bayesian neural network for constitutive modeling of sands under monotonic loading [0.0]
In geotechnical engineering, constitutive models play a crucial role in describing soil behavior under varying loading conditions.
Data-driven deep learning (DL) models offer a promising alternative for developing predictive models.
When prediction is the primary focus, quantifying the predictive uncertainty of a trained DL model is crucial for informed decision-making.
arXiv Detail & Related papers (2025-01-17T10:15:03Z) - Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data [39.40116554523575]
We present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network.
It learns to approximate Bayesian inference on synthetic datasets drawn from a prior.
It improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration.
arXiv Detail & Related papers (2024-11-15T23:49:23Z) - Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - A Sparsity Principle for Partially Observable Causal Representation Learning [28.25303444099773]
Causal representation learning aims at identifying high-level causal variables from perceptual data.
We focus on learning from unpaired observations from a dataset with an instance-dependent partial observability pattern.
We propose two methods for estimating the underlying causal variables by enforcing sparsity in the inferred representation.
arXiv Detail & Related papers (2024-03-13T08:40:49Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors behind multi-epoch degradation and find that dataset size, model parameters, and training objectives all play significant roles.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Discovering and Explaining the Non-Causality of Deep Learning in SAR ATR [20.662652637190515]
Deep learning has been widely used in synthetic aperture radar automatic target recognition (SAR ATR) and has achieved excellent performance on the MSTAR dataset.
In this paper, we quantify the contributions of different regions to target recognition based on the Shapley value.
We explain how data bias and model bias contribute to non-causality.
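For reference, the quantity the summary names can be computed exactly when the
number of players is small; the sketch below is illustrative rather than the
paper's recognition pipeline (there, a player would correspond to an image
region and value(S) to the recognition score with only the regions in S kept).

```python
from itertools import combinations
from math import factorial

def shapley(n_players, value):
    """Exact Shapley values for a coalition value function `value(S)`."""
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [j for j in range(n_players) if j != i]
        for k in range(n_players):
            for S in combinations(others, k):
                # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n_players - k - 1) / factorial(n_players)
                phi[i] += w * (value(set(S) | {i}) - value(set(S)))
    return phi

# Toy value function in which players 0 and 1 interact; prints [1.5, 1.5, 1.0].
print(shapley(3, lambda S: len(S) + (1.0 if {0, 1} <= S else 0.0)))
```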
arXiv Detail & Related papers (2023-04-03T00:45:11Z) - Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, in which a teacher model generates hard pseudo-labels on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
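Taken at face value, that summary suggests a training step like the hedged
sketch below (not the authors' released code; the exponential-tilting weights
and the `lam` temperature are assumptions for illustration only):

```python
import torch

def weighted_momentum_step(model, optimizer, loss_fn, x, y, lam=1.0):
    """One momentum-SGD update with per-sample importance weights (illustrative)."""
    optimizer.zero_grad()
    losses = loss_fn(model(x), y)  # per-sample losses: use reduction='none'
    with torch.no_grad():
        weights = torch.softmax(losses / lam, dim=0)  # harder samples weigh more
    (weights * losses).sum().backward()
    optimizer.step()  # e.g., torch.optim.SGD(..., momentum=0.9)
```

With `loss_fn = torch.nn.CrossEntropyLoss(reduction='none')`, the weights become
uniform as `lam` grows, so the step falls back to plain momentum SGD.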
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Meta Learning for Causal Direction [29.00522306460408]
We introduce a novel generative model that allows distinguishing cause and effect in the small data setting.
We demonstrate our method on various synthetic as well as real-world data and show that it is able to maintain high accuracy in detecting directions across varying dataset sizes.
arXiv Detail & Related papers (2020-07-06T15:12:05Z) - Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important to get high-quality influence estimates.
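The last point is easy to state concretely; below is a minimal dense sketch
(names and the damping value are illustrative, not from the paper) in which
`lam` is the Hessian regularizer that keeps the solve well conditioned:

```python
import numpy as np

def influence(grad_test, hessian, grad_train, lam=0.01):
    """Approximate effect of one training point on a test loss (illustrative)."""
    damped = hessian + lam * np.eye(hessian.shape[0])  # Hessian regularization
    return -grad_test @ np.linalg.solve(damped, grad_train)
```

Setting `lam = 0` with a near-singular Hessian makes the solve unstable, which
echoes the fragility finding above.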
arXiv Detail & Related papers (2020-06-25T18:25:59Z) - Differentially Private ERM Based on Data Perturbation [41.37436071802578]
We measure the contributions of various training data instances on the final machine learning model.
Considering that the key of our method is to measure each data instance separately, we propose a new "data perturbation"-based (DB) paradigm for DP-ERM.
arXiv Detail & Related papers (2020-02-20T06:05:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.