An empirical study of the effect of background data size on the
stability of SHapley Additive exPlanations (SHAP) for deep learning models
- URL: http://arxiv.org/abs/2204.11351v3
- Date: Sun, 9 Apr 2023 05:48:14 GMT
- Title: An empirical study of the effect of background data size on the
stability of SHapley Additive exPlanations (SHAP) for deep learning models
- Authors: Han Yuan, Mingxuan Liu, Lican Kang, Chenkui Miao, Ying Wu
- Abstract summary: We show that SHAP values and variable rankings fluctuate when using different background datasets acquired from random sampling.
Our results suggest that users should take into account how background data affects SHAP results, with improved SHAP stability as the background sample size increases.
- Score: 14.65535880059975
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays, the interpretation of why a machine learning (ML) model makes
certain inferences is as crucial as the accuracy of such inferences. Some ML
models, like the decision tree, possess inherent interpretability that can be
directly comprehended by humans. Others, like artificial neural networks (ANN),
rely on external methods to uncover the deduction mechanism. SHapley Additive
exPlanations (SHAP) is one such external method, which requires a background
dataset when interpreting ANNs. Generally, a background dataset consists of
instances randomly sampled from the training dataset; however, the effect of
the sampling size on SHAP remains unexplored. In our empirical study on the
MIMIC-III dataset, we show that the two core explanations, SHAP values and
variable rankings, fluctuate when different background datasets are acquired
from random sampling, indicating that users cannot unquestioningly trust the
one-shot interpretation from SHAP. Fortunately, such fluctuation decreases as
the background dataset grows. We also observe a U-shape in the stability
assessment of SHAP variable rankings, demonstrating that SHAP ranks the most
and least important variables more reliably than moderately important ones.
Overall, our results suggest that users should take into account how background
data affects SHAP results: stability improves as the background sample size
increases.
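To make the setup concrete, here is a minimal sketch (not the authors' released
code) of the experiment the abstract describes: the same test instances are
explained with SHAP's DeepExplainer under background sets of increasing size,
the random background draw is repeated several times, and the spread of the
resulting SHAP values is compared. The toy network and synthetic features stand
in for the paper's ANN and MIMIC-III data; all names and sizes are illustrative.

```python
import numpy as np
import torch
import torch.nn as nn
import shap

torch.manual_seed(0)
X_train = torch.randn(1000, 20)  # stand-in for training features (e.g., MIMIC-III)
X_test = torch.randn(50, 20)     # instances to explain
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))

def shap_values_for_draw(size, seed):
    """SHAP values for X_test given one random background draw of `size` rows."""
    g = torch.Generator().manual_seed(seed)
    background = X_train[torch.randperm(len(X_train), generator=g)[:size]]
    explainer = shap.DeepExplainer(model, background)
    # Return shape varies across shap versions; np.asarray normalizes both cases.
    return np.asarray(explainer.shap_values(X_test))

for size in (10, 100, 500):
    runs = np.stack([shap_values_for_draw(size, seed) for seed in range(5)])
    # Fluctuation across background re-draws, expected to shrink as size grows.
    print(size, float(runs.std(axis=0).mean()))
```

Under this sketch, the mean per-feature standard deviation across re-draws
should decrease as the background size grows, mirroring the stability trend
reported in the abstract.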
Related papers
- In Shift and In Variance: Assessing the Robustness of HAR Deep Learning Models against Variability [4.330123738563178]
Human Activity Recognition (HAR) using wearable inertial measurement unit (IMU) sensors can revolutionize healthcare by enabling continual health monitoring, disease prediction, and routine recognition.
Despite the high accuracy of Deep Learning (DL) HAR models, their robustness to real-world variabilities remains untested.
We isolate subject, device, position, and orientation variability to determine their effect on DL HAR models and assess the robustness of these models in real-world conditions.
arXiv Detail & Related papers (2025-03-14T14:53:56Z)
- Federated Learning with Sample-level Client Drift Mitigation [15.248811557566128]
Federated Learning suffers from severe performance degradation due to data heterogeneity among clients.
We propose FedBSS, which mitigates the heterogeneity issue at the sample level.
We also achieve strong results under feature-distribution-shift and noisy-label settings.
arXiv Detail & Related papers (2025-01-20T09:44:07Z)
- A recursive Bayesian neural network for constitutive modeling of sands under monotonic loading [0.0]
In geotechnical engineering, constitutive models play a crucial role in describing soil behavior under varying loading conditions.
Data-driven deep learning (DL) models offer a promising alternative for developing predictive models.
When prediction is the primary focus, quantifying the predictive uncertainty of a trained DL model is crucial for informed decision-making.
arXiv Detail & Related papers (2025-01-17T10:15:03Z) - Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data [39.40116554523575]
We present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network.
It learns to approximate Bayesian inference on synthetic datasets drawn from a prior.
It improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration.
arXiv Detail & Related papers (2024-11-15T23:49:23Z) - Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - A Sparsity Principle for Partially Observable Causal Representation Learning [28.25303444099773]
Causal representation learning aims at identifying high-level causal variables from perceptual data.
We focus on learning from unpaired observations from a dataset with an instance-dependent partial observability pattern.
We propose two methods for estimating the underlying causal variables by enforcing sparsity in the inferred representation.
arXiv Detail & Related papers (2024-03-13T08:40:49Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors behind multi-epoch degradation and find that dataset size, model parameters, and training objectives all play significant roles.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Discovering and Explaining the Non-Causality of Deep Learning in SAR ATR [20.662652637190515]
Deep learning has been widely used in synthetic aperture radar automatic target recognition (SAR ATR) and has achieved excellent performance on the MSTAR dataset.
In this paper, we quantify the contributions of different regions to target recognition based on the Shapley value.
We explain how data bias and model bias contribute to non-causality.
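For reference, the quantity the summary names can be computed exactly when the
number of players is small; the sketch below is illustrative rather than the
paper's recognition pipeline (there, a player would correspond to an image
region and value(S) to the recognition score with only the regions in S kept).

```python
from itertools import combinations
from math import factorial

def shapley(n_players, value):
    """Exact Shapley values for a coalition value function `value(S)`."""
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [j for j in range(n_players) if j != i]
        for k in range(n_players):
            for S in combinations(others, k):
                # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n_players - k - 1) / factorial(n_players)
                phi[i] += w * (value(set(S) | {i}) - value(set(S)))
    return phi

# Toy value function in which players 0 and 1 interact; prints [1.5, 1.5, 1.0].
print(shapley(3, lambda S: len(S) + (1.0 if {0, 1} <= S else 0.0)))
```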
arXiv Detail & Related papers (2023-04-03T00:45:11Z) - Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, in which a teacher model generates hard pseudo-labels on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
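Taken at face value, that summary suggests a training step like the hedged
sketch below (not the authors' released code; the exponential-tilting weights
and the `lam` temperature are assumptions for illustration only):

```python
import torch

def weighted_momentum_step(model, optimizer, loss_fn, x, y, lam=1.0):
    """One momentum-SGD update with per-sample importance weights (illustrative)."""
    optimizer.zero_grad()
    losses = loss_fn(model(x), y)  # per-sample losses: use reduction='none'
    with torch.no_grad():
        weights = torch.softmax(losses / lam, dim=0)  # harder samples weigh more
    (weights * losses).sum().backward()
    optimizer.step()  # e.g., torch.optim.SGD(..., momentum=0.9)
```

With `loss_fn = torch.nn.CrossEntropyLoss(reduction='none')`, the weights become
uniform as `lam` grows, so the step falls back to plain momentum SGD.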
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Meta Learning for Causal Direction [29.00522306460408]
We introduce a novel generative model that allows distinguishing cause and effect in the small data setting.
We demonstrate our method on various synthetic as well as real-world data and show that it is able to maintain high accuracy in detecting directions across varying dataset sizes.
arXiv Detail & Related papers (2020-07-06T15:12:05Z) - Influence Functions in Deep Learning Are Fragile [52.31375893260445]
Influence functions approximate the effect of training samples on test-time predictions.
Influence estimates are fairly accurate for shallow networks.
Hessian regularization is important to get high-quality influence estimates.
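The last point is easy to state concretely; below is a minimal dense sketch
(names and the damping value are illustrative, not from the paper) in which
`lam` is the Hessian regularizer that keeps the solve well conditioned:

```python
import numpy as np

def influence(grad_test, hessian, grad_train, lam=0.01):
    """Approximate effect of one training point on a test loss (illustrative)."""
    damped = hessian + lam * np.eye(hessian.shape[0])  # Hessian regularization
    return -grad_test @ np.linalg.solve(damped, grad_train)
```

Setting `lam = 0` with a near-singular Hessian makes the solve unstable, which
echoes the fragility finding above.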
arXiv Detail & Related papers (2020-06-25T18:25:59Z) - Differentially Private ERM Based on Data Perturbation [41.37436071802578]
We measure the contributions of various training data instances on the final machine learning model.
Considering that the key of our method is to measure each data instance separately, we propose a new "data perturbation"-based (DB) paradigm for DP-ERM.
arXiv Detail & Related papers (2020-02-20T06:05:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.