S-LIME: Stabilized-LIME for Model Explanation
- URL: http://arxiv.org/abs/2106.07875v1
- Date: Tue, 15 Jun 2021 04:24:59 GMT
- Title: S-LIME: Stabilized-LIME for Model Explanation
- Authors: Zhengze Zhou, Giles Hooker, Fei Wang
- Abstract summary: Post hoc explanations based on perturbations are widely used approaches to interpret a machine learning model after it has been built.
We propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation.
- Score: 7.479279851480736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An increasing number of machine learning models have been deployed in domains
with high stakes such as finance and healthcare. Despite their superior
performance, many of these models are black boxes by nature and hard to explain.
There are growing efforts by researchers to develop methods to interpret these
black-box models. Post hoc explanations based on perturbations, such as LIME,
are widely used approaches to interpret a machine learning model after it has
been built. This class of methods has been shown to exhibit large instability,
posing serious challenges to the effectiveness of the method itself and harming
user trust. In this paper, we propose S-LIME, which utilizes a hypothesis
testing framework based on the central limit theorem for determining the number of
perturbation points needed to guarantee stability of the resulting explanation.
Experiments on both simulated and real world data sets are provided to
demonstrate the effectiveness of our method.
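The stopping rule described in the abstract can be illustrated with a short sketch. This is a minimal approximation of the idea, not the paper's reference implementation (the paper applies such tests inside LIME's sparse-regression feature-selection step): the perturbation sample is grown until a CLT-based paired z-test on the gap between the k-th and (k+1)-th ranked features becomes significant, so the reported top-k feature set is unlikely to flip on a rerun. The Gaussian perturbations, locality kernel, ridge surrogate, doubling schedule, and the hypothetical helper `stable_top_k` are all illustrative assumptions; `predict_fn` is assumed to return a 1-D array of scores and the instance to have more than k features.

```python
# A minimal sketch of the S-LIME idea, not the authors' released code:
# keep enlarging the LIME perturbation sample until a CLT-based test says
# the top-k feature ranking is statistically stable.
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge

def stable_top_k(x, predict_fn, k=3, n0=1000, n_max=64000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = n0
    while True:
        # LIME-style tabular perturbations around the instance x.
        Z = x + rng.normal(0.0, 1.0, size=(n, x.shape[0]))
        y = predict_fn(Z)
        w = np.exp(-np.sum((Z - x) ** 2, axis=1))            # locality kernel
        # Per-sample weighted contributions; their column means estimate a
        # weighted covariance between each feature and the black-box output.
        c = w[:, None] * (Z - x) * (y - np.average(y, weights=w))[:, None]
        c *= np.sign(c.mean(axis=0))        # flip signs so all means are >= 0
        order = np.argsort(c.mean(axis=0))[::-1]
        # CLT-based paired z-test on the gap between ranks k and k+1.
        gap = c[:, order[k - 1]] - c[:, order[k]]
        z = gap.mean() / (gap.std(ddof=1) / np.sqrt(n))
        if stats.norm.sf(z) < alpha or n >= n_max:
            surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)
            return order[:k], surrogate.coef_[order[:k]], n
        n *= 2    # ranking not yet stable: double the number of perturbations
```

With a fitted scikit-learn classifier `clf`, for example, one could call `stable_top_k(x, lambda Z: clf.predict_proba(Z)[:, 1])`; the returned sample size n is the point at which the test deems the top-k ranking stable, which is the kind of reproducibility the paper targets.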
Related papers
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Unified Explanations in Machine Learning Models: A Perturbation Approach [0.0]
Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches.
We propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (SHAP).
We devise algorithms to generate relative feature importance under dynamic inference across a suite of popular machine learning and deep learning methods, together with metrics that quantify how well explanations generated in the static case hold up.
arXiv Detail & Related papers (2024-05-30T16:04:35Z)
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate denoising model, our method enhances the robustness of LLMs with significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z)
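The self-denoised smoothing entry above only names the procedure. As a hedged illustration of what it can look like (not the paper's implementation), the sketch below randomly masks words, asks the same model to reconstruct the input, classifies the reconstruction, and majority-votes over the runs. The `query_llm(prompt) -> str` callable, the prompts, the mask rate, and the sentiment task are all assumed placeholders.

```python
# Illustrative self-denoised smoothing: mask -> self-denoise -> predict -> vote.
import random
from collections import Counter
from typing import Callable

def smoothed_predict(text: str, query_llm: Callable[[str], str],
                     n_votes: int = 5, mask_rate: float = 0.2,
                     seed: int = 0) -> str:
    rng = random.Random(seed)
    votes = []
    for _ in range(n_votes):
        # 1. Randomly mask a fraction of the words (the smoothing noise).
        noisy = " ".join(w if rng.random() > mask_rate else "[MASK]"
                         for w in text.split())
        # 2. Ask the same LLM to reconstruct the masked input (self-denoising).
        denoised = query_llm(
            "Fill in the [MASK] tokens so the sentence reads naturally, "
            "and return only the completed sentence:\n" + noisy)
        # 3. Predict on the denoised text; the majority vote is the smoothed label.
        votes.append(query_llm(
            "Classify the sentiment of the following review as 'positive' or "
            "'negative'. Answer with one word.\n" + denoised).strip().lower())
    return Counter(votes).most_common(1)[0][0]
```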
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the framework's effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Interpretation of Black Box NLP Models: A Survey [0.0]
arXiv Detail & Related papers (2022-03-31T14:54:35Z)
- Multicriteria interpretability driven Deep Learning [0.0]
Deep Learning methods are renowned for their performance, yet their lack of interpretability keeps them out of high-stakes contexts.
Recent methods address this problem by providing post-hoc interpretability, reverse-engineering the model's inner workings.
We propose a multicriteria, model-agnostic technique that makes it possible to control the feature effects on the model's outcome by injecting knowledge into the objective function.
arXiv Detail & Related papers (2021-11-28T09:41:13Z)
- Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability and reliability of machine learning models, agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an agnostic score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z)
- Recurrence-Aware Long-Term Cognitive Network for Explainable Pattern Classification [0.0]
We propose an LTCN-based model for interpretable pattern classification of structured data.
Our method brings its own mechanism for providing explanations by quantifying the relevance of each feature in the decision process.
Our interpretable model obtains competitive performance compared to state-of-the-art white-box and black-box models.
arXiv Detail & Related papers (2021-07-07T18:14:50Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates.
ExMatrix's applicability is confirmed via different examples, showing how it can be used in practice to promote RF model interpretability.
arXiv Detail & Related papers (2020-05-08T21:03:48Z)
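Since the ExMatrix entry above describes its visual metaphor concretely (rows are rules, columns are features, cells are rule predicates), the underlying table is easy to sketch. The snippet below is an illustrative reconstruction of that rule-by-feature matrix from a scikit-learn random forest, not the ExMatrix tool itself; the dataset, the forest size, and the `rules_to_matrix` helper are assumptions for the example.

```python
# Build a rule-by-feature matrix from a random forest: one row per
# root-to-leaf rule, one column per feature, each cell the (low, high)
# interval that the rule's predicates impose on that feature.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def rules_to_matrix(forest, n_features):
    rows = []  # each row: list of (low, high) intervals, one per feature
    for est in forest.estimators_:
        tree = est.tree_
        def walk(node, bounds):
            if tree.children_left[node] == -1:   # leaf: emit one rule
                rows.append([tuple(b) for b in bounds])
                return
            f, thr = tree.feature[node], tree.threshold[node]
            left = [list(b) for b in bounds]
            left[f][1] = min(left[f][1], thr)    # predicate: x[f] <= thr
            walk(tree.children_left[node], left)
            right = [list(b) for b in bounds]
            right[f][0] = max(right[f][0], thr)  # predicate: x[f] > thr
            walk(tree.children_right[node], right)
        walk(0, [[-np.inf, np.inf] for _ in range(n_features)])
    return rows

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0).fit(X, y)
matrix = rules_to_matrix(rf, X.shape[1])
print(len(matrix), "rules x", X.shape[1], "features")
print(matrix[0])  # the first rule's per-feature intervals
```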