S-LIME: Stabilized-LIME for Model Explanation
- URL: http://arxiv.org/abs/2106.07875v1
- Date: Tue, 15 Jun 2021 04:24:59 GMT
- Title: S-LIME: Stabilized-LIME for Model Explanation
- Authors: Zhengze Zhou, Giles Hooker, Fei Wang
- Abstract summary: Post hoc explanations based on perturbations are widely used approaches to interpret a machine learning model after it has been built.
We propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation.
- Score: 7.479279851480736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An increasing number of machine learning models have been deployed in domains
with high stakes such as finance and healthcare. Despite their superior
performance, many of these models are black boxes by nature and hard to explain.
There are growing efforts by researchers to develop methods to interpret these
black-box models. Post hoc explanations based on perturbations, such as LIME,
are widely used approaches to interpret a machine learning model after it has
been built. This class of methods has been shown to exhibit large instability,
posing serious challenges to the effectiveness of the method itself and harming
user trust. In this paper, we propose S-LIME, which utilizes a hypothesis
testing framework based on the central limit theorem for determining the number of
perturbation points needed to guarantee stability of the resulting explanation.
Experiments on both simulated and real world data sets are provided to
demonstrate the effectiveness of our method.
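The stopping rule described in the abstract can be illustrated with a short sketch. This is a minimal approximation of the idea, not the paper's reference implementation (the paper applies such tests inside LIME's sparse-regression feature-selection step): the perturbation sample is grown until a CLT-based paired z-test on the gap between the k-th and (k+1)-th ranked features becomes significant, so the reported top-k feature set is unlikely to flip on a rerun. The Gaussian perturbations, locality kernel, ridge surrogate, doubling schedule, and the hypothetical helper `stable_top_k` are all illustrative assumptions; `predict_fn` is assumed to return a 1-D array of scores and the instance to have more than k features.

```python
# A minimal sketch of the S-LIME idea, not the authors' released code:
# keep enlarging the LIME perturbation sample until a CLT-based test says
# the top-k feature ranking is statistically stable.
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge

def stable_top_k(x, predict_fn, k=3, n0=1000, n_max=64000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = n0
    while True:
        # LIME-style tabular perturbations around the instance x.
        Z = x + rng.normal(0.0, 1.0, size=(n, x.shape[0]))
        y = predict_fn(Z)
        w = np.exp(-np.sum((Z - x) ** 2, axis=1))            # locality kernel
        # Per-sample weighted contributions; their column means estimate a
        # weighted covariance between each feature and the black-box output.
        c = w[:, None] * (Z - x) * (y - np.average(y, weights=w))[:, None]
        c *= np.sign(c.mean(axis=0))        # flip signs so all means are >= 0
        order = np.argsort(c.mean(axis=0))[::-1]
        # CLT-based paired z-test on the gap between ranks k and k+1.
        gap = c[:, order[k - 1]] - c[:, order[k]]
        z = gap.mean() / (gap.std(ddof=1) / np.sqrt(n))
        if stats.norm.sf(z) < alpha or n >= n_max:
            surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)
            return order[:k], surrogate.coef_[order[:k]], n
        n *= 2    # ranking not yet stable: double the number of perturbations
```

With a fitted scikit-learn classifier `clf`, for example, one could call `stable_top_k(x, lambda Z: clf.predict_proba(Z)[:, 1])`; the returned sample size n is the point at which the test deems the top-k ranking stable, which is the kind of reproducibility the paper targets.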
Related papers
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- Unified Explanations in Machine Learning Models: A Perturbation Approach [0.0]
Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches.
We propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (SHAP).
We devise algorithms to generate relative feature importance under dynamic inference across a suite of popular machine learning and deep learning methods, together with metrics that quantify how well explanations generated in the static case hold up.
arXiv Detail & Related papers (2024-05-30T16:04:35Z)
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate denoising model, our method enhances the robustness of LLMs with significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z)
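The self-denoised smoothing entry above only names the procedure. As a hedged illustration of what it can look like (not the paper's implementation), the sketch below randomly masks words, asks the same model to reconstruct the input, classifies the reconstruction, and majority-votes over the runs. The `query_llm(prompt) -> str` callable, the prompts, the mask rate, and the sentiment task are all assumed placeholders.

```python
# Illustrative self-denoised smoothing: mask -> self-denoise -> predict -> vote.
import random
from collections import Counter
from typing import Callable

def smoothed_predict(text: str, query_llm: Callable[[str], str],
                     n_votes: int = 5, mask_rate: float = 0.2,
                     seed: int = 0) -> str:
    rng = random.Random(seed)
    votes = []
    for _ in range(n_votes):
        # 1. Randomly mask a fraction of the words (the smoothing noise).
        noisy = " ".join(w if rng.random() > mask_rate else "[MASK]"
                         for w in text.split())
        # 2. Ask the same LLM to reconstruct the masked input (self-denoising).
        denoised = query_llm(
            "Fill in the [MASK] tokens so the sentence reads naturally, "
            "and return only the completed sentence:\n" + noisy)
        # 3. Predict on the denoised text; the majority vote is the smoothed label.
        votes.append(query_llm(
            "Classify the sentiment of the following review as 'positive' or "
            "'negative'. Answer with one word.\n" + denoised).strip().lower())
    return Counter(votes).most_common(1)[0][0]
```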
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the framework's effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- Interpretation of Black Box NLP Models: A Survey [0.0]
arXiv Detail & Related papers (2022-03-31T14:54:35Z)
- Multicriteria interpretability driven Deep Learning [0.0]
Deep Learning methods are renowned for their performance, yet their lack of interpretability keeps them out of high-stakes contexts.
Recent methods address this problem by providing post-hoc interpretability, reverse-engineering the model's inner workings.
We propose a multicriteria, model-agnostic technique that makes it possible to control the feature effects on the model's outcome by injecting knowledge into the objective function.
arXiv Detail & Related papers (2021-11-28T09:41:13Z)
- Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability and reliability of machine learning models, agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an agnostic score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z)
- Recurrence-Aware Long-Term Cognitive Network for Explainable Pattern Classification [0.0]
We propose an LTCN-based model for interpretable pattern classification of structured data.
Our method brings its own mechanism for providing explanations by quantifying the relevance of each feature in the decision process.
Our interpretable model obtains competitive performance compared to state-of-the-art white-box and black-box models.
arXiv Detail & Related papers (2021-07-07T18:14:50Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates.
ExMatrix's applicability is confirmed via different examples, showing how it can be used in practice to promote RF model interpretability.
arXiv Detail & Related papers (2020-05-08T21:03:48Z)
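Since the ExMatrix entry above describes its visual metaphor concretely (rows are rules, columns are features, cells are rule predicates), the underlying table is easy to sketch. The snippet below is an illustrative reconstruction of that rule-by-feature matrix from a scikit-learn random forest, not the ExMatrix tool itself; the dataset, the forest size, and the `rules_to_matrix` helper are assumptions for the example.

```python
# Build a rule-by-feature matrix from a random forest: one row per
# root-to-leaf rule, one column per feature, each cell the (low, high)
# interval that the rule's predicates impose on that feature.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def rules_to_matrix(forest, n_features):
    rows = []  # each row: list of (low, high) intervals, one per feature
    for est in forest.estimators_:
        tree = est.tree_
        def walk(node, bounds):
            if tree.children_left[node] == -1:   # leaf: emit one rule
                rows.append([tuple(b) for b in bounds])
                return
            f, thr = tree.feature[node], tree.threshold[node]
            left = [list(b) for b in bounds]
            left[f][1] = min(left[f][1], thr)    # predicate: x[f] <= thr
            walk(tree.children_left[node], left)
            right = [list(b) for b in bounds]
            right[f][0] = max(right[f][0], thr)  # predicate: x[f] > thr
            walk(tree.children_right[node], right)
        walk(0, [[-np.inf, np.inf] for _ in range(n_features)])
    return rows

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=0).fit(X, y)
matrix = rules_to_matrix(rf, X.shape[1])
print(len(matrix), "rules x", X.shape[1], "features")
print(matrix[0])  # the first rule's per-feature intervals
```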