Explaining black boxes with a SMILE: Statistical Model-agnostic
Interpretability with Local Explanations
- URL: http://arxiv.org/abs/2311.07286v1
- Date: Mon, 13 Nov 2023 12:28:00 GMT
- Title: Explaining black boxes with a SMILE: Statistical Model-agnostic
Interpretability with Local Explanations
- Authors: Koorosh Aslansefat, Mojgan Hashemian, Martin Walker, Mohammed Naveed
Akram, Ioannis Sorokos, Yiannis Papadopoulos
- Abstract summary: One of the major barriers to widespread acceptance of machine learning (ML) is trustworthiness.
Most ML models operate as black boxes, their inner workings opaque and mysterious, and it can be difficult to trust their conclusions without understanding how those conclusions are reached.
We propose SMILE, a new method that builds on previous approaches by making use of statistical distance measures to improve explainability.
- Score: 0.1398098625978622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning is currently undergoing an explosion in capability,
popularity, and sophistication. However, one of the major barriers to
widespread acceptance of machine learning (ML) is trustworthiness: most ML
models operate as black boxes, their inner workings opaque and mysterious, and
it can be difficult to trust their conclusions without understanding how those
conclusions are reached. Explainability is therefore a key aspect of improving
trustworthiness: the ability to better understand, interpret, and anticipate
the behaviour of ML models. To this end, we propose SMILE, a new method that
builds on previous approaches by making use of statistical distance measures to
improve explainability while remaining applicable to a wide range of input data
domains.
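The abstract does not spell out the mechanics, but the core idea (a LIME-style local surrogate whose perturbation weights come from a statistical distance measure rather than a plain kernel distance) can be sketched as follows. This is a minimal illustration, assuming the Wasserstein distance as the statistical distance and Gaussian perturbations; it is not the authors' released implementation.

```python
# LIME-style local surrogate with perturbation weights derived from a
# statistical distance measure (here: 1-D Wasserstein), sketching the SMILE idea.
# The black box, kernel width, and distance choice are all placeholder assumptions.
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.linear_model import Ridge

def smile_style_explanation(predict_fn, x, n_samples=500, kernel_width=0.75, scale=0.5, seed=0):
    """Return local feature weights for black-box `predict_fn` around instance `x`."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    # 2. Query the black box on the perturbed samples.
    y = predict_fn(Z)
    # 3. Weight each perturbation by a statistical distance to the original
    #    instance (feature values treated as 1-D empirical samples).
    d = np.array([wasserstein_distance(x, z) for z in Z])
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Fit a weighted linear surrogate; its coefficients are the explanation.
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)
    return surrogate.coef_

if __name__ == "__main__":
    # Toy black box: a simple nonlinear function of three features.
    black_box = lambda X: X[:, 0] * 2.0 - X[:, 1] + 0.1 * X[:, 2] ** 2
    print(smile_style_explanation(black_box, np.array([1.0, 0.5, -0.3])))
```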
Related papers
- Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial.
In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations.
We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
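A rough sketch of the self-query idea: represent each question/answer pair by the probabilities the black-box LLM assigns to a few follow-up prompts, then fit a linear probe that predicts correctness. The follow-up prompts and the `followup_probability` helper below are hypothetical placeholders for whatever black-box access is available.

```python
# Sketch of self-query features for predicting black-box LLM performance.
# `followup_probability` is a hypothetical stand-in for one API call; replace it
# with your own black-box access (e.g. sampled answer frequencies).
import numpy as np
from sklearn.linear_model import LogisticRegression

FOLLOW_UPS = [
    "Are you confident in your answer? Answer yes or no.",
    "Is the answer above factually correct? Answer yes or no.",
]

def followup_probability(question: str, answer: str, follow_up: str) -> float:
    """Hypothetical black-box call: probability the model replies 'yes'."""
    raise NotImplementedError("wire this to your LLM API")

def self_query_features(question: str, answer: str) -> np.ndarray:
    # One feature per follow-up prompt: the model's probability of answering 'yes'.
    return np.array([followup_probability(question, answer, f) for f in FOLLOW_UPS])

def fit_performance_predictor(examples, labels):
    """`examples` are (question, answer) pairs; `labels` are 1 if the answer was correct."""
    X = np.stack([self_query_features(q, a) for q, a in examples])
    return LogisticRegression().fit(X, np.asarray(labels))
```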
arXiv Detail & Related papers (2025-01-02T22:26:54Z) - Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal [21.342265570934995]
Existing methods have largely overlooked the importance of refusal responses as a means of enhancing the reliability of MLLMs.
We present the Information Boundary-aware Learning Framework (InBoL), a novel approach that empowers MLLMs to refuse to answer user queries when encountering insufficient information.
This framework introduces a comprehensive data generation pipeline and tailored training strategies to improve the model's ability to deliver appropriate refusal responses.
arXiv Detail & Related papers (2024-12-15T14:17:14Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
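One simple way to operationalise this (a simplification, not the paper's exact estimator) is to sample several explanation/answer pairs and score uncertainty by the entropy of the induced answer distribution; `sample_explanation` below is a hypothetical black-box call returning (explanation_text, final_answer).

```python
# Explanation-based uncertainty proxy: entropy of final answers across sampled explanations.
import math
from collections import Counter

def explanation_entropy(sample_explanation, question: str, n_samples: int = 10) -> float:
    """Entropy (in bits) of the answer distribution induced by sampled explanations."""
    answers = [sample_explanation(question)[1] for _ in range(n_samples)]
    counts = Counter(answers)
    probs = [c / n_samples for c in counts.values()]
    return -sum(p * math.log2(p) for p in probs)
```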
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Accountable and Explainable Methods for Complex Reasoning over Text [5.571369922847262]
Accountability and transparency of Machine Learning models have been posed as critical desiderata by works in policy and law, philosophy, and computer science.
This thesis expands our collective knowledge in the areas of accountability and transparency of ML models developed for complex reasoning tasks over text.
arXiv Detail & Related papers (2022-11-09T15:14:52Z) - Shapelet-Based Counterfactual Explanations for Multivariate Time Series [0.9990687944474738]
We develop a model-agnostic multivariate time series (MTS) counterfactual explanation algorithm.
We test our approach on a real-life solar flare prediction dataset and prove that our approach produces high-quality counterfactuals.
In addition to being visually interpretable, our explanations are superior in terms of proximity, sparsity, and plausibility.
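Proximity and sparsity are standard counterfactual-quality metrics; a minimal sketch of how they are often computed for a multivariate time series is shown below (the paper's exact definitions may differ).

```python
# Common counterfactual-quality metrics for a multivariate time series x (timesteps x channels).
import numpy as np

def proximity(x: np.ndarray, x_cf: np.ndarray) -> float:
    """L1 distance between the original series and its counterfactual (lower is closer)."""
    return float(np.abs(x - x_cf).sum())

def sparsity(x: np.ndarray, x_cf: np.ndarray, tol: float = 1e-8) -> float:
    """Fraction of (timestep, channel) entries left unchanged (higher is sparser)."""
    return float(np.mean(np.abs(x - x_cf) <= tol))
```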
arXiv Detail & Related papers (2022-08-22T17:33:31Z) - Interpretation of Black Box NLP Models: A Survey [0.0]
Post hoc explanations based on perturbations are widely used approaches to interpret a machine learning model after it has been built.
We propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee the stability of the resulting explanation.
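The stopping rule behind such a CLT-based test can be sketched as follows: keep adding perturbation samples until the gap between two competing features' importance scores is statistically significant. This is a simplified illustration, not the paper's exact procedure.

```python
# CLT-style sample-size estimate: how many perturbations are needed before a mean
# importance gap of `gap_mean` (per-sample std `gap_std`) is significant at level alpha.
import math
from scipy.stats import norm

def required_perturbations(gap_mean: float, gap_std: float, alpha: float = 0.05) -> int:
    if gap_mean <= 0:
        raise ValueError("expected a positive mean gap between the two features")
    z = norm.ppf(1 - alpha / 2)
    # Significance requires gap_mean >= z * gap_std / sqrt(n).
    return math.ceil((z * gap_std / gap_mean) ** 2)
```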
arXiv Detail & Related papers (2022-03-31T14:54:35Z) - Hessian-based toolbox for reliable and interpretable machine learning in
physics [58.720142291102135]
We present a toolbox for interpretability and reliability that is agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an extrapolation score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
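For intuition, the "influence of the input data on the prediction" component resembles classical Hessian-based influence functions, which have a closed form for ridge regression. The sketch below is illustrative only; the toolbox described in the paper targets general architectures.

```python
# Influence of each training point on a test prediction for ridge regression,
# where the Hessian of the loss is X^T X + lam*I (illustrative closed-form case).
import numpy as np

def ridge_influence(X, y, x_test, lam=1.0):
    n, d = X.shape
    H = X.T @ X + lam * np.eye(d)          # Hessian of the ridge loss
    theta = np.linalg.solve(H, X.T @ y)    # fitted weights
    residuals = X @ theta - y              # per-sample gradient factor
    grads = residuals[:, None] * X         # d(loss_i)/d(theta), shape (n, d)
    # Influence on the test prediction: -x_test^T H^{-1} grad_i for each training point i.
    return -(grads @ np.linalg.solve(H, x_test))

# Toy example with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
print(ridge_influence(X, y, np.array([0.2, -0.1, 0.4]))[:5])
```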
arXiv Detail & Related papers (2021-08-04T16:32:59Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
We propose PRoFILE, a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - Transfer Learning without Knowing: Reprogramming Black-box Machine
Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
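The zeroth-order ingredient can be illustrated with the standard averaged random-direction gradient estimator, which needs only loss queries on the black box; the multi-label mapping, batching, and loss used in BAR itself are not shown here.

```python
# Zeroth-order gradient estimate of a black-box loss with respect to an input
# "program", using averaged one-sided finite differences along random unit directions.
import numpy as np

def zeroth_order_grad(loss_fn, program, n_dirs=20, mu=0.01, seed=0):
    """Estimate d(loss)/d(program) from loss queries only (no gradients needed)."""
    rng = np.random.default_rng(seed)
    base = loss_fn(program)
    grad = np.zeros_like(program)
    for _ in range(n_dirs):
        u = rng.normal(size=program.shape)
        u /= np.linalg.norm(u)
        grad += (loss_fn(program + mu * u) - base) / mu * u
    return grad * (program.size / n_dirs)

# Toy usage: drive a fixed "black box" loss toward its minimum at 0.3.
black_box_loss = lambda p: float(np.sum((p - 0.3) ** 2))
p = np.zeros(8)
for _ in range(100):
    p -= 0.05 * zeroth_order_grad(black_box_loss, p, seed=None)
print(p.round(2))  # should move toward 0.3
```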
arXiv Detail & Related papers (2020-07-17T01:52:34Z)