Interpretation of Black Box NLP Models: A Survey
- URL: http://arxiv.org/abs/2203.17081v1
- Date: Thu, 31 Mar 2022 14:54:35 GMT
- Title: Interpretation of Black Box NLP Models: A Survey
- Authors: Shivani Choudhary, Niladri Chatterjee, Subir Kumar Saha
- Abstract summary: Post hoc explanations based on perturbations are widely used approaches to interpret a machine learning model after it has been built.
We propose S-LIME, which utilizes a hypothesis-testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee stability of the resulting explanation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An increasing number of machine learning models have been deployed in
high-stakes domains such as finance and healthcare. Despite their superior
performance, many of these models are black boxes by nature and hard to explain.
There is a growing effort among researchers to develop methods for interpreting
such black-box models. Post hoc explanations based on perturbations, such as LIME,
are widely used approaches to interpret a machine learning model after it has
been built. This class of methods has been shown to exhibit large instability,
posing serious challenges to the effectiveness of the method itself and harming
user trust. In this paper, we propose S-LIME, which utilizes a hypothesis-testing
framework based on the central limit theorem to determine the number of
perturbation points needed to guarantee stability of the resulting explanation.
Experiments on both simulated and real-world data sets are provided to
demonstrate the effectiveness of our method.
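As a rough, self-contained illustration of the stability issue and of a CLT-based stopping rule of the kind described above, the sketch below fits a proximity-weighted local linear surrogate on perturbation samples and keeps drawing batches until the gap between the top-k attribution magnitudes clears a normal-approximation confidence band. The black-box function, proximity kernel, batch size, and gap test are all illustrative assumptions, not the S-LIME procedure itself.

```python
# A minimal, self-contained sketch of perturbation-based local explanation with
# a CLT-style stopping rule. The black-box model, proximity kernel, batch size,
# and the top-k gap test below are illustrative assumptions, not S-LIME itself.
import numpy as np

def black_box(X):
    # Hypothetical black-box model: a nonlinear score over three features.
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.3 * X[:, 2])))

def local_weights(x0, n_samples, rng, sigma=0.5):
    """Fit a proximity-weighted linear surrogate around x0 on n_samples
    random perturbations and return its coefficients (one per feature)."""
    Z = x0 + sigma * rng.standard_normal((n_samples, x0.size))     # perturbations
    y = black_box(Z)                                               # black-box outputs
    w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / (2 * sigma ** 2))  # proximity kernel
    A = np.hstack([np.ones((n_samples, 1)), Z])                    # intercept + features
    W = np.diag(w)
    coef, *_ = np.linalg.lstsq(A.T @ W @ A, A.T @ W @ y, rcond=None)
    return coef[1:]                                                # drop the intercept

def stable_top_k(x0, k=2, batch=200, max_batches=25, z=2.58):
    """Draw perturbation batches until the gap between the k-th and (k+1)-th
    mean |coefficient| exceeds z times its estimated standard error (a rough
    normal-approximation check), i.e. until the top-k ranking looks stable."""
    rng = np.random.default_rng(42)
    batches = []
    for _ in range(max_batches):
        batches.append(local_weights(x0, batch, rng))
        E = np.abs(np.array(batches))                 # per-batch |coefficients|
        mean = E.mean(axis=0)
        order = np.argsort(-mean)                     # features ranked by importance
        if len(E) < 2:
            continue                                  # need >= 2 batches for an SE
        se = E.std(axis=0, ddof=1) / np.sqrt(len(E))
        gap = mean[order[k - 1]] - mean[order[k]]
        if gap > z * (se[order[k - 1]] + se[order[k]]):
            break                                     # ranking judged stable
    return order[:k], len(batches) * batch

top_k, n_used = stable_top_k(np.array([0.2, -0.4, 0.1]))
print(f"top-{len(top_k)} features {top_k} judged stable after {n_used} perturbations")
```

With the actual lime package, the analogous knob is the num_samples argument of explain_instance; the point of S-LIME is to choose that number adaptively, via a hypothesis test, rather than fixing it in advance.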
Related papers
- MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity (a generic counterfactual-search sketch, not specific to MACE, follows the related-papers list).
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Beyond Explaining: Opportunities and Challenges of XAI-Based Model
Improvement [75.00655434905417]
Explainable Artificial Intelligence (XAI) is an emerging research field bringing transparency to highly complex machine learning (ML) models.
This paper offers a comprehensive overview of techniques that apply XAI practically to improve various properties of ML models.
We show empirically, through experiments in both toy and realistic settings, how explanations can help improve properties such as model generalization ability or reasoning.
arXiv Detail & Related papers (2022-03-15T15:44:28Z) - Hessian-based toolbox for reliable and interpretable machine learning in
physics [58.720142291102135]
We present a toolbox for interpretability and reliability that is agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an agnostic score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z) - Recurrence-Aware Long-Term Cognitive Network for Explainable Pattern
Classification [0.0]
We propose an LTCN-based model for interpretable pattern classification of structured data.
Our method brings its own mechanism for providing explanations by quantifying the relevance of each feature in the decision process.
Our interpretable model obtains competitive performance when compared to state-of-the-art white-box and black-box models.
arXiv Detail & Related papers (2021-07-07T18:14:50Z) - S-LIME: Stabilized-LIME for Model Explanation [7.479279851480736]
Post hoc explanations based on perturbations are widely used approaches to interpret a machine learning model after it has been built.
We propose S-LIME, which utilizes a hypothesis-testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee stability of the resulting explanation.
arXiv Detail & Related papers (2021-06-15T04:24:59Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Robust and Stable Black Box Explanations [31.05743211871823]
We propose a novel framework for generating robust and stable explanations of black box models.
We instantiate this algorithm for explanations in the form of linear models and decision sets.
arXiv Detail & Related papers (2020-11-12T02:29:03Z) - MeLIME: Meaningful Local Explanation for Machine Learning Models [2.819725769698229]
MeLIME generalizes the LIME method, allowing more flexible perturbation sampling and the use of different local interpretable models.
We show that our approach, MeLIME, produces more meaningful explanations than other techniques across different ML models.
arXiv Detail & Related papers (2020-09-12T16:06:58Z) - Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with
Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
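Several of the related papers above (MACE, the diverse-counterfactuals work, the model-extraction attack, and the GAN-based audit) revolve around counterfactual explanations: small, plausible changes to an input that flip a black box's prediction. The sketch below illustrates the basic idea with a deliberately simple greedy coordinate search in the spirit of Wachter-style counterfactuals; the black-box probability function, step size, and sparsity weight lam are illustrative assumptions, and it does not reproduce the RL-based, latent-space, or GAN-based procedures of the papers listed.

```python
# A minimal, hedged sketch of counterfactual explanation for a black-box
# classifier: greedily nudge one feature at a time, trading off the predicted
# probability of the opposite class against an L1 distance (sparsity) penalty.
# The model, step size, and penalty weight are illustrative assumptions.
import numpy as np

def black_box_proba(X):
    # Hypothetical black box: probability of class 1 from a fixed linear score.
    s = X @ np.array([1.5, -2.0, 0.5]) + 0.25
    return 1.0 / (1.0 + np.exp(-s))

def counterfactual(x, proba, lam=0.05, step=0.1, max_iters=500):
    """Greedy coordinate search for a nearby input whose predicted class
    differs from x's, minimizing (1 - p_target) + lam * ||cf - x||_1."""
    orig_class = int(proba(x[None])[0] > 0.5)
    # probability of the *other* class, which we want to push above 0.5
    p_target = (lambda Z: 1.0 - proba(Z)) if orig_class == 1 else proba

    def loss(z):
        return (1.0 - p_target(z[None])[0]) + lam * np.abs(z - x).sum()

    cf = x.copy()
    for _ in range(max_iters):
        if p_target(cf[None])[0] > 0.5:        # prediction flipped: done
            return cf
        # try a small move on every feature in both directions, keep the best
        candidates = []
        for j in range(x.size):
            for delta in (step, -step):
                cand = cf.copy()
                cand[j] += delta
                candidates.append(cand)
        best = min(candidates, key=loss)
        if loss(best) >= loss(cf):             # no improving move left: give up
            return None
        cf = best
    return None

x = np.array([0.0, 0.5, 0.0])
cf = counterfactual(x, black_box_proba)
print("original:", x, "-> class", int(black_box_proba(x[None])[0] > 0.5))
if cf is not None:
    print("counterfactual:", cf, "-> class", int(black_box_proba(cf[None])[0] > 0.5))
```

The methods surveyed above differ mainly in how they generate and constrain such candidate changes, e.g. via reinforcement learning (MACE), a diversity-constrained disentangled latent space, or a GAN that keeps the perturbed input realistic.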
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.