Understanding Interpretability by generalized distillation in Supervised
Classification
- URL: http://arxiv.org/abs/2012.03089v1
- Date: Sat, 5 Dec 2020 17:42:50 GMT
- Title: Understanding Interpretability by generalized distillation in Supervised
Classification
- Authors: Adit Agarwal and Dr. K.K. Shukla and Arjan Kuijper and Anirban
Mukhopadhyay
- Abstract summary: Recent interpretation strategies focus on human understanding of the underlying decision mechanisms of complex Machine Learning models.
We propose an interpretation-by-distillation formulation that is defined relative to other ML models.
We evaluate our proposed framework on the MNIST, Fashion-MNIST and Stanford40 datasets.
- Score: 3.5473853445215897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to interpret decisions taken by Machine Learning (ML) models is
fundamental to encouraging trust and reliability in different practical
applications. Recent interpretation strategies focus on human understanding of
the underlying decision mechanisms of complex ML models. However, these
strategies are restricted by the subjective biases of humans. To dissociate
from such human biases, we propose an interpretation-by-distillation
formulation that is defined relative to other ML models. We generalize the
distillation technique for quantifying interpretability, using an
information-theoretic perspective, removing the role of ground-truth from the
definition of interpretability. Our work defines the entropy of supervised
classification models, providing bounds on the entropy of Piece-Wise Linear
Neural Networks (PWLNs), along with the first theoretical bounds on the
interpretability of PWLNs. We evaluate our proposed framework on the MNIST,
Fashion-MNIST and Stanford40 datasets and demonstrate the applicability of the
proposed theoretical framework in different supervised classification
scenarios.
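As a rough illustration of the interpretation-by-distillation idea described in the abstract, the sketch below (not the authors' code; the function name distillation_score, the toy architectures, the random data, and the KL-based objective are assumptions) trains a simpler "interpreter" network to mimic a target classifier's output distribution, without using ground-truth labels, and reports the residual KL divergence as a proxy for how interpretable the target is relative to the interpreter's model class.
```python
# Minimal sketch, assuming a PyTorch setup; not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_score(target, interpreter, data, epochs=5, lr=1e-3):
    """Train `interpreter` to mimic `target` on unlabeled `data` and return the
    mean KL(target || interpreter); lower suggests the target is easier to
    interpret by this interpreter class. No ground-truth labels are used."""
    opt = torch.optim.Adam(interpreter.parameters(), lr=lr)
    target.eval()
    for _ in range(epochs):
        for (x,) in data:
            with torch.no_grad():
                p = F.softmax(target(x), dim=-1)           # teacher distribution
            log_q = F.log_softmax(interpreter(x), dim=-1)  # student distribution
            loss = F.kl_div(log_q, p, reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Evaluate the residual mismatch per sample.
    interpreter.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for (x,) in data:
            p = F.softmax(target(x), dim=-1)
            log_q = F.log_softmax(interpreter(x), dim=-1)
            total += F.kl_div(log_q, p, reduction="sum").item()
            n += x.shape[0]
    return total / n


if __name__ == "__main__":
    # Toy usage on random MNIST-shaped inputs; both architectures are placeholders.
    target = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    interpreter = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # simpler piece-wise linear model
    data = [(torch.randn(64, 1, 28, 28),) for _ in range(20)]
    print("mean KL(target || interpreter):", distillation_score(target, interpreter, data))
```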
Related papers
- Self-supervised Interpretable Concept-based Models for Text Classification [9.340843984411137]
This paper proposes self-supervised Interpretable Concept Embedding Models (ICEMs).
We leverage the generalization abilities of Large Language Models to predict the concept labels in a self-supervised way.
ICEMs can be trained in a self-supervised way, achieving performance similar to fully supervised concept-based models and end-to-end black-box ones.
arXiv Detail & Related papers (2024-06-20T14:04:53Z)
- Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z)
- Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)
- Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Scientific Inference With Interpretable Machine Learning: Analyzing Models to Learn About Real-World Phenomena [4.312340306206884]
Interpretable machine learning offers a solution by analyzing models holistically to derive interpretations.
Current IML research is focused on auditing ML models rather than leveraging them for scientific inference.
We present a framework for designing IML methods, termed 'property descriptors', that illuminate not just the model but also the phenomenon it represents.
arXiv Detail & Related papers (2022-06-11T10:13:21Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To tackle these issues, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models [1.025459377812322]
We highlight many general pitfalls of machine learning model interpretation, such as using interpretation techniques in the wrong context.
We focus on pitfalls for global methods that describe the average model behavior, but many pitfalls also apply to local methods that explain individual predictions.
arXiv Detail & Related papers (2020-07-08T14:02:56Z)
- Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates.
ExMatrix's applicability is confirmed via different examples, showing how it can be used in practice to promote the interpretability of RF models.
arXiv Detail & Related papers (2020-05-08T21:03:48Z)
- Benchmarking Machine Reading Comprehension: A Psychological Perspective [45.85089157315507]
Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding.
The conventional task design of MRC lacks explainability beyond model interpretation.
This paper provides a theoretical basis for the design of MRC datasets based on psychology as well as psychometrics.
arXiv Detail & Related papers (2020-04-04T11:45:27Z)