Can I Trust the Explanations? Investigating Explainable Machine Learning
Methods for Monotonic Models
- URL: http://arxiv.org/abs/2309.13246v1
- Date: Sat, 23 Sep 2023 03:59:02 GMT
- Authors: Dangxing Chen
- Abstract summary: Most explainable machine learning methods are applied to black-box models without any domain knowledge.
By incorporating domain knowledge, science-informed machine learning models have demonstrated better generalization and interpretation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, explainable machine learning methods have been very
successful. Despite their success, most explainable machine learning methods
are applied to black-box models without any domain knowledge. By incorporating
domain knowledge, science-informed machine learning models have demonstrated
better generalization and interpretation. But do we obtain consistent
scientific explanations if we apply explainable machine learning methods to
science-informed machine learning models? This question is addressed in the
context of monotonic models that exhibit three different types of monotonicity.
To demonstrate monotonicity, we propose three axioms. Accordingly, this study
shows that when only individual monotonicity is involved, the baseline Shapley
value provides good explanations; however, when strong pairwise monotonicity is
involved, the Integrated Gradients method provides reasonable explanations on
average.
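To make the abstract's comparison concrete, here is a minimal sketch, not taken from the paper: it computes exact baseline Shapley values and a Riemann-sum approximation of Integrated Gradients for a toy, individually monotonic model. The function `model`, the input `x`, and the all-zeros baseline are illustrative assumptions, not the paper's experimental setup.

```python
# A minimal sketch (not from the paper): the two attribution methods the
# abstract compares, applied to a toy model that is individually monotonic
# (non-decreasing in every feature).
import itertools
import math
import numpy as np

def model(x):
    # Softplus of a non-negative linear score: increasing in every feature.
    w = np.array([0.5, 1.0, 2.0])
    return float(np.log1p(np.exp(w @ x)))

def baseline_shapley(f, x, baseline):
    # Exact baseline Shapley values: a feature's average marginal contribution
    # over all subsets, with "absent" features replaced by the baseline.
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in itertools.combinations(others, k):
                weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                z_without = baseline.copy()
                for j in subset:
                    z_without[j] = x[j]
                z_with = z_without.copy()
                z_with[i] = x[i]
                phi[i] += weight * (f(z_with) - f(z_without))
    return phi

def integrated_gradients(f, x, baseline, steps=256, eps=1e-5):
    # Integrated Gradients via a midpoint Riemann sum; gradients are taken by
    # central finite differences (an autodiff framework would normally do this).
    grad_sum = np.zeros(len(x))
    for alpha in (np.arange(steps) + 0.5) / steps:
        z = baseline + alpha * (x - baseline)
        for i in range(len(x)):
            zp, zm = z.copy(), z.copy()
            zp[i] += eps
            zm[i] -= eps
            grad_sum[i] += (f(zp) - f(zm)) / (2 * eps)
    return (x - baseline) * grad_sum / steps

x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros(3)
print("baseline Shapley:    ", baseline_shapley(model, x, baseline))
print("Integrated Gradients:", integrated_gradients(model, x, baseline))
```

For this monotone toy model with a baseline dominated by the input, both attribution vectors are non-negative and sum (exactly for the Shapley values, approximately for Integrated Gradients) to model(x) - model(baseline).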
Related papers
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning-theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- Understanding Post-hoc Explainers: The Case of Anchors [6.681943980068051]
We present a theoretical analysis of a rule-based interpretability method that highlights a small set of words to explain a text classifier's decision.
After formalizing its algorithm and providing useful insights, we demonstrate mathematically that Anchors produces meaningful results.
arXiv Detail & Related papers (2023-03-15T17:56:34Z)
- Learning to Scaffold: Optimizing Model Explanations for Teaching [74.25464914078826]
We train models on three natural language processing and computer vision tasks.
We find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than students trained with explanations produced by previous methods.
arXiv Detail & Related papers (2022-04-22T16:43:39Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Explaining Natural Language Processing Classifiers with Occlusion and Language Modeling [4.9342793303029975]
We present a novel explanation method, called OLM, for natural language processing classifiers.
OLM gives explanations that are theoretically sound and easy to understand.
We make several contributions to the theory of explanation methods.
arXiv Detail & Related papers (2021-01-28T09:44:04Z)
- Deducing neighborhoods of classes from a fitted model [68.8204255655161]
This article presents a new kind of interpretable machine learning method.
It helps to understand how a classification model partitions the feature space into predicted classes, using quantile shifts.
Real data points (or specific points of interest) are used, and the change in the prediction after slightly raising or lowering specific features is observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z)
- Counterfactual explanation of machine learning survival models [5.482532589225552]
It is shown that the counterfactual explanation problem can be reduced to a standard convex optimization problem with linear constraints.
For black-box models where this reduction does not apply, it is proposed to use the well-known Particle Swarm Optimization algorithm.
arXiv Detail & Related papers (2020-06-26T19:46:47Z)
- Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.