Learning with Explanation Constraints
- URL: http://arxiv.org/abs/2303.14496v3
- Date: Fri, 22 Dec 2023 23:52:29 GMT
- Title: Learning with Explanation Constraints
- Authors: Rattana Pukdee, Dylan Sam, J. Zico Kolter, Maria-Florina Balcan,
Pradeep Ravikumar
- Abstract summary: We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach over a large array of synthetic and real-world experiments.
- Score: 91.23736536228485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As larger deep learning models are hard to interpret, there has been a recent
focus on generating explanations of these black-box models. In contrast, we may
have apriori explanations of how models should behave. In this paper, we
formalize this notion as learning from explanation constraints and provide a
learning theoretic framework to analyze how such explanations can improve the
learning of our models. One may naturally ask, "When would these explanations
be helpful?" Our first key contribution addresses this question via a class of
models that satisfies these explanation constraints in expectation over new
data. We provide a characterization of the benefits of these models (in terms
of the reduction of their Rademacher complexities) for a canonical class of
explanations given by gradient information in the settings of both linear
models and two layer neural networks. In addition, we provide an algorithmic
solution for our framework, via a variational approximation that achieves
better performance and satisfies these constraints more frequently, when
compared to simpler augmented Lagrangian methods to incorporate these
explanations. We demonstrate the benefits of our approach over a large array of
synthetic and real-world experiments.
Related papers
- Selective Explanations [14.312717332216073]
A machine learning model is trained to predict feature attribution scores with only one inference.
Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations.
We propose selective explanations, a novel feature attribution method that detects when amortized explainers generate low-quality explanations.
arXiv Detail & Related papers (2024-05-29T23:08:31Z) - Explainability for Machine Learning Models: From Data Adaptability to
User Perception [0.8702432681310401]
This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
arXiv Detail & Related papers (2024-02-16T18:44:37Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Explaining Explainability: Towards Deeper Actionable Insights into Deep
Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z) - Learning to Scaffold: Optimizing Model Explanations for Teaching [74.25464914078826]
We train models on three natural language processing and computer vision tasks.
We find that students trained with explanations extracted with our framework are able to simulate the teacher significantly more effectively than ones produced with previous methods.
arXiv Detail & Related papers (2022-04-22T16:43:39Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Deducing neighborhoods of classes from a fitted model [68.8204255655161]
In this article a new kind of interpretable machine learning method is presented.
It can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts.
Basically, real data points (or specific points of interest) are used and the changes of the prediction after slightly raising or decreasing specific features are observed.
arXiv Detail & Related papers (2020-09-11T16:35:53Z) - Accurate and Intuitive Contextual Explanations using Linear Model Trees [0.0]
Local post hoc model explanations have gained massive adoption.
Current state of the art methods use rudimentary methods to generate synthetic data around the point to be explained.
We use a Generative Adversarial Network for synthetic data generation and train a piecewise linear model in the form of Linear Model Trees.
arXiv Detail & Related papers (2020-09-11T10:13:12Z) - Explainable Deep Modeling of Tabular Data using TableGraphNet [1.376408511310322]
We propose a new architecture that produces explainable predictions in the form of additive feature attributions.
We show that our explainable model attains the same level of performance as black box models.
arXiv Detail & Related papers (2020-02-12T20:02:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.