Learning to Scaffold: Optimizing Model Explanations for Teaching
- URL: http://arxiv.org/abs/2204.10810v1
- Date: Fri, 22 Apr 2022 16:43:39 GMT
- Title: Learning to Scaffold: Optimizing Model Explanations for Teaching
- Authors: Patrick Fernandes, Marcos Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig
- Abstract summary: We train models on three natural language processing and computer vision tasks.
We find that students trained with explanations extracted with our framework simulate the teacher significantly more effectively than students given explanations produced by previous methods.
- Score: 74.25464914078826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern machine learning models are opaque, and as a result there is a
burgeoning academic subfield on methods that explain these models' behavior.
However, what is the precise goal of providing such explanations, and how can
we demonstrate that explanations achieve this goal? Some research argues that
explanations should help teach a student (either human or machine) to simulate
the model being explained, and that the quality of explanations can be measured
by the simulation accuracy of students on unexplained examples. In this work,
leveraging meta-learning techniques, we extend this idea to improve the quality
of the explanations themselves, specifically by optimizing explanations such
that student models more effectively learn to simulate the original model. We
train models on three natural language processing and computer vision tasks,
and find that students trained with explanations extracted with our framework
are able to simulate the teacher significantly more effectively than students
given explanations produced by previous methods. Through human annotations and
a user study, we
further find that these learned explanations more closely align with how humans
would explain the required decisions in these tasks. Our code is available at
https://github.com/coderpat/learning-scaffold
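The core recipe in the abstract, training a student on examples labelled and explained by the teacher and then scoring the explanations by the student's simulation accuracy on held-out, unexplained examples, can be sketched roughly as follows. This is a minimal illustration under assumed interfaces, not the authors' implementation: the student returning (logits, saliency), the explainer callable, and the scaffolding weight alpha are all placeholders, and the linked repository contains the real code.

```python
# Hedged sketch of the student-teacher simulation setup described in the
# abstract. Model classes, the explainer interface, and the loss weight
# `alpha` are illustrative assumptions, not the repository's actual code.
import torch
import torch.nn.functional as F

def train_student(student, teacher, explainer, train_loader, alpha=1.0, lr=1e-4):
    """Train the student to reproduce the teacher's predictions, using the
    teacher's explanations as an auxiliary 'scaffolding' signal."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for x in train_loader:                          # batches of inputs; gold labels are unused
        with torch.no_grad():
            teacher_labels = teacher(x).argmax(-1)  # teacher predictions to simulate
            teacher_expl = explainer(teacher, x)    # saliency over input features
        logits, student_expl = student(x)           # assumed to return (logits, saliency)
        loss = F.cross_entropy(logits, teacher_labels)
        loss = loss + alpha * F.mse_loss(student_expl, teacher_expl)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

@torch.no_grad()
def simulation_accuracy(student, teacher, test_loader):
    """Simulation accuracy on unexplained examples: how often the student's
    prediction matches the teacher's."""
    correct = total = 0
    for x in test_loader:
        logits, _ = student(x)
        correct += (logits.argmax(-1) == teacher(x).argmax(-1)).sum().item()
        total += x.shape[0]
    return correct / total
```

In the paper's framework the explainer itself is parameterized and meta-learned: the simulation accuracy of the resulting student is the signal used to improve the explanations.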
Related papers
- Explainability for Machine Learning Models: From Data Adaptability to User Perception [0.8702432681310401]
This thesis explores the generation of local explanations for already deployed machine learning models.
It aims to identify optimal conditions for producing meaningful explanations considering both data and user requirements.
arXiv Detail & Related papers (2024-02-16T18:44:37Z)
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings urge caution about the usefulness of saliency-based explanations and their potential for misunderstanding.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Can I Trust the Explanations? Investigating Explainable Machine Learning Methods for Monotonic Models [0.0]
Most explainable machine learning methods are applied to black-box models without any domain knowledge.
By incorporating domain knowledge, science-informed machine learning models have demonstrated better generalization and interpretation.
arXiv Detail & Related papers (2023-09-23T03:59:02Z)
- Learning with Explanation Constraints [91.23736536228485]
We provide a learning theoretic framework to analyze how explanations can improve the learning of our models.
We demonstrate the benefits of our approach across a large array of synthetic and real-world experiments.
arXiv Detail & Related papers (2023-03-25T15:06:47Z)
- Counterfactual Explanations for Models of Code [11.678590247866534]
Machine learning (ML) models play an increasingly prevalent role in many software engineering tasks.
It can be difficult for developers to understand why the model came to a certain conclusion and how to act upon the model's prediction.
This paper explores counterfactual explanations for models of source code.
arXiv Detail & Related papers (2021-11-10T14:44:19Z)
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
- Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? [86.60613602337246]
We introduce a leakage-adjusted simulatability (LAS) metric for evaluating NL explanations.
LAS measures how well explanations help an observer predict a model's output, while controlling for how explanations can directly leak the output.
We frame explanation generation as a multi-agent game and optimize explanations for simulatability while penalizing label leakage; a rough sketch of such a score appears after this list.
arXiv Detail & Related papers (2020-10-08T16:59:07Z)
- Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations; a rough sketch of this extraction loop also appears after this list.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)
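As noted in the Leakage-Adjusted Simulatability entry above, an LAS-style score compares an observer's ability to predict the model's output with and without the explanation while controlling for label leakage. A rough sketch under an assumed observer interface and an assumed macro-average over leaking and non-leaking subsets (the official metric's details may differ):

```python
# Rough sketch of a leakage-adjusted simulatability (LAS) style score.
# The observer interface and the macro-average over leaking / non-leaking
# subsets are assumptions based on the summary above, not the official metric.
from statistics import mean

def las_score(observer, examples):
    """examples: iterable of (x, explanation, model_output) triples.
    observer(x=..., explanation=...) -> predicted model output; either
    argument may be None to withhold that information."""
    groups = {True: [], False: []}
    for x, expl, y_model in examples:
        leaks = observer(x=None, explanation=expl) == y_model  # explanation alone reveals the output?
        with_expl = observer(x=x, explanation=expl) == y_model
        without = observer(x=x, explanation=None) == y_model
        groups[leaks].append(int(with_expl) - int(without))
    # Macro-average the simulatability gain over the leaking and non-leaking subsets
    return mean(mean(g) for g in groups.values() if g)
```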
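The model-extraction entry above describes copying a target model from its counterfactual explanations. A rough sketch of the general strategy, with the target's prediction and counterfactual APIs and the surrogate trainer as assumed placeholders rather than the paper's actual attack:

```python
# Rough sketch of extraction via counterfactual explanations: each query
# yields a labelled pair straddling the target's decision boundary, which the
# adversary uses to fit a surrogate. All callables are assumed placeholders.
def extract_surrogate(target_predict, target_counterfactual, queries, train_surrogate):
    """Build a training set from the target's answers and counterfactuals,
    then fit a surrogate model on it."""
    inputs, labels = [], []
    for x in queries:
        y = target_predict(x)            # target's prediction for x
        x_cf = target_counterfactual(x)  # nearby input with a different prediction
        y_cf = target_predict(x_cf)
        inputs.extend([x, x_cf])
        labels.extend([y, y_cf])
    # Counterfactual pairs mark the decision boundary, so the surrogate can
    # approach the target's behaviour with relatively few queries.
    return train_surrogate(inputs, labels)
```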
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.