Digital Socrates: Evaluating LLMs through Explanation Critiques
- URL: http://arxiv.org/abs/2311.09613v2
- Date: Fri, 16 Feb 2024 08:49:07 GMT
- Title: Digital Socrates: Evaluating LLMs through Explanation Critiques
- Authors: Yuling Gu, Oyvind Tafjord, Peter Clark
- Abstract summary: Digital Socrates is an open-source, automatic critique model for model explanations.
We show how Digital Socrates is useful for revealing insights about student models by examining their reasoning chains.
- Score: 41.876046456171
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While LLMs can provide reasoned explanations along with their answers, the
nature and quality of those explanations are still poorly understood. In
response, our goal is to define a detailed way of characterizing the
explanation capabilities of modern models and to create a nuanced,
interpretable explanation evaluation tool that can generate such
characterizations automatically, without relying on expensive API calls or
human annotations. Our approach is to (a) define the new task of explanation
critiquing - identifying and categorizing any main flaw in an explanation and
providing suggestions to address the flaw, (b) create a sizeable,
human-verified dataset for this task, and (c) train an open-source, automatic
critique model (called Digital Socrates) using this data. Through quantitative
and qualitative analysis, we demonstrate how Digital Socrates is useful for
revealing insights about student models by examining their reasoning chains,
and how it can provide high-quality, nuanced, automatic evaluation of those
model explanations for the first time. Digital Socrates thus fills an important
gap in evaluation tools for understanding and improving the explanation
behavior of models.
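The abstract defines a critique as identifying and categorizing the main flaw in an explanation and suggesting how to address it. As a minimal sketch of what one critique record might look like, the Python below uses illustrative field names and an illustrative flaw label; neither is the paper's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExplanationCritique:
    """One critique of a student model's explanation.

    Per the abstract, the task is to identify and categorize the main
    flaw (if any) and suggest a fix. Field names and the flaw label
    below are illustrative assumptions, not the paper's schema.
    """
    question: str                    # task instance posed to the student model
    student_answer: str              # the student model's answer
    student_explanation: str         # the reasoning chain being critiqued
    flaw_category: Optional[str]     # label from some flaw taxonomy; None if flawless
    flaw_description: Optional[str]  # where and why the explanation goes wrong
    suggestion: Optional[str]        # how the flaw could be addressed

critique = ExplanationCritique(
    question="Why does ice float on water?",
    student_answer="Because ice is less dense than liquid water.",
    student_explanation="Freezing packs water molecules more tightly, "
                        "which lowers the density of ice.",
    flaw_category="self_contradiction",
    flaw_description="Tighter packing would raise density, not lower it; "
                     "ice is less dense because hydrogen bonds form an open lattice.",
    suggestion="Replace the packing claim with the open hydrogen-bond lattice "
               "that makes ice less dense than liquid water.",
)
```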
Related papers
- Evaluating the Utility of Model Explanations for Model Development [54.23538543168767]
We evaluate whether explanations can improve human decision-making in practical scenarios of machine learning model development.
To our surprise, we did not find evidence of significant improvement on tasks when users were provided with any of the saliency maps.
These findings suggest caution about the usefulness of saliency-based explanations and their potential to be misunderstood.
arXiv Detail & Related papers (2023-12-10T23:13:23Z)
- Learning by Self-Explaining [23.420673675343266]
We introduce a novel approach in the context of image classification, termed Learning by Self-Explaining (LSX).
LSX utilizes aspects of self-refining AI and human-guided explanatory machine learning.
Our results indicate improvements via Learning by Self-Explaining on several levels.
arXiv Detail & Related papers (2023-09-15T13:41:57Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
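The SOXAI entry above turns on one operation: dropping training samples tied to a concept flagged as irrelevant and retraining. Below is a hedged Python sketch of that filtering step alone; the concept scores, threshold, and flagged concept are all illustrative assumptions, not the SOXAI method itself.

```python
import numpy as np

# Hedged sketch of the dataset-level filtering step suggested above:
# given per-sample concept activations from some (hypothetical) concept
# extractor, drop the samples dominated by a concept flagged as irrelevant,
# then retrain. Scores, threshold, and the flagged concept are toy assumptions.

rng = np.random.default_rng(0)
n_samples = 500
X = rng.normal(size=(n_samples, 8))
y = (X[:, 0] > 0).astype(int)

concept_scores = rng.random(size=(n_samples, 4))  # hypothetical activations
FLAGGED_CONCEPT, THRESHOLD = 2, 0.9               # flagged as irrelevant

keep = concept_scores[:, FLAGGED_CONCEPT] < THRESHOLD
X_clean, y_clean = X[keep], y[keep]
print(f"kept {keep.sum()}/{n_samples} samples after filtering")

# Retraining on (X_clean, y_clean) and comparing against a model trained on
# the full set is what would test whether removing the concept helps.
```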
- ExSum: From Local Explanations to Model Understanding [6.23934576145261]
Interpretability methods are developed to understand the working mechanisms of black-box models.
Fulfilling this goal requires both that the explanations generated by these methods are correct and that people can easily and reliably understand them.
We introduce explanation summary (ExSum), a mathematical framework for quantifying model understanding.
arXiv Detail & Related papers (2022-04-30T02:07:20Z)
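The ExSum entry above names a framework for quantifying model understanding but not its metrics. One natural reading is to score a generalized rule about local explanations by how often it applies and how often its claim holds when it does; the rule, toy saliencies, and metric names below are illustrative assumptions.

```python
# Hedged sketch: scoring a candidate rule about local explanations,
# e.g. "negation words receive negative saliency". The rule, the toy
# (token, saliency) data, and the coverage/validity metrics are
# illustrative assumptions, not ExSum's actual definitions.

explanations = [
    [("not", -0.8), ("good", 0.6)],
    [("never", -0.5), ("fails", -0.2)],
    [("great", 0.9)],
    [("not", 0.1), ("bad", -0.4)],
]

NEGATIONS = {"not", "never", "no"}

applies = [(tok, sal) for expl in explanations for tok, sal in expl
           if tok in NEGATIONS]

coverage = len(applies) / sum(len(expl) for expl in explanations)
validity = sum(sal < 0 for _, sal in applies) / len(applies)

print(f"coverage={coverage:.2f}, validity={validity:.2f}")
# coverage: fraction of tokens the rule speaks about;
# validity: fraction of those where the claimed sign actually holds.
```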
- On the Objective Evaluation of Post Hoc Explainers [10.981508361941335]
Modern trends in machine learning research have led to algorithms so intricate that they are considered black boxes.
In an effort to reduce the opacity of decisions, methods have been proposed to construe the inner workings of such models in a human-comprehensible manner.
We propose a framework for the evaluation of post hoc explainers on ground truth that is directly derived from the additive structure of a model.
arXiv Detail & Related papers (2021-06-15T19:06:51Z)
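The entry above derives ground truth from a model's additive structure: for an additive model, each term's contribution at an input is an exact attribution against which a post hoc explainer can be scored. The sketch below illustrates this for a linear model; the noisy stand-in explainer and the rank-agreement score are assumptions for illustration.

```python
import numpy as np

# For an additive model f(x) = sum_i w_i * x_i + b, the exact contribution
# of feature i at input x is w_i * x_i (the bias attributes to no feature).
# This gives ground-truth attributions to evaluate a post hoc explainer
# against; the explainer below is a deliberately noisy stand-in, not any
# particular method.

rng = np.random.default_rng(0)
w, b = np.array([2.0, -1.0, 0.5]), 0.3
X = rng.normal(size=(100, 3))

ground_truth = X * w                       # exact per-feature contributions

def noisy_explainer(X_in):
    """Stand-in for a post hoc attribution method."""
    return X_in * w + rng.normal(scale=0.1, size=X_in.shape)

estimated = noisy_explainer(X)

def rank_agreement(a, b_est):
    """Mean agreement of per-instance feature rankings."""
    return np.mean(np.argsort(a, axis=1) == np.argsort(b_est, axis=1))

print(f"rank agreement: {rank_agreement(ground_truth, estimated):.2f}")
```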
- Evaluating Explanations: How much do explanations from the teacher aid students? [103.05037537415811]
We formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning.
Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions.
arXiv Detail & Related papers (2020-12-01T23:40:21Z)
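For the student-teacher paradigm above, the value of an explanation can be read as the gain a student model gets from learning with the teacher's explanation versus without it. The toy task below, where the "explanation" reveals which features matter, and the accuracy-gap metric are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Toy instance of the student-teacher idea: explanations are valued by how
# much they help a student learn. The task is recovering a linear rule; the
# teacher's "explanation" is the set of relevant features. Everything below
# is an illustrative toy, not the paper's actual protocol.

rng = np.random.default_rng(1)
w_true = np.array([1.5, 0.0, -2.0, 0.0])           # teacher's hidden rule
X = rng.normal(size=(200, 4))
y = (X @ w_true > 0).astype(float)

relevant = np.array([True, False, True, False])    # teacher's explanation

def student_accuracy(X_feat, y_all):
    """Least-squares student; returns accuracy on held-out data."""
    Xtr, Xte, ytr, yte = X_feat[:150], X_feat[150:], y_all[:150], y_all[150:]
    w_hat, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return np.mean((Xte @ w_hat > 0.5) == yte)

acc_plain = student_accuracy(X, y)                 # no explanation
acc_expl = student_accuracy(X[:, relevant], y)     # features picked by teacher

# The explanation's value is the accuracy gap it produces in the student.
print(f"explanation value = {acc_expl - acc_plain:.3f}")
```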
- Evaluations and Methods for Explanation through Robustness Analysis [117.7235152610957]
We establish a novel set of evaluation criteria for such feature-based explanations via robustness analysis.
We obtain new explanations that are loosely necessary and sufficient for a prediction.
We extend the explanation to extract the set of features that would move the current prediction to a target class.
arXiv Detail & Related papers (2020-05-31T05:52:05Z)
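A hedged sketch of the "loosely necessary and sufficient" criterion above: perturbing the features an explanation selects should change the prediction (necessity), while perturbing the remaining features should not (sufficiency). The toy linear model, masking scheme, and flip-rate estimates below are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Robustness-style evaluation of a feature-based explanation: random
# perturbations of the explanation's features should flip the prediction
# (necessity), while perturbations of the remaining features should leave
# it intact (sufficiency). Model, masking, and data are toy assumptions.

rng = np.random.default_rng(2)
w = np.array([3.0, -0.1, 2.5, 0.05])

def predict(x):
    return float(x @ w > 0)

x = np.array([1.0, 1.0, 1.0, 1.0])
explanation = np.array([True, False, True, False])   # claimed important features

def flip_rate(x, mask, scale=2.0, trials=50):
    """Fraction of random perturbations of the masked features
    that change the model's prediction."""
    base = predict(x)
    flips = 0
    for _ in range(trials):
        z = x.copy()
        z[mask] += rng.normal(scale=scale, size=mask.sum())
        flips += predict(z) != base
    return flips / trials

necessity = flip_rate(x, explanation)        # high if the features are necessary
sufficiency = 1 - flip_rate(x, ~explanation) # high if the features are sufficient

print(f"necessity~{necessity:.2f}, sufficiency~{sufficiency:.2f}")
```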
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.