Label-Descriptive Patterns and their Application to Characterizing Classification Errors
- URL: http://arxiv.org/abs/2110.09599v1
- Date: Mon, 18 Oct 2021 19:42:21 GMT
- Title: Label-Descriptive Patterns and their Application to Characterizing Classification Errors
- Authors: Michael Hedderich, Jonas Fischer, Dietrich Klakow and Jilles Vreeken
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art deep learning methods achieve human-like performance on many
tasks, but make errors nevertheless. Characterizing these errors in easily
interpretable terms gives insight into whether a model is prone to making
systematic errors, but also gives a way to act on and improve the model. In this
paper we propose a method that allows us to do so for arbitrary classifiers by
mining a small set of patterns that together succinctly describe the input data
that is partitioned according to correctness of prediction. We show this is an
instance of the more general label description problem, which we formulate in
terms of the Minimum Description Length principle. To discover good pattern
sets we propose the efficient and hyperparameter-free Premise algorithm, which,
as we show through an extensive set of experiments on both synthetic and
real-world data, performs very well in practice; unlike existing solutions, it
ably recovers ground truth patterns, even on highly imbalanced data over many
unique items, or where patterns are only weakly associated with labels. Through
two real-world case studies we confirm that Premise gives clear and actionable
insight into the systematic errors made by modern NLP classifiers.
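To make the idea of describing data "partitioned according to correctness of prediction" in MDL terms more concrete, here is a deliberately simplified sketch. It is not the Premise algorithm: Premise searches for a small set of patterns, whereas this only scores individual tokens one at a time. The scoring scheme is an assumption chosen for illustration: the correctness labels are encoded with a plug-in (empirical entropy) code, a token's score is the number of bits saved by encoding the labels separately for inputs with and without that token, and a crude model cost of log2(vocabulary size) bits is charged for naming the token. The function names `entropy_bits` and `score_tokens` are hypothetical.

```python
import math


def entropy_bits(pos: int, neg: int) -> float:
    """Total bits to encode pos + neg binary labels with a plug-in entropy code."""
    n = pos + neg
    bits = 0.0
    for k in (pos, neg):
        if k > 0:
            bits -= k * math.log2(k / n)
    return bits


def score_tokens(docs: list[set[str]], errors: list[bool]) -> list[tuple[str, float]]:
    """Rank single-token patterns by bits saved when encoding the error labels.

    Assumed, simplified scheme (not Premise): gain = L(labels)
    - L(labels | token present/absent) - log2(|vocabulary|).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    model_cost = math.log2(len(vocab)) if len(vocab) > 1 else 0.0
    n_err = sum(errors)
    n_ok = len(errors) - n_err
    baseline = entropy_bits(n_err, n_ok)  # encode labels with no pattern

    scores = {}
    for tok in vocab:
        # Split the data into inputs containing the token and the rest.
        with_err = sum(1 for d, e in zip(docs, errors) if tok in d and e)
        with_ok = sum(1 for d, e in zip(docs, errors) if tok in d and not e)
        rest_err, rest_ok = n_err - with_err, n_ok - with_ok
        conditional = entropy_bits(with_err, with_ok) + entropy_bits(rest_err, rest_ok)
        scores[tok] = baseline - conditional - model_cost

    # Highest compression gain first; positive gain means the token "pays for itself".
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: every misclassified input contains "not".
docs = [{"not", "good"}, {"not", "bad"}, {"very", "good"},
        {"great"}, {"not", "great"}, {"fine"}]
errors = [True, True, False, False, True, False]
ranking = score_tokens(docs, errors)
```

On this toy data, "not" perfectly separates erroneous from correct predictions, so conditioning on it drives the label cost to zero and it ranks first with a positive gain, while tokens spread evenly across both groups get negative scores because they do not recover the cost of naming them. Real label description additionally has to handle combinations of items, overlapping patterns, and a principled model encoding, which this sketch omits.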
Related papers
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns (2023-11-18)
  Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
  We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
  We show that our method, Premise, performs well in practice.
- Understanding prompt engineering may not require rethinking generalization (2023-10-06)
  We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
  This work provides a possible justification for the widespread practice of prompt engineering.
- Finding Dataset Shortcuts with Grammar Induction (2022-10-20)
  We propose to use probabilistic grammars to characterize and discover shortcuts in NLP datasets.
  Specifically, we use a context-free grammar to model patterns in sentence classification datasets and use a synchronous context-free grammar to model datasets involving sentence pairs.
  The resulting grammars reveal interesting shortcut features in a number of datasets, including both simple and high-level features.
- Prototype-Anchored Learning for Learning with Imperfect Annotations (2022-06-23)
  It is challenging to learn unbiased classification models from imperfectly annotated datasets.
  We propose a prototype-anchored learning (PAL) method, which can be easily incorporated into various learning-based classification schemes.
  We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors (2022-05-25)
  In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
  We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
- Annotating and Modeling Fine-grained Factuality in Summarization (2021-04-09)
  A major barrier to the practical use of summarization models is their propensity to output summaries that are not faithful to the input and that contain factual errors.
  We explore both synthetic and human-labeled data sources for training models to identify factual errors in summarization.
  We show that our best factuality detection model enables training of more factual XSum summarization models by allowing us to identify non-factual tokens in the training data.
- Identifying Wrongly Predicted Samples: A Method for Active Learning (2020-10-14)
  We propose a simple sample selection criterion that moves beyond uncertainty.
  We show state-of-the-art results and better rates at identifying wrongly predicted samples.
- Good Classifiers are Abundant in the Interpolating Regime (2020-06-22)
  We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
  We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
  Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.