Defuse: Harnessing Unrestricted Adversarial Examples for Debugging
Models Beyond Test Accuracy
- URL: http://arxiv.org/abs/2102.06162v1
- Date: Thu, 11 Feb 2021 18:08:42 GMT
- Title: Defuse: Harnessing Unrestricted Adversarial Examples for Debugging
Models Beyond Test Accuracy
- Authors: Dylan Slack, Nathalie Rauschmayr, Krishnaram Kenthapadi
- Abstract summary: Defuse is a method to automatically discover and correct model errors beyond those available in test data.
We propose an algorithm inspired by adversarial machine learning techniques that uses a generative model to find naturally occurring instances misclassified by a model.
Defuse corrects the error after fine-tuning while maintaining generalization on the test set.
- Score: 11.265020351747916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We typically compute aggregate statistics on held-out test data to assess the
generalization of machine learning models. However, statistics on test data
often overstate model generalization, and thus, the performance of deployed
machine learning models can be variable and untrustworthy. Motivated by these
concerns, we develop methods to automatically discover and correct model errors
beyond those available in the data. We propose Defuse, a method that generates
novel model misclassifications, categorizes these errors into high-level model
bugs, and efficiently labels and fine-tunes on the errors to correct them. To
generate misclassified data, we propose an algorithm inspired by adversarial
machine learning techniques that uses a generative model to find naturally
occurring instances misclassified by a model. Further, we observe that the
generative models have regions in their latent space with higher concentrations
of misclassifications. We call these regions misclassification regions and find
they have several useful properties. Each region contains a specific type of
model bug; for instance, a misclassification region for an MNIST classifier
contains a style of skinny 6 that the model mistakes as a 1. We can also assign
a single label to each region, facilitating low-cost labeling. We propose a
method to learn the misclassification regions and use this insight to both
categorize errors and correct them. In practice, Defuse finds and corrects
novel errors in classifiers. For example, Defuse shows that a high-performance
traffic sign classifier mistakes certain 50km/h signs as 80km/h. Defuse
corrects the error after fine-tuning while maintaining generalization on the
test set.
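As a concrete illustration of the workflow described in the abstract, the sketch below walks through a Defuse-style pipeline in Python: sample the generative model's latent space, keep the decoded samples the classifier misclassifies, cluster the failing latent codes into misclassification regions, assign a single label to each region, and fine-tune the classifier on samples drawn from those regions. This is a minimal sketch, not the paper's implementation: `decoder`, `classifier`, and `annotate` are toy stand-ins, and the Gaussian-mixture clustering and majority-vote region labeling are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

LATENT_DIM, INPUT_DIM, N_CLASSES = 8, 784, 10
N_SAMPLES, N_REGIONS = 2000, 5

# Toy stand-ins for the trained generative model's decoder and the classifier under debugging.
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, INPUT_DIM))
classifier = nn.Sequential(nn.Linear(INPUT_DIM, 64), nn.ReLU(), nn.Linear(64, N_CLASSES))

def annotate(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for the human labeler the method assumes; returns random labels here."""
    return torch.randint(0, N_CLASSES, (x.shape[0],))

# Step 1: sample the latent space, decode, and keep codes the classifier gets wrong.
z = torch.randn(N_SAMPLES, LATENT_DIM)
with torch.no_grad():
    x = decoder(z)
    pred = classifier(x).argmax(dim=1)
truth = annotate(x)
fail_mask = pred != truth
failing_z = z[fail_mask].numpy()
failing_labels = truth[fail_mask].numpy()

# Step 2: model "misclassification regions" as clusters in the latent space.
gmm = GaussianMixture(n_components=N_REGIONS, random_state=0).fit(failing_z)
region = gmm.predict(failing_z)

# Step 3: one label per region keeps annotation cheap (majority vote over region members here).
region_label = {
    r: int(np.bincount(failing_labels[region == r]).argmax())
    for r in range(N_REGIONS) if np.any(region == r)
}

# Step 4: fine-tune the classifier on decoded samples drawn from each region.
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4)
for r, label in region_label.items():
    zr = np.random.multivariate_normal(gmm.means_[r], gmm.covariances_[r], size=64)
    xr = decoder(torch.as_tensor(zr, dtype=torch.float32))
    targets = torch.full((64,), label, dtype=torch.long)
    loss = nn.functional.cross_entropy(classifier(xr), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice, the latent codes would come from a generative model (e.g., a VAE) trained on the task's data, and the region labels from human annotators rather than the random stand-in used here.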
Related papers
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restricted to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- Probabilistic Safety Regions Via Finite Families of Scalable Classifiers [2.431537995108158]
Supervised classification recognizes patterns in the data to separate classes of behaviours.
Canonical solutions contain misclassification errors that are intrinsic to the approximate, numerical nature of machine learning.
We introduce the concept of probabilistic safety region to describe a subset of the input space in which the number of misclassified instances is probabilistically controlled.
arXiv Detail & Related papers (2023-09-08T22:40:19Z)
- Misclassification in Automated Content Analysis Causes Bias in Regression. Can We Fix It? Yes We Can! [0.30693357740321775]
We show in a systematic literature review that communication scholars largely ignore misclassification bias.
Existing statistical methods can use "gold standard" validation data, such as that created by human annotators, to correct misclassification bias.
We introduce and test such methods, including a new method we design and implement in the R package misclassificationmodels.
arXiv Detail & Related papers (2023-07-12T23:03:55Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot simultaneously consider error position and type.
We build an FG-TED model to predict both addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- Debugging Tests for Model Explanations [18.073554618753395]
The methods tested are able to diagnose a spurious background bug, but cannot conclusively identify mislabeled training examples.
We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, relying instead primarily on model predictions.
arXiv Detail & Related papers (2020-11-10T22:23:25Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)