SEAL : Interactive Tool for Systematic Error Analysis and Labeling
- URL: http://arxiv.org/abs/2210.05839v1
- Date: Tue, 11 Oct 2022 23:51:44 GMT
- Title: SEAL : Interactive Tool for Systematic Error Analysis and Labeling
- Authors: Nazneen Rajani, Weixin Liang, Lingjiao Chen, Meg Mitchell, James Zou
- Abstract summary: This paper introduces an interactive Systematic Error Analysis and Labeling (SEAL) tool.
It uses a two-step approach to first identify high error slices of data and then, in the second step, introduce methods to give human-understandable semantics to those underperforming slices.
We explore a variety of methods for coming up with coherent semantics for the error groups using language models for semantic labeling and a text-to-image model for generating visual features.
- Score: 26.803598323167382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the advent of Transformers, large language models (LLMs) have saturated
well-known NLP benchmarks and leaderboards with high aggregate performance.
However, many times these models systematically fail on tail data or rare
groups not obvious in aggregate evaluation. Identifying such problematic data
groups is even more challenging when there are no explicit labels (e.g.,
ethnicity, gender, etc.) and further compounded for NLP datasets due to the
lack of visual features to characterize failure modes (e.g., Asian males,
animals indoors, waterbirds on land, etc.). This paper introduces an
interactive Systematic Error Analysis and Labeling (SEAL) tool that uses a
two-step approach to first identify high error slices of data and then, in the
second step, introduce methods to give human-understandable semantics to those
underperforming slices. We explore a variety of methods for coming up with
coherent semantics for the error groups using language models for semantic
labeling and a text-to-image model for generating visual features. The SEAL
toolkit and a demo screencast are available at https://huggingface.co/spaces/nazneen/seal.
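The two-step recipe in the abstract can be sketched in a few lines, assuming per-example embeddings and losses from the evaluated model are already available. This is a hedged illustration, not SEAL's implementation: the function names, the KMeans clusterer, the loss threshold, and the nearest-to-centroid "description" (a cheap stand-in for the paper's language-model-based semantic labeling) are all choices made for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def find_error_slices(embeddings, losses, loss_threshold, n_slices, seed=0):
    """Step 1: keep high-loss examples and cluster them into candidate slices."""
    high = np.flatnonzero(losses > loss_threshold)
    km = KMeans(n_clusters=n_slices, n_init=10, random_state=seed)
    km.fit(embeddings[high])
    # slice id -> original dataset indices of its members
    return {s: high[km.labels_ == s] for s in range(n_slices)}, km

def describe_slices(slices, km, embeddings, texts):
    """Step 2 (proxy): summarize each slice by its most central member.
    SEAL instead asks a language model for a human-readable slice label."""
    descriptions = {}
    for s, members in slices.items():
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[s], axis=1)
        descriptions[s] = texts[members[np.argmin(dists)]]
    return descriptions
```

Any clustering over any embedding space fits this skeleton; the interesting part of the tool is the interactive loop and the semantic labeling layered on top of step 2.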
Related papers
- AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding [29.07617945233152]
Data slice finding is an emerging technique for validating machine learning (ML) models by identifying and analyzing subgroups in a dataset that exhibit poor performance.
This approach faces significant challenges, including the laborious and costly requirement for additional metadata.
We introduce AttributionScanner, an innovative human-in-the-loop Visual Analytics (VA) system, designed for metadata-free data slice finding.
Our system identifies interpretable data slices that involve common model behaviors and visualizes these patterns through an Attribution Mosaic design.
arXiv Detail & Related papers (2024-01-12T09:17:32Z) - Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization [34.85353544844499]
We present the first dataset with fine-grained factual error annotations named DIASUMFACT.
We define fine-grained factual error detection as a sentence-level multi-label classification problem.
We propose an unsupervised model ENDERANKER via candidate ranking using pretrained encoder-decoder models.
arXiv Detail & Related papers (2023-05-26T00:18:33Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - Adaptive Testing of Computer Vision Models [22.213542525825144]
We introduce AdaVision, an interactive process for testing vision models which helps users identify and fix coherent failure modes.
We demonstrate the usefulness and generality of AdaVision in user studies, where users find major bugs in state-of-the-art classification, object detection, and image captioning models.
arXiv Detail & Related papers (2022-12-06T05:52:31Z) - Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named Discover, Explain, Improve (DEIM) for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z) - Finding Dataset Shortcuts with Grammar Induction [85.47127659108637]
We propose to use probabilistic grammars to characterize and discover shortcuts in NLP datasets.
Specifically, we use a context-free grammar to model patterns in sentence classification datasets and use a synchronous context-free grammar to model datasets involving sentence pairs.
The resulting grammars reveal interesting shortcut features in a number of datasets, including both simple and high-level features.
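The grammar-induction machinery is involved, but the underlying idea of surfacing label-predictive surface patterns can be shown with a much-simplified lexical proxy. In this hedged sketch, single tokens stand in for the paper's induced CFG productions, and tokens are ranked by (presence-only) mutual information with the label; the function name and scoring details are illustrative, not the paper's method.

```python
import math
from collections import Counter

def shortcut_tokens(examples, top_k=5):
    """Rank tokens by mutual information with the label.

    `examples` is a list of (text, label) pairs. Only token-present events
    are counted (token absence is ignored), which is enough to flag tokens
    that strongly co-occur with one label -- candidate dataset shortcuts.
    """
    n = len(examples)
    label_counts = Counter(lab for _, lab in examples)
    token_docs = Counter()   # documents containing each token
    joint = Counter()        # (token, label) document counts
    for text, lab in examples:
        for tok in set(text.lower().split()):
            token_docs[tok] += 1
            joint[(tok, lab)] += 1
    scores = {}
    for tok, df in token_docs.items():
        mi = 0.0
        for lab, lc in label_counts.items():
            p_tl = joint[(tok, lab)] / n
            if p_tl > 0:
                mi += p_tl * math.log(p_tl / ((df / n) * (lc / n)))
        scores[tok] = mi
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A token that appears only under one label gets a high score; the grammar-based approach generalizes this from tokens to structural patterns.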
arXiv Detail & Related papers (2022-10-20T19:54:11Z) - Active Learning by Feature Mixing [52.16150629234465]
We propose a novel method for batch active learning called ALFA-Mix.
We identify unlabelled instances with sufficiently-distinct features by seeking inconsistencies in predictions.
We show that inconsistencies in these predictions help discovering features that the model is unable to recognise in the unlabelled instances.
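The feature-mixing idea reads naturally as code. Below is a minimal, hedged sketch of the interpolation test: an unlabelled point is a labelling candidate if its prediction flips when its feature vector is mixed with an anchor feature vector. The function name, the fixed alpha grid, and the single-anchor setup are simplifications for illustration; ALFA-Mix's actual anchor construction and alpha selection differ.

```python
import numpy as np

def inconsistency_scores(predict, z_unlabeled, z_anchor, alphas=(0.2, 0.5, 0.8)):
    """Count prediction flips under convex feature mixing.

    `predict` maps an (n, d) feature array to class ids; `z_anchor` is a
    single (d,) anchor feature vector (e.g. a labelled-class prototype).
    """
    base = predict(z_unlabeled)
    flips = np.zeros(len(z_unlabeled), dtype=int)
    for a in alphas:
        mixed = (1.0 - a) * z_unlabeled + a * z_anchor  # convex combination
        flips += (predict(mixed) != base).astype(int)
    return flips  # higher score -> stronger candidate for labelling
```

Points far inside a decision region survive mixing unchanged, while points the model barely recognises flip quickly, which is the inconsistency signal the summary describes.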
arXiv Detail & Related papers (2022-03-14T12:20:54Z) - Causal Scene BERT: Improving object detection by searching for challenging groups of data [125.40669814080047]
Computer vision applications rely on learning-based perception modules parameterized with neural networks for tasks like object detection.
These modules frequently have low expected error overall but high error on atypical groups of data due to biases inherent in the training process.
Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.
arXiv Detail & Related papers (2022-02-08T05:14:16Z) - ZeroBERTo -- Leveraging Zero-Shot Text Classification by Topic Modeling [57.80052276304937]
This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task.
We show that ZeroBERTo has better performance for long inputs and shorter execution time, outperforming XLM-R by about 12% in the F1 score in the FolhaUOL dataset.
arXiv Detail & Related papers (2022-01-04T20:08:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences arising from its use.