Adaptive Testing of Computer Vision Models
- URL: http://arxiv.org/abs/2212.02774v2
- Date: Wed, 16 Aug 2023 18:25:28 GMT
- Title: Adaptive Testing of Computer Vision Models
- Authors: Irena Gao and Gabriel Ilharco and Scott Lundberg and Marco Tulio Ribeiro
- Abstract summary: We introduce AdaVision, an interactive process for testing vision models which helps users identify and fix coherent failure modes.
We demonstrate the usefulness and generality of AdaVision in user studies, where users find major bugs in state-of-the-art classification, object detection, and image captioning models.
- Score: 22.213542525825144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision models often fail systematically on groups of data that share common
semantic characteristics (e.g., rare objects or unusual scenes), but
identifying these failure modes is a challenge. We introduce AdaVision, an
interactive process for testing vision models which helps users identify and
fix coherent failure modes. Given a natural language description of a coherent
group, AdaVision retrieves relevant images from LAION-5B with CLIP. The user
then labels a small amount of data for model correctness, which is used in
successive retrieval rounds to hill-climb towards high-error regions, refining
the group definition. Once a group is saturated, AdaVision uses GPT-3 to
suggest new group descriptions for the user to explore. We demonstrate the
usefulness and generality of AdaVision in user studies, where users find major
bugs in state-of-the-art classification, object detection, and image captioning
models. These user-discovered groups have failure rates 2-3x higher than those
surfaced by automatic error clustering methods. Finally, finetuning on examples
found with AdaVision fixes the discovered bugs when evaluated on unseen
examples, without degrading in-distribution accuracy, and while also improving
performance on out-of-distribution datasets.
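The retrieve, label, and refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes image and text embeddings are already computed (random vectors stand in for CLIP embeddings here), the `is_failure` callback stands in for the user's correctness labels, and the query update (mixing the text embedding with the centroid of failed images via a hypothetical `mix` parameter) is only one plausible way to hill-climb towards high-error regions.

```python
import numpy as np

def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def adaptive_test(text_emb, image_embs, is_failure, rounds=3, k=5, mix=0.5):
    """Retrieve k unseen images per round and steer the query towards failures.

    text_emb   : embedding of the group description (assumed precomputed)
    image_embs : unit-norm embeddings of the image pool, shape (n, d)
    is_failure : callback returning True if the model is wrong on image i
    mix        : hypothetical weight given to the failure centroid
    """
    seen = np.zeros(len(image_embs), dtype=bool)
    failures = []
    text_emb = normalize(text_emb)
    query = text_emb
    for _ in range(rounds):
        # Cosine similarity to the current query; already-labeled images are masked out.
        scores = np.where(seen, -np.inf, image_embs @ query)
        batch = np.argsort(-scores)[:k]
        seen[batch] = True
        # "User labeling" step: record which retrieved images the model gets wrong.
        failures.extend(int(i) for i in batch if is_failure(int(i)))
        if failures:
            # Move the query towards the failures found so far.
            centroid = normalize(image_embs[failures].mean(axis=0))
            query = normalize((1 - mix) * text_emb + mix * centroid)
    return failures, seen

# Synthetic stand-ins for an image pool, a text query, and model correctness.
rng = np.random.default_rng(0)
image_embs = normalize(rng.normal(size=(200, 16)))
text_emb = rng.normal(size=16)
model_fails = rng.random(200) < 0.3
failures, seen = adaptive_test(text_emb, image_embs, lambda i: bool(model_fails[i]))
print(f"labeled {int(seen.sum())} images, found {len(failures)} failures")
```

In a real setting the pool would be a retrieval index over LAION-5B and the labels would come from a person inspecting model outputs; the sketch only shows how a small number of labels per round can redirect later retrieval.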
Related papers
- CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset [14.246172794156987]
CableInspect-AD is a high-quality dataset created and annotated by domain experts from Hydro-Québec, a Canadian public utility.
This dataset includes high-resolution images with challenging real-world anomalies, covering defects with varying severity levels.
We present a comprehensive evaluation protocol based on cross-validation to assess models' performances.
arXiv Detail & Related papers (2024-09-30T14:50:13Z)
- VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection [5.66050466694651]
We propose integrating Vision-Language (VL) encoders into existing anomaly detectors to leverage the semantically broad VL pre-training for improved outlier awareness.
We also propose a new scoring function that enables data- and training-free outlier supervision via textual prompts.
The resulting VL4AD model achieves competitive performance on widely used benchmark datasets.
arXiv Detail & Related papers (2024-09-25T20:12:10Z)
- Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our methods process pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context.
Our approach exploits the potential of joint vision-language anomaly detection and achieves performance comparable to current SOTA methods across various datasets.
arXiv Detail & Related papers (2024-05-08T03:13:20Z)
- Anomaly Detection by Adapting a pre-trained Vision Language Model [48.225404732089515]
We present a unified framework named CLIP-ADA for Anomaly Detection by Adapting a pre-trained CLIP model.
We introduce the learnable prompt and propose to associate it with abnormal patterns through self-supervised learning.
We achieve state-of-the-art results of 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization.
arXiv Detail & Related papers (2024-03-14T15:35:07Z)
- AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding [29.07617945233152]
Data slice finding is an emerging technique for validating machine learning (ML) models by identifying and analyzing subgroups in a dataset that exhibit poor performance.
This approach faces significant challenges, including the laborious and costly requirement for additional metadata.
We introduce AttributionScanner, an innovative human-in-the-loop Visual Analytics (VA) system, designed for metadata-free data slice finding.
Our system identifies interpretable data slices that involve common model behaviors and visualizes these patterns through an Attribution Mosaic design.
arXiv Detail & Related papers (2024-01-12T09:17:32Z)
- Open-Vocabulary Video Anomaly Detection [57.552523669351636]
Video anomaly detection (VAD) with weak supervision has achieved remarkable performance in utilizing video-level labels to discriminate whether a video frame is normal or abnormal.
Recent studies attempt to tackle a more realistic setting, open-set VAD, which aims to detect unseen anomalies given seen anomalies and normal videos.
This paper takes a step further and explores open-vocabulary video anomaly detection (OVVAD), in which we aim to leverage pre-trained large models to detect and categorize seen and unseen anomalies.
arXiv Detail & Related papers (2023-11-13T02:54:17Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning [25.88974494276895]
This work demonstrates how off-the-shelf, large-scale, image-to-text and text-to-image models can be leveraged to automatically find failures.
In essence, a conditional text-to-image generative model is used to generate large amounts of synthetic, yet realistic, inputs.
arXiv Detail & Related papers (2022-08-18T13:49:10Z)
- ACTIVE: Augmentation-Free Graph Contrastive Learning for Partial Multi-View Clustering [52.491074276133325]
We propose an augmentation-free graph contrastive learning framework to solve the problem of partial multi-view clustering.
The proposed approach elevates instance-level contrastive learning and missing data inference to the cluster-level, effectively mitigating the impact of individual missing data on clustering.
arXiv Detail & Related papers (2022-03-01T02:32:25Z)
- Causal Scene BERT: Improving object detection by searching for challenging groups of data [125.40669814080047]
Computer vision applications rely on learning-based perception modules parameterized with neural networks for tasks like object detection.
These modules frequently have low expected error overall but high error on atypical groups of data due to biases inherent in the training process.
Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.
arXiv Detail & Related papers (2022-02-08T05:14:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.