Vision Checklist: Towards Testable Error Analysis of Image Models to
Help System Designers Interrogate Model Capabilities
- URL: http://arxiv.org/abs/2201.11674v3
- Date: Mon, 31 Jan 2022 11:09:19 GMT
- Title: Vision Checklist: Towards Testable Error Analysis of Image Models to
Help System Designers Interrogate Model Capabilities
- Authors: Xin Du, Benedicte Legastelois, Bhargavi Ganesh, Ajitha Rajan, Hana
Chockler, Vaishak Belle, Stuart Anderson, Subramanian Ramamoorthy
- Abstract summary: Vision Checklist is a framework aimed at interrogating the capabilities of a model in order to produce a report that can be used by a system designer for robustness evaluations.
Our framework is evaluated on multiple datasets such as TinyImageNet, CIFAR-10, CIFAR-100, and Camelyon17, and on models such as ViT and ResNet.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using large pre-trained models for image recognition tasks is becoming
increasingly common owing to the well-acknowledged success of recent models
such as vision transformers and CNN-based models like VGG and ResNet. The
high accuracy of these models on benchmark tasks has translated into their
practical use across many domains including safety-critical applications like
autonomous driving and medical diagnostics. Despite their widespread use, image
models have been shown to be fragile to changes in the operating environment,
bringing their robustness into question. There is an urgent need for methods
that systematically characterise and quantify the capabilities of these models
to help designers understand and provide guarantees about their safety and
robustness. In this paper, we propose Vision Checklist, a framework aimed at
interrogating the capabilities of a model in order to produce a report that can
be used by a system designer for robustness evaluations. This framework
proposes a set of perturbation operations that can be applied on the underlying
data to generate test samples of different types. The perturbations reflect
potential changes in operating environments, and interrogate various properties
ranging from the strictly quantitative to more qualitative. Our framework is
evaluated on multiple datasets, including TinyImageNet, CIFAR-10, CIFAR-100, and
Camelyon17, and on models such as ViT and ResNet. Our Vision Checklist proposes a
specific set of evaluations that can be integrated into the previously proposed
concept of a model card. Robustness evaluations like our checklist will be
crucial in future safety evaluations of visual perception modules, and be
useful for a wide range of stakeholders including designers, deployers, and
regulators involved in the certification of these systems. The source code of
Vision Checklist will be made available for public use.
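The abstract describes the framework only at a high level: a set of perturbation operations is applied to the underlying data to generate test samples, and the model's behaviour on those samples is summarised in a report. Since the authors' code is not yet released, the sketch below is a minimal illustration of that idea rather than the paper's implementation; the specific perturbations (rotate, blur, occlude), the PyTorch/torchvision API choice, and the prediction-flip metric are all assumptions made for illustration.

```python
# Minimal sketch (not the released Vision Checklist code): apply a few example
# perturbations to an image batch and report how often the model's prediction
# flips relative to the clean inputs. Operation choices are illustrative.
import torch
import torchvision.transforms.functional as TF


def rotate(batch):
    # Viewpoint change: rotate every image by a fixed angle (example value).
    return TF.rotate(batch, angle=15.0)


def blur(batch):
    # Sensor/focus degradation: Gaussian blur with an example kernel size.
    return TF.gaussian_blur(batch, kernel_size=5)


def occlude(batch):
    # Partial occlusion: zero out a small patch (location chosen arbitrarily).
    out = batch.clone()
    out[..., 8:16, 8:16] = 0.0
    return out


PERTURBATIONS = {"rotate": rotate, "blur": blur, "occlude": occlude}


@torch.no_grad()
def checklist_report(model, images):
    """Return, per perturbation, the fraction of predictions that flip."""
    model.eval()
    base = model(images).argmax(dim=1)          # predictions on clean inputs
    report = {}
    for name, op in PERTURBATIONS.items():
        perturbed = model(op(images)).argmax(dim=1)
        report[name] = (perturbed != base).float().mean().item()
    return report
```

Here the "report" is simply a dictionary mapping each perturbation to a prediction-flip rate; under the abstract's framing, such per-perturbation summaries are the kind of evaluation that could be folded into a model card.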
Related papers
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through their responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and for the retrieval of bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models [75.8161094916476]
We study how to develop a pretrained vision-language model (aka the CLIP model) for acquiring new capabilities or improving existing capabilities of image classification.
Our experiments on improving vision perception capabilities on autonomous driving and scene recognition datasets demonstrate the efficacy of the proposed approach.
arXiv Detail & Related papers (2024-10-04T22:34:58Z)
- Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems [16.49637074299509]
We have explored state-of-the-art vision language models (VLMs) for vision-based transportation engineering tasks.
The image classification task involves congestion detection and crack identification, whereas for object detection, helmet violations were identified.
We have applied open-source models such as CLIP, BLIP, OWL-ViT, Llava-Next, and the closed-source GPT-4o to evaluate the performance of these VLMs.
arXiv Detail & Related papers (2024-09-03T20:24:37Z)
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- Zero-shot Model Diagnosis [80.36063332820568]
A common approach to evaluating deep learning models is to build a labeled test set with attributes of interest and assess how well the model performs on it.
This paper argues the case that Zero-shot Model Diagnosis (ZOOM) is possible without the need for a test set or labeling.
arXiv Detail & Related papers (2023-03-27T17:59:33Z)
- ComplAI: Theory of A Unified Framework for Multi-factor Assessment of Black-Box Supervised Machine Learning Models [6.279863832853343]
ComplAI is a unique framework to enable, observe, analyze and quantify explainability, robustness, performance, fairness, and model behavior.
It evaluates different supervised Machine Learning models not just on their ability to make correct predictions but also from an overall responsibility perspective.
arXiv Detail & Related papers (2022-12-30T08:48:19Z)
- ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints [42.64942578228025]
We propose a novel method called ViewFool to find adversarial viewpoints that mislead visual recognition models.
By encoding real-world objects as neural radiance fields (NeRF), ViewFool characterizes a distribution of diverse adversarial viewpoints.
arXiv Detail & Related papers (2022-10-08T03:06:49Z)