Zero-shot Model Diagnosis
- URL: http://arxiv.org/abs/2303.15441v1
- Date: Mon, 27 Mar 2023 17:59:33 GMT
- Title: Zero-shot Model Diagnosis
- Authors: Jinqi Luo, Zhaoning Wang, Chen Henry Wu, Dong Huang, Fernando De la
Torre
- Abstract summary: A common approach to evaluate deep learning models is to build a labeled test set with attributes of interest and assess how well the model performs.
This paper argues that Zero-shot Model Diagnosis (ZOOM) is possible without the need for a test set or labeling.
- Score: 80.36063332820568
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When it comes to deploying deep vision models, the behavior of these systems
must be explicable to ensure confidence in their reliability and fairness. A
common approach to evaluate deep learning models is to build a labeled test set
with attributes of interest and assess how well the model performs. However, creating
a balanced test set (i.e., one that is uniformly sampled over all the important
traits) is often time-consuming, expensive, and prone to mistakes. The question
we try to address is: can we evaluate the sensitivity of deep learning models
to arbitrary visual attributes without an annotated test set? This paper argues
that Zero-shot Model Diagnosis (ZOOM) is possible without the need for a test
set or labeling. To avoid the need for test sets, our system relies on
a generative model and CLIP. The key idea is enabling the user to select a set
of prompts (relevant to the problem) and our system will automatically search
for semantic counterfactual images (i.e., synthesized images that flip the
prediction in the case of a binary classifier) using the generative model. We
evaluate several visual tasks (classification, key-point detection, and
segmentation) in multiple visual domains to demonstrate the viability of our
methodology. Extensive experiments demonstrate that our method is capable of
producing counterfactual images and offering sensitivity analysis for model
diagnosis without the need for a test set.
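As a rough illustration of the search described above, the sketch below optimizes a latent edit so that a generated image moves toward a user prompt in CLIP space while flipping a binary classifier's prediction. The generator, CLIP encoders, and classifier are random stand-ins so the code runs end to end; the paper's actual system uses pretrained generative and CLIP models, and the loss weights here are illustrative assumptions.

```python
# Minimal sketch of CLIP-guided counterfactual search, assuming a StyleGAN-like
# generator `G`, a CLIP-like image/text encoder, and a binary classifier `f`.
# All three modules are random stand-ins so the sketch runs end to end.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
latent_dim, img_px, clip_dim = 64, 32, 128

G = torch.nn.Sequential(  # stand-in generator: latent -> flattened RGB image
    torch.nn.Linear(latent_dim, 3 * img_px * img_px), torch.nn.Tanh())
clip_image = torch.nn.Linear(3 * img_px * img_px, clip_dim)  # stand-in CLIP image encoder
f = torch.nn.Linear(3 * img_px * img_px, 1)                  # stand-in binary classifier

# Stand-in CLIP text embedding for a user prompt such as "a face with a beard".
text_emb = F.normalize(torch.randn(clip_dim), dim=0)

z = torch.randn(1, latent_dim)
orig_pred = (f(G(z)) > 0).float()          # prediction we want to flip
delta = torch.zeros_like(z, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)

for step in range(200):
    img = G(z + delta)
    # Move the edited image toward the attribute prompt in CLIP space.
    clip_loss = 1 - F.cosine_similarity(F.normalize(clip_image(img), dim=-1),
                                        text_emb.unsqueeze(0)).mean()
    # Push the classifier logit toward the opposite label (counterfactual term).
    flip_loss = F.binary_cross_entropy_with_logits(f(img), 1 - orig_pred)
    reg = delta.norm()                      # keep the edit small and semantic
    opt.zero_grad()
    (clip_loss + flip_loss + 0.01 * reg).backward()
    opt.step()

print("flipped:", bool((f(G(z + delta)) > 0).float() != orig_pred))
```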
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification [2.1223532600703385]
This paper presents an innovative disjoint sampling approach for training SOTA models on Hyperspectral image classification (HSIC) tasks.
By separating training, validation, and test data without overlap, the proposed method facilitates a fairer evaluation of how well a model can classify pixels it was not exposed to during training or validation.
This rigorous methodology is critical for advancing SOTA models and their real-world application to large-scale land mapping with Hyperspectral sensors.
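A minimal sketch of disjoint sampling, assuming pixels are the unit of splitting; the paper's exact protocol may differ. The point is that the three index sets share no pixels, so test accuracy reflects pixels the model never saw.

```python
# Disjoint train/val/test split over the pixels of a labeled hyperspectral
# cube of shape (H, W, bands). Cube and labels here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
H, W, bands = 50, 50, 30
cube = rng.normal(size=(H, W, bands))
labels = rng.integers(0, 5, size=(H, W))

idx = rng.permutation(H * W)               # shuffle all pixel indices once
n_train, n_val = int(0.6 * idx.size), int(0.2 * idx.size)
train_idx = idx[:n_train]                  # the three index sets are disjoint
val_idx = idx[n_train:n_train + n_val]     # by construction: no pixel appears
test_idx = idx[n_train + n_val:]           # in more than one split

X = cube.reshape(-1, bands)
y = labels.reshape(-1)
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Sanity check: no leakage between splits.
assert not (set(train_idx) & set(val_idx) | set(train_idx) & set(test_idx))
```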
arXiv Detail & Related papers (2024-04-23T11:40:52Z)
- Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method utilizes a mask to identify the memorized atypical samples, and then finetunes the model or prunes it with the introduced mask to forget them.
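One plausible reading of this abstract, sketched below under loud assumptions: flag high-loss training samples as the "memorized atypical" ones, then finetune so their loss is pushed back up while the ordinary ID loss stays low. This is an illustration of masking-then-forgetting, not the paper's exact procedure.

```python
# Hedged sketch: mask atypical samples by training loss, then "forget" them.
# Model and data are random stand-ins; threshold and weights are assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(16, 2)                     # stand-in well-trained ID model
X, y = torch.randn(256, 16), torch.randint(0, 2, (256,))

with torch.no_grad():
    loss_per_sample = F.cross_entropy(model(X), y, reduction="none")
mask = loss_per_sample > loss_per_sample.quantile(0.9)   # flag atypical samples

# Finetune: keep fitting the unmasked ID data, ascend on the masked samples.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(10):
    logits = model(X)
    keep = F.cross_entropy(logits[~mask], y[~mask])
    forget = F.cross_entropy(logits[mask], y[mask])
    opt.zero_grad()
    (keep - 0.1 * forget).backward()               # ascend on masked samples
    opt.step()
```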
arXiv Detail & Related papers (2023-06-06T14:23:34Z)
- Detection and Captioning with Unseen Object Classes [12.894104422808242]
Test images may contain visual objects with no corresponding visual or textual training examples.
We propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model.
Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset.
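A toy sketch of template-based sentence generation from detected class names; the template and the detector's output format are assumptions for illustration, not the paper's grammar.

```python
# Build a caption from the class names returned by a (zero-shot) detector.
def caption(detections: list[str]) -> str:
    """Fill a fixed template with detected object class names."""
    if not detections:
        return "A photo."
    if len(detections) == 1:
        return f"A photo of a {detections[0]}."
    return f"A photo of a {', a '.join(detections[:-1])} and a {detections[-1]}."

# A zero-shot detector may return classes never seen with captions in training.
print(caption(["zebra", "frisbee"]))   # -> "A photo of a zebra and a frisbee."
```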
arXiv Detail & Related papers (2021-08-13T10:43:20Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances [77.28192419848901]
We propose a simple, yet effective method named contrasting shifted instances (CSI).
In addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself.
Our experiments demonstrate the superiority of our method under various novelty detection scenarios.
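A minimal sketch of the CSI idea with a stand-in encoder: a mild augmentation serves as the positive, while a distributionally-shifted view of the same sample (a 90-degree rotation here) joins the negatives in an InfoNCE-style loss. The shift choice and temperature are illustrative assumptions.

```python
# Contrast a sample against shifted augmentations of itself as negatives.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
enc = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 32))

x = torch.randn(16, 3, 8, 8)                         # a batch of images
pos = x + 0.05 * torch.randn_like(x)                 # mild augmentation: positive
shift = torch.rot90(x, k=1, dims=(2, 3))             # hard shift: rotation -> negative

z = F.normalize(enc(x), dim=1)
z_pos = F.normalize(enc(pos), dim=1)
z_neg = F.normalize(enc(shift), dim=1)

tau = 0.5
pos_sim = (z * z_pos).sum(1) / tau                   # one positive per anchor
neg_sim = torch.cat([z @ z_pos.T / tau, z @ z_neg.T / tau], dim=1)
# InfoNCE: the positive competes against other samples AND shifted views.
loss = (-pos_sim + torch.logsumexp(neg_sim, dim=1)).mean()
print(float(loss))
```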
arXiv Detail & Related papers (2020-07-16T08:32:56Z)
- Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
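A toy illustration of the contrast-set idea, with invented sentiment examples: a model can score perfectly on the original test items yet fail on minimally edited variants whose gold label crosses its decision boundary.

```python
# Each original example is paired with a minimal edit that flips the gold label.
contrast_set = [
    {"orig": ("a gripping, well-acted thriller", "pos"),
     "edit": ("not a gripping, well-acted thriller", "neg")},
    {"orig": ("the plot never comes together", "neg"),
     "edit": ("the plot finally comes together", "pos")},
]

def keyword_model(text: str) -> str:
    """Stand-in sentiment classifier that misses negation."""
    return "neg" if any(w in text for w in ("never", "badly")) else "pos"

# Accuracy on the originals vs. the minimally edited contrast examples.
for key in ("orig", "edit"):
    acc = sum(keyword_model(t) == g
              for t, g in (p[key] for p in contrast_set)) / len(contrast_set)
    print(key, acc)   # originals: 1.0, contrast edits: 0.5
```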
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
- Overinterpretation reveals image classification model pathologies [15.950659318117694]
Convolutional neural networks (CNNs) on popular benchmarks exhibit troubling pathologies that allow them to attain high accuracy even in the absence of semantically salient features.
We demonstrate that neural networks trained on CIFAR-10 and ImageNet suffer from overinterpretation.
Although these patterns portend potential model fragility in real-world deployment, they are in fact valid statistical patterns of the benchmark that alone suffice to attain high test accuracy.
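A hedged sketch of an overinterpretation-style probe: zero out most of the input pixels and measure how often the model's prediction survives. The CNN here is an untrained stand-in; the paper runs this kind of analysis on models trained on CIFAR-10 and ImageNet.

```python
# Check prediction agreement when only a sparse pixel subset is kept.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
                            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
                            torch.nn.Linear(8, 10))
x = torch.randn(64, 3, 32, 32)

keep = (torch.rand(64, 1, 32, 32) < 0.05).float()    # retain ~5% of pixels
full_pred = model(x).argmax(1)
masked_pred = model(x * keep).argmax(1)
# High agreement would mean the model relies on sparse, non-salient pixel subsets.
print("agreement:", (full_pred == masked_pred).float().mean().item())
```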
arXiv Detail & Related papers (2020-03-19T17:12:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.