Identifying and Mitigating Model Failures through Few-shot CLIP-aided
Diffusion Generation
- URL: http://arxiv.org/abs/2312.05464v1
- Date: Sat, 9 Dec 2023 04:43:49 GMT
- Title: Identifying and Mitigating Model Failures through Few-shot CLIP-aided
Diffusion Generation
- Authors: Atoosa Chegini, Soheil Feizi
- Abstract summary: We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations.
These descriptions can be used to generate synthetic data using generative models, such as diffusion models.
Our experiments have shown remarkable improvements in accuracy (~21%) on hard sub-populations.
- Score: 65.268245109828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models can encounter unexpected failures, especially when
dealing with challenging sub-populations. One common reason for these failures
is the occurrence of objects in backgrounds that are rarely seen during
training. To gain a better understanding of these failure modes,
human-interpretable descriptions are crucial for further analysis and
improvement, but obtaining them manually is expensive. In this study, we propose an end-to-end
framework that utilizes the capabilities of large language models (ChatGPT) and
vision-language deep models (CLIP) to generate text descriptions of failure
modes associated with spurious correlations (e.g. rarely seen backgrounds)
without human-in-the-loop intervention. These descriptions can be used to
generate synthetic data using generative models, such as diffusion models. The
model can now use this generated data to learn from its weaknesses and enhance
its performance on backgrounds that are uncommon for each class of data. Our
approach serves as a broad solution, promising progress in comprehending model
failure modes and strengthening deep learning models across a wide range of
failure scenarios (e.g. backgrounds, colors) automatically in a few-shot
manner. Our experiments have shown remarkable improvements in accuracy (~21%)
on hard sub-populations (particularly for wrong background association) across
40 different models, such as ResNets,
EfficientNets, DenseNets, Vision Transformer (ViT), SwAVs, MoCos, DINOs, and
CLIPs on various datasets such as ImageNet-1000, CIFAR-10, and CIFAR-100.
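The abstract describes a three-stage pipeline: obtain text descriptions of failure modes (via ChatGPT and CLIP), synthesize images matching those descriptions with a diffusion model, and fine-tune the failing classifier on the synthetic data. Below is a minimal sketch of the last two stages, assuming the Hugging Face diffusers and torchvision APIs; the model names, prompts, and class index are illustrative placeholders, not the authors' actual configuration.

```python
# Hypothetical sketch: synthesize images for a rare-background failure mode
# with a diffusion model, then fine-tune the failing classifier on them.
import torch
from diffusers import StableDiffusionPipeline
from torchvision import models, transforms

# Failure-mode descriptions (derived with ChatGPT + CLIP in the paper;
# hard-coded here for illustration).
failure_prompts = [
    "a photo of a cow on a sandy beach",
    "a photo of a cow on a busy city street",
]

# Generate synthetic images for the under-represented backgrounds.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
synthetic_images = [pipe(p).images[0] for p in failure_prompts]

# Fine-tune the failing classifier on the synthetic data (one step shown).
model = models.resnet50(weights="IMAGENET1K_V2").cuda().train()
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

target_class = 345  # illustrative ImageNet class index for the prompted object
batch = torch.stack([preprocess(img) for img in synthetic_images]).cuda()
labels = torch.full((len(batch),), target_class).cuda()
loss = torch.nn.functional.cross_entropy(model(batch), labels)
loss.backward()
optimizer.step()
```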
Related papers
- Stealing the Invisible: Unveiling Pre-Trained CNN Models through
Adversarial Examples and Timing Side-Channels [14.222432788661914]
We present an approach based on the observation that the classification patterns of adversarial images can be used as a means to steal the models.
Our approach exploits varying misclassifications of adversarial images across different models to fingerprint several renowned Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures.
arXiv Detail & Related papers (2024-02-19T08:47:20Z)
- Steganographic Capacity of Deep Learning Models [12.974139332068491]
We consider the steganographic capacity of several learning models.
We train a Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Transformer model on a challenging malware classification problem.
We find that the steganographic capacity of the learning models tested is surprisingly high, and that in each case, there is a clear threshold after which model performance rapidly degrades.
arXiv Detail & Related papers (2023-06-25T13:43:35Z)
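As a toy illustration of the weight-embedding idea summarized above, the sketch below hides a bit string in the least significant mantissa bit of each float32 weight of a small layer; this simple LSB scheme and the stand-in model are assumptions for illustration, not the paper's actual protocol.

```python
import numpy as np
import torch
import torch.nn as nn

def embed_bits_lsb(weights: torch.Tensor, bits: np.ndarray) -> torch.Tensor:
    """Overwrite the least significant mantissa bit of each float32 weight with one payload bit."""
    raw = weights.detach().cpu().numpy().astype(np.float32).ravel().view(np.uint32).copy()
    raw = (raw & np.uint32(0xFFFFFFFE)) | bits[: raw.size].astype(np.uint32)
    return torch.from_numpy(raw.view(np.float32)).reshape(weights.shape)

model = nn.Linear(256, 10)  # stand-in for a trained model
secret = np.random.randint(0, 2, size=model.weight.numel())
with torch.no_grad():
    model.weight.copy_(embed_bits_lsb(model.weight, secret))
# Overwriting more low-order mantissa bits per weight raises capacity until,
# past a threshold, accuracy degrades rapidly (the effect the summary reports).
```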
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
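A hypothetical sketch of what a light-weight, single-layer fusion module on top of frozen CLIP embeddings could look like; the cross-attention design, dimensions, and random placeholder tensors are assumptions for illustration, not the authors' exact RECO architecture.

```python
import torch
import torch.nn as nn

class RetrievalFusion(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        # Single cross-attention layer: the query embedding attends to
        # embeddings retrieved from an external memory.
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_emb: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        # query_emb: (B, dim) frozen CLIP embedding of the input image or text
        # retrieved: (B, K, dim) embeddings of the K nearest items from memory
        q = query_emb.unsqueeze(1)                      # (B, 1, dim)
        fused, _ = self.attn(q, retrieved, retrieved)   # cross-modal attention
        return self.norm(query_emb + fused.squeeze(1))  # residual refinement

# Usage: refine frozen CLIP image embeddings with 16 retrieved text embeddings.
fusion = RetrievalFusion()
img_emb = torch.randn(4, 512)           # would come from a frozen CLIP encoder
memory_hits = torch.randn(4, 16, 512)   # nearest neighbours from the memory
refined = fusion(img_emb, memory_hits)  # (4, 512)
```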
- The Curse of Recursion: Training on Generated Data Makes Models Forget [70.02793975243212]
Large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images.
We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear.
arXiv Detail & Related papers (2023-05-27T15:10:41Z)
- LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics [5.33024001730262]
We propose an approach that can provide semantic insights into a model's patterns of failures and biases.
We show that an ensemble of such lightweight models can be used to generate insights on the performance of the black-box model.
arXiv Detail & Related papers (2023-05-04T23:54:37Z)
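A sketch of the general idea of a lightweight proxy that predicts a black-box model's per-sample loss from CLIP embeddings; the MLP architecture and the random placeholder data are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class LossProxy(nn.Module):
    """Predict the black-box model's per-sample loss from a CLIP embedding."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, clip_emb: torch.Tensor) -> torch.Tensor:
        return self.net(clip_emb).squeeze(-1)

# clip_embs and blackbox_losses would be precomputed on a labelled validation
# set; random placeholders are used here.
clip_embs = torch.randn(1024, 512)
blackbox_losses = torch.rand(1024)

proxy = LossProxy()
opt = torch.optim.Adam(proxy.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(proxy(clip_embs), blackbox_losses)
    loss.backward()
    opt.step()
# Samples with high predicted loss flag likely failure modes for inspection.
```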
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
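A small sketch of projecting a biased direction out of CLIP text embeddings; the prompt pair and the plain (uncalibrated) projection matrix are illustrative assumptions rather than the paper's calibrated construction.

```python
import torch

def projection_remover(bias_dirs: torch.Tensor) -> torch.Tensor:
    """Return P = I - V (V^T V)^{-1} V^T, which removes span(bias_dirs)."""
    V = bias_dirs.T                                     # (dim, k)
    return torch.eye(V.shape[0]) - V @ torch.linalg.pinv(V)

# Suppose these come from encoding a biased prompt pair such as
# "a photo of a man" vs. "a photo of a woman" with a CLIP text encoder.
emb_a = torch.randn(512)
emb_b = torch.randn(512)
bias_direction = (emb_a - emb_b).unsqueeze(0)           # (1, 512)

P = projection_remover(bias_direction)                  # (512, 512)

# Debias a class-prompt embedding, e.g. for "a photo of a doctor".
text_emb = torch.randn(512)
debiased = P @ text_emb
debiased = debiased / debiased.norm()  # re-normalize for cosine-similarity use
```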
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning [25.88974494276895]
This work demonstrates how off-the-shelf, large-scale, image-to-text and text-to-image models can be leveraged to automatically find failures.
In essence, a conditional text-to-image generative model is used to generate large amounts of synthetic, yet realistic, inputs.
arXiv Detail & Related papers (2022-08-18T13:49:10Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Sufficiently Accurate Model Learning for Planning [119.80502738709937]
This paper introduces the constrained Sufficiently Accurate model learning approach.
It provides examples of such problems, and presents a theorem on how close some approximate solutions can be.
The approximate solution quality will depend on the function parameterization, loss and constraint function smoothness, and the number of samples in model learning.
arXiv Detail & Related papers (2021-02-11T16:27:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.