Identifying and Mitigating Model Failures through Few-shot CLIP-aided
Diffusion Generation
- URL: http://arxiv.org/abs/2312.05464v1
- Date: Sat, 9 Dec 2023 04:43:49 GMT
- Title: Identifying and Mitigating Model Failures through Few-shot CLIP-aided
Diffusion Generation
- Authors: Atoosa Chegini, Soheil Feizi
- Abstract summary: We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations.
These descriptions can be used to generate synthetic data using generative models, such as diffusion models.
Our experiments have shown remarkable improvements in accuracy (~21%) on hard sub-populations.
- Score: 65.268245109828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models can encounter unexpected failures, especially when
dealing with challenging sub-populations. One common reason for these failures
is the occurrence of objects in backgrounds that are rarely seen during
training. To gain a better understanding of these failure modes,
human-interpretable descriptions are crucial for further analysis and
improvement, but obtaining them manually is expensive. In this study, we propose an end-to-end
framework that utilizes the capabilities of large language models (ChatGPT) and
vision-language deep models (CLIP) to generate text descriptions of failure
modes associated with spurious correlations (e.g. rarely seen backgrounds)
without human-in-the-loop intervention. These descriptions can be used to
generate synthetic data using generative models, such as diffusion models. The
model can now use this generated data to learn from its weaknesses and enhance
its performance on backgrounds that are uncommon for each class of data. Our
approach serves as a broad solution, promising progress in comprehending model
failure modes and strengthening deep learning models across a wide range of
failure scenarios (e.g. backgrounds, colors) automatically in a few-shot
manner. Our experiments have shown remarkable improvements in accuracy (~21%)
on hard sub-populations (particularly for wrong background association) across
40 different models, such as ResNets,
EfficientNets, DenseNets, Vision Transformer (ViT), SwAVs, MoCos, DINOs, and
CLIPs on various datasets such as ImageNet-1000, CIFAR-10, and CIFAR-100.
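The abstract describes a three-stage pipeline: obtain text descriptions of failure modes (via ChatGPT and CLIP), synthesize images matching those descriptions with a diffusion model, and fine-tune the failing classifier on the synthetic data. Below is a minimal sketch of the last two stages, assuming the Hugging Face diffusers and torchvision APIs; the model names, prompts, and class index are illustrative placeholders, not the authors' actual configuration.

```python
# Hypothetical sketch: synthesize images for a rare-background failure mode
# with a diffusion model, then fine-tune the failing classifier on them.
import torch
from diffusers import StableDiffusionPipeline
from torchvision import models, transforms

# Failure-mode descriptions (derived with ChatGPT + CLIP in the paper;
# hard-coded here for illustration).
failure_prompts = [
    "a photo of a cow on a sandy beach",
    "a photo of a cow on a busy city street",
]

# Generate synthetic images for the under-represented backgrounds.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
synthetic_images = [pipe(p).images[0] for p in failure_prompts]

# Fine-tune the failing classifier on the synthetic data (one step shown).
model = models.resnet50(weights="IMAGENET1K_V2").cuda().train()
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

target_class = 345  # illustrative ImageNet class index for the prompted object
batch = torch.stack([preprocess(img) for img in synthetic_images]).cuda()
labels = torch.full((len(batch),), target_class).cuda()
loss = torch.nn.functional.cross_entropy(model(batch), labels)
loss.backward()
optimizer.step()
```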
Related papers
- Stealing the Invisible: Unveiling Pre-Trained CNN Models through
Adversarial Examples and Timing Side-Channels [14.222432788661914]
We present an approach based on the observation that the classification patterns of adversarial images can be used as a means to steal the models.
Our approach exploits varying misclassifications of adversarial images across different models to fingerprint several renowned Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures.
arXiv Detail & Related papers (2024-02-19T08:47:20Z)
- Steganographic Capacity of Deep Learning Models [12.974139332068491]
We consider the steganographic capacity of several learning models.
We train a Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Transformer model on a challenging malware classification problem.
We find that the steganographic capacity of the learning models tested is surprisingly high, and that in each case, there is a clear threshold after which model performance rapidly degrades.
arXiv Detail & Related papers (2023-06-25T13:43:35Z)
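As a toy illustration of the weight-embedding idea summarized above, the sketch below hides a bit string in the least significant mantissa bit of each float32 weight of a small layer; this simple LSB scheme and the stand-in model are assumptions for illustration, not the paper's actual protocol.

```python
import numpy as np
import torch
import torch.nn as nn

def embed_bits_lsb(weights: torch.Tensor, bits: np.ndarray) -> torch.Tensor:
    """Overwrite the least significant mantissa bit of each float32 weight with one payload bit."""
    raw = weights.detach().cpu().numpy().astype(np.float32).ravel().view(np.uint32).copy()
    raw = (raw & np.uint32(0xFFFFFFFE)) | bits[: raw.size].astype(np.uint32)
    return torch.from_numpy(raw.view(np.float32)).reshape(weights.shape)

model = nn.Linear(256, 10)  # stand-in for a trained model
secret = np.random.randint(0, 2, size=model.weight.numel())
with torch.no_grad():
    model.weight.copy_(embed_bits_lsb(model.weight, secret))
# Overwriting more low-order mantissa bits per weight raises capacity until,
# past a threshold, accuracy degrades rapidly (the effect the summary reports).
```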
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
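A hypothetical sketch of what a light-weight, single-layer fusion module on top of frozen CLIP embeddings could look like; the cross-attention design, dimensions, and random placeholder tensors are assumptions for illustration, not the authors' exact RECO architecture.

```python
import torch
import torch.nn as nn

class RetrievalFusion(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        # Single cross-attention layer: the query embedding attends to
        # embeddings retrieved from an external memory.
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_emb: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        # query_emb: (B, dim) frozen CLIP embedding of the input image or text
        # retrieved: (B, K, dim) embeddings of the K nearest items from memory
        q = query_emb.unsqueeze(1)                      # (B, 1, dim)
        fused, _ = self.attn(q, retrieved, retrieved)   # cross-modal attention
        return self.norm(query_emb + fused.squeeze(1))  # residual refinement

# Usage: refine frozen CLIP image embeddings with 16 retrieved text embeddings.
fusion = RetrievalFusion()
img_emb = torch.randn(4, 512)           # would come from a frozen CLIP encoder
memory_hits = torch.randn(4, 16, 512)   # nearest neighbours from the memory
refined = fusion(img_emb, memory_hits)  # (4, 512)
```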
- The Curse of Recursion: Training on Generated Data Makes Models Forget [70.02793975243212]
Large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images.
We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear.
arXiv Detail & Related papers (2023-05-27T15:10:41Z)
- LLM2Loss: Leveraging Language Models for Explainable Model Diagnostics [5.33024001730262]
We propose an approach that can provide semantic insights into a model's patterns of failures and biases.
We show that an ensemble of such lightweight models can be used to generate insights on the performance of the black-box model.
arXiv Detail & Related papers (2023-05-04T23:54:37Z)
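A sketch of the general idea of a lightweight proxy that predicts a black-box model's per-sample loss from CLIP embeddings; the MLP architecture and the random placeholder data are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class LossProxy(nn.Module):
    """Predict the black-box model's per-sample loss from a CLIP embedding."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, clip_emb: torch.Tensor) -> torch.Tensor:
        return self.net(clip_emb).squeeze(-1)

# clip_embs and blackbox_losses would be precomputed on a labelled validation
# set; random placeholders are used here.
clip_embs = torch.randn(1024, 512)
blackbox_losses = torch.rand(1024)

proxy = LossProxy()
opt = torch.optim.Adam(proxy.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(proxy(clip_embs), blackbox_losses)
    loss.backward()
    opt.step()
# Samples with high predicted loss flag likely failure modes for inspection.
```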
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
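A small sketch of projecting a biased direction out of CLIP text embeddings; the prompt pair and the plain (uncalibrated) projection matrix are illustrative assumptions rather than the paper's calibrated construction.

```python
import torch

def projection_remover(bias_dirs: torch.Tensor) -> torch.Tensor:
    """Return P = I - V (V^T V)^{-1} V^T, which removes span(bias_dirs)."""
    V = bias_dirs.T                                     # (dim, k)
    return torch.eye(V.shape[0]) - V @ torch.linalg.pinv(V)

# Suppose these come from encoding a biased prompt pair such as
# "a photo of a man" vs. "a photo of a woman" with a CLIP text encoder.
emb_a = torch.randn(512)
emb_b = torch.randn(512)
bias_direction = (emb_a - emb_b).unsqueeze(0)           # (1, 512)

P = projection_remover(bias_direction)                  # (512, 512)

# Debias a class-prompt embedding, e.g. for "a photo of a doctor".
text_emb = torch.randn(512)
debiased = P @ text_emb
debiased = debiased / debiased.norm()  # re-normalize for cosine-similarity use
```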
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning [25.88974494276895]
This work demonstrates how off-the-shelf, large-scale, image-to-text and text-to-image models can be leveraged to automatically find failures.
In essence, a conditional text-to-image generative model is used to generate large amounts of synthetic, yet realistic, inputs.
arXiv Detail & Related papers (2022-08-18T13:49:10Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Sufficiently Accurate Model Learning for Planning [119.80502738709937]
This paper introduces the constrained Sufficiently Accurate model learning approach.
It provides examples of such problems, and presents a theorem on how close some approximate solutions can be.
The approximate solution quality will depend on the function parameterization, loss and constraint function smoothness, and the number of samples in model learning.
arXiv Detail & Related papers (2021-02-11T16:27:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.