Benchmarking Robustness to Text-Guided Corruptions
- URL: http://arxiv.org/abs/2304.02963v2
- Date: Mon, 31 Jul 2023 09:08:49 GMT
- Title: Benchmarking Robustness to Text-Guided Corruptions
- Authors: Mohammadreza Mofayezi and Yasamin Medghalchi
- Abstract summary: We use diffusion models to edit images into different domains.
We define a prompt hierarchy based on the original ImageNet hierarchy to apply edits in different domains.
We observe that convolutional models are more robust than transformer architectures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates the robustness of image classifiers to text-guided
corruptions. We utilize diffusion models to edit images into different domains.
Unlike other works that use synthetic or hand-picked data for benchmarking, we
use diffusion models as they are generative models capable of learning to edit
images while preserving their semantic content. Thus, the corruptions will be
more realistic and the comparison will be more informative. Also, there is no
need for manual labeling and we can create large-scale benchmarks with less
effort. We define a prompt hierarchy based on the original ImageNet hierarchy
to apply edits across different domains. In addition to introducing a new
benchmark, we investigate the robustness of different vision models. The results
of this study demonstrate that the performance of image classifiers degrades
significantly under different language-based corruptions and edit domains. We also
observe that convolutional models are more robust than transformer
architectures. Additionally, we see that common data augmentation techniques
can improve the performance on both the original data and the edited images.
The findings of this research can help improve the design of image classifiers
and contribute to the development of more robust machine learning systems. The
code for generating the benchmark is available at
https://github.com/ckoorosh/RobuText.
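To make the pipeline described above concrete, here is a minimal sketch of the general approach: a diffusion model applies a text-guided "domain" edit via img2img, and a pretrained ImageNet classifier is then compared on the original versus the edited input. This is not the RobuText code; the checkpoint names, prompt, image path, and edit strength are illustrative assumptions.

```python
# Minimal sketch (assumed libraries: diffusers, torchvision; not the RobuText code).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Text-guided corruption generator (checkpoint name is illustrative).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Classifier whose robustness is being probed.
weights = models.ResNet50_Weights.IMAGENET1K_V2
classifier = models.resnet50(weights=weights).to(device).eval()
preprocess = weights.transforms()

def edit_and_classify(image_path: str, prompt: str, strength: float = 0.5):
    """Return (prediction on original image, prediction on text-edited image)."""
    original = Image.open(image_path).convert("RGB").resize((512, 512))
    # The prompt pushes the image toward a new domain (e.g. snow, painting),
    # while img2img with moderate strength preserves the semantic content.
    edited = pipe(prompt=prompt, image=original, strength=strength).images[0]
    with torch.no_grad():
        pred_orig = classifier(preprocess(original).unsqueeze(0).to(device)).argmax(-1)
        pred_edit = classifier(preprocess(edited).unsqueeze(0).to(device)).argmax(-1)
    return pred_orig.item(), pred_edit.item()

# Example (placeholder path and prompt); a changed prediction indicates a
# non-robust response to the language-based corruption.
# before, after = edit_and_classify("dog.jpg", "a photo of a dog in heavy snow")
```

Aggregating such comparisons over a hierarchy of prompts, as the paper does, yields per-domain robustness scores for each classifier.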
Related papers
- Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images.
Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is with respect to a model of real images.
ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
arXiv Detail & Related papers (2024-09-24T08:46:13Z) - Benchmarking Counterfactual Image Generation [22.573830532174956]
Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos.
To perform realistic edits in domains such as natural image or medical imaging, modifications must respect causal relationships.
We present a comparison framework to thoroughly benchmark counterfactual image generation methods.
arXiv Detail & Related papers (2024-03-29T16:58:13Z) - ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative models as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work; we term this benchmark ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z) - Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state of the art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z) - RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models [36.19590638188108]
We create new variants of texts and images in the MS-COCO test set and re-evaluate the state-of-the-art (SOTA) models with the new data.
Specifically, we alter the meaning of text by replacing a word, and generate visually altered images that maintain some visual context.
Our evaluations on the proposed benchmark reveal substantial performance degradation in many SOTA models.
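As a small illustration of this stress-testing idea (not the RoCOCO code), the sketch below swaps one word in a caption and checks whether a CLIP-style matcher still prefers the true caption for the image; the model name, image path, and captions are assumptions.

```python
# Hedged illustration of caption perturbation for image-text matching
# (assumed model and data; not the RoCOCO benchmark itself).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("coco_example.jpg").convert("RGB")  # placeholder image
captions = [
    "a man riding a horse on the beach",  # original caption
    "a man riding a camel on the beach",  # single-word swap that changes the meaning
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

# A robust matcher should assign clearly higher probability to the true caption;
# large-scale degradation on such perturbed pairs is what the benchmark measures.
print(probs)
```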
arXiv Detail & Related papers (2023-04-21T03:45:59Z) - Discriminative Class Tokens for Text-to-Image Diffusion Models [107.98436819341592]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text.
Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images.
We evaluate our method extensively, showing that the generated images (i) are more accurate and of higher quality than those from standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z) - ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing [45.14977000707886]
Higher accuracy on ImageNet usually leads to better robustness against different corruptions.
We create a toolkit for object editing with controls of backgrounds, sizes, positions, and directions.
We evaluate the performance of current deep learning models, including both convolutional neural networks and vision transformers.
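A minimal sketch of such an evaluation (not the ImageNet-E toolkit): score a convolutional network and a vision transformer on a folder of edited images. The directory path and the assumption that its class folders follow ImageNet index order are illustrative.

```python
# Hedged evaluation sketch: top-1 accuracy of a CNN and a ViT on edited images
# stored in ImageFolder layout (folder order assumed to match ImageNet indices).
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models

device = "cuda" if torch.cuda.is_available() else "cpu"

def top1_accuracy(model, weights, root: str) -> float:
    model = model.to(device).eval()
    dataset = datasets.ImageFolder(root, transform=weights.transforms())
    loader = DataLoader(dataset, batch_size=32, num_workers=4)
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(-1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

cnn_weights = models.ResNet50_Weights.IMAGENET1K_V2
vit_weights = models.ViT_B_16_Weights.IMAGENET1K_V1
print("resnet50:", top1_accuracy(models.resnet50(weights=cnn_weights), cnn_weights, "edited_images/"))
print("vit_b_16:", top1_accuracy(models.vit_b_16(weights=vit_weights), vit_weights, "edited_images/"))
```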
arXiv Detail & Related papers (2023-03-30T02:02:32Z) - Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z) - GIT: A Generative Image-to-text Transformer for Vision and Language [138.91581326369837]
We train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.
Our model surpasses human performance for the first time on TextCaps (138.2 vs. 125.5 in CIDEr).
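For context, a minimal captioning sketch with a publicly released GIT checkpoint from the Hugging Face hub (checkpoint name and image path are assumptions; this shows inference only, not the paper's training setup):

```python
# Minimal image-captioning sketch with a GIT checkpoint (assumed checkpoint
# name and image path; inference only).
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco").eval()

image = Image.open("example.jpg").convert("RGB")  # placeholder image
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    generated_ids = model.generate(pixel_values=pixel_values, max_length=30)

# Decode the generated token ids into a caption string.
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```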
arXiv Detail & Related papers (2022-05-27T17:03:38Z) - RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network [19.017377597937617]
We study the compositional learning of images and texts for image retrieval.
We introduce a novel method that combines the graph convolutional network (GCN) with existing composition methods.
arXiv Detail & Related papers (2021-04-07T09:41:52Z)