Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety
of Text-to-Image Models
- URL: http://arxiv.org/abs/2305.14384v1
- Date: Mon, 22 May 2023 15:02:40 GMT
- Title: Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety
of Text-to-Image Models
- Authors: Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max
Bartolo, Oana Inel, Juan Ciro, Rafael Mosquera, Addison Howard, Will
Cukierski, D. Sculley, Vijay Janapa Reddi, Lora Aroyo
- Abstract summary: Adversarial Nibbler is a data-centric challenge, part of the DataPerf challenge suite, organized and supported by Kaggle and MLCommons.
- Score: 6.475537049815622
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The generative AI revolution in recent years has been spurred by an expansion
in compute power and data quantity, which together enable extensive
pre-training of powerful text-to-image (T2I) models. With their greater
capabilities to generate realistic and creative content, these T2I models like
DALL-E, MidJourney, Imagen or Stable Diffusion are reaching ever wider
audiences. Any unsafe behaviors inherited from pretraining on uncurated
internet-scraped datasets thus have the potential to cause wide-reaching harm,
for example, through generated images which are violent, sexually explicit, or
contain biased and derogatory stereotypes. Despite this risk of harm, we lack
systematic and structured evaluation datasets to scrutinize model behavior,
especially adversarial attacks that bypass existing safety filters. A typical
bottleneck in safety evaluation is achieving a wide coverage of different types
of challenging examples in the evaluation set, i.e., identifying 'unknown
unknowns' or long-tail problems. To address this need, we introduce the
Adversarial Nibbler challenge. The goal of this challenge is to crowdsource a
diverse set of failure modes and reward challenge participants for successfully
finding safety vulnerabilities in current state-of-the-art T2I models.
Ultimately, we aim to provide greater awareness of these issues and assist
developers in improving the future safety and reliability of generative AI
models. Adversarial Nibbler is a data-centric challenge, part of the DataPerf
challenge suite, organized and supported by Kaggle and MLCommons.
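To make the failure mode concrete, below is a minimal probe sketch of the kind a challenge participant might run: generate an image for a candidate prompt with an open T2I model and record whether the model's bundled safety checker flags the output. This is only an illustration assuming the Hugging Face diffusers library and a public Stable Diffusion checkpoint; the checkpoint, the placeholder prompt, and the logging format are assumptions, not part of the challenge infrastructure. Outputs that the checker passes but that human reviewers judge harmful are exactly the 'unknown unknowns' the challenge aims to crowdsource.

import torch
from diffusers import StableDiffusionPipeline

# Any open T2I checkpoint that ships with a safety checker will do; this one is illustrative.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Candidate (implicitly adversarial) prompts collected from participants would go here.
candidate_prompts = [
    "a photorealistic scene of ...",  # placeholder, not a real challenge prompt
]

for i, prompt in enumerate(candidate_prompts):
    result = pipe(prompt, num_inference_steps=30)
    image = result.images[0]
    # True if the pipeline's built-in NSFW classifier flagged (and blacked out) the image.
    flagged = bool(result.nsfw_content_detected[0])
    image.save(f"probe_{i:04d}.png")
    print(f"{'FLAGGED' if flagged else 'passed'}\t{prompt}")
    # Prompts marked 'passed' still need human review: an output that humans
    # consider harmful but the filter does not flag is a reportable finding.

In practice each prompt-image pair would then be annotated by human raters, since the point of the challenge is to surface harms that automated filters miss.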
Related papers
- LoGU: Long-form Generation with Uncertainty Expressions [49.76417603761989]
We introduce the task of Long-form Generation with Uncertainty (LoGU).
We identify two key challenges: Uncertainty Suppression and Uncertainty Misalignment.
Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims.
Experiments on three long-form instruction following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.
arXiv Detail & Related papers (2024-10-18T09:15:35Z)
- Mind Your Questions! Towards Backdoor Attacks on Text-to-Visualization Models [21.2448592823259]
VisPoison is a framework designed to systematically identify backdoor vulnerabilities in text-to-visualization (text-to-vis) models.
We show that VisPoison achieves attack success rates of over 90%, highlighting the security problem of current text-to-vis models.
arXiv Detail & Related papers (2024-10-09T11:22:03Z)
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
- Direct Unlearning Optimization for Robust and Safe Text-to-Image Models [29.866192834825572]
Unlearning techniques have been developed to remove the model's ability to generate potentially harmful content.
However, these methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images.
We propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models.
arXiv Detail & Related papers (2024-07-17T08:19:11Z)
- Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation [19.06501699814924]
We build the Adversarial Nibbler Challenge, a red-teaming methodology for crowdsourcing implicitly adversarial prompts.
The challenge is run in consecutive rounds to enable a sustained discovery and analysis of safety pitfalls in T2I models.
We find that 14% of images that humans consider harmful are mislabeled as "safe" by machines.
arXiv Detail & Related papers (2024-02-14T22:21:12Z)
- Harm Amplification in Text-to-Image Models [5.397559484007124]
Text-to-image (T2I) models have emerged as a significant advancement in generative AI.
There exist safety concerns regarding their potential to produce harmful image outputs even when users input seemingly safe prompts.
This phenomenon, where T2I models generate harmful representations that were not explicit in the input prompt, poses a potentially greater risk than adversarial prompts.
arXiv Detail & Related papers (2024-02-01T23:12:57Z)
- Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to model stealing attacks, in which an adversary duplicates the target model via query access.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z)
- Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation [65.268245109828]
We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations.
These descriptions can be used to generate synthetic data using generative models, such as diffusion models.
Our experiments have shown remarkable improvements in accuracy (approximately 21%) on hard sub-populations.
arXiv Detail & Related papers (2023-12-09T04:43:49Z)
- Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge [32.140659176912735]
Text-conditioned image generation models have recently achieved astonishing image quality and alignment results.
Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the web, they also produce unsafe content.
As a contribution to the Adversarial Nibbler challenge, we distill a large set of over 1,000 potential adversarial inputs from existing safety benchmarks.
Our analysis of the gathered prompts and corresponding images demonstrates the fragility of input filters and provides further insights into systematic safety issues in current generative image models.
arXiv Detail & Related papers (2023-09-20T18:25:44Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data may instead arise from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)