ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content
- URL: http://arxiv.org/abs/2503.09964v1
- Date: Thu, 13 Mar 2025 02:10:29 GMT
- Title: ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content
- Authors: Bhavik Chandna, Mariam Aboujenane, Usman Naseem
- Abstract summary: We introduce ExtremeAIGC, a benchmark dataset and evaluation framework designed to assess model vulnerabilities against AI-generated extremist content. ExtremeAIGC simulates real-world events and malicious use cases by curating diverse text- and image-based examples crafted using state-of-the-art image generation techniques. Our study reveals alarming weaknesses in LMMs, demonstrating that even cutting-edge safety measures fail to prevent the generation of extremist material.
- Score: 3.6802208575951925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Multimodal Models (LMMs) are increasingly vulnerable to AI-generated extremist content, including photorealistic images and text, which can be used to bypass safety mechanisms and generate harmful outputs. However, existing datasets for evaluating LMM robustness offer limited exploration of extremist content, often lacking AI-generated images, diverse image generation models, and comprehensive coverage of historical events, which hinders a complete assessment of model vulnerabilities. To fill this gap, we introduce ExtremeAIGC, a benchmark dataset and evaluation framework designed to assess LMM vulnerabilities against such content. ExtremeAIGC simulates real-world events and malicious use cases by curating diverse text- and image-based examples crafted using state-of-the-art image generation techniques. Our study reveals alarming weaknesses in LMMs, demonstrating that even cutting-edge safety measures fail to prevent the generation of extremist material. We systematically quantify the success rates of various attack strategies, exposing critical gaps in current defenses and emphasizing the need for more robust mitigation strategies.
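The abstract reports per-strategy attack success rates without spelling out the tally here, so the snippet below is only a minimal sketch of how such rates could be tabulated. The `query_lmm` and `is_harmful` callables are hypothetical placeholders standing in for the model under test and the benchmark's harmfulness judge; neither is taken from the paper.

```python
from collections import defaultdict

def attack_success_rates(cases, query_lmm, is_harmful):
    """Tally attack success rate (ASR) per attack strategy.

    cases      : iterable of dicts like {"strategy": str, "text": str, "image": obj}
    query_lmm  : callable (text, image) -> model response       [placeholder]
    is_harmful : callable (response) -> bool harmfulness judge  [placeholder]
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for case in cases:
        response = query_lmm(case["text"], case.get("image"))
        totals[case["strategy"]] += 1
        if is_harmful(response):
            hits[case["strategy"]] += 1
    # ASR = fraction of prompts for a strategy that elicited harmful output
    return {strategy: hits[strategy] / totals[strategy] for strategy in totals}
```

Given a list of benchmark cases, this returns a strategy-to-ASR mapping that can be compared across models and defenses.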
Related papers
- A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications [0.0]
This paper introduces a novel framework to quantify adversarial risks in Vision-Language Models (VLMs).
We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noise patches and saliency patterns that highlight vulnerable regions.
We propose a new Vulnerability Score that combines the impact of random noise and adversarial attacks, providing a comprehensive metric for evaluating model robustness.
arXiv Detail & Related papers (2025-02-22T21:33:26Z) - MAA: Meticulous Adversarial Attack against Vision-Language Pre-trained Models [30.04163729936878]
Meticulous Adversarial Attack (MAA) fully exploits model-independent characteristics and vulnerabilities of individual samples.
MAA emphasizes fine-grained optimization of adversarial images by developing a novel resizing and sliding crop (RScrop) technique.
arXiv Detail & Related papers (2025-02-12T02:53:27Z) - MLLM-as-a-Judge for Image Safety without Human Labeling [81.24707039432292]
In the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content.
It is crucial to identify such unsafe images based on established safety rules.
Existing approaches typically fine-tune MLLMs with human-labeled datasets.
arXiv Detail & Related papers (2024-12-31T00:06:04Z) - Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models [92.79804303337522]
Vision-Language Models (VLMs) may still be vulnerable to safety alignment issues. We introduce MLAI, a novel jailbreak framework that leverages scenario-aware image generation for semantic alignment. Extensive experiments demonstrate MLAI's significant impact, achieving attack success rates of 77.75% on MiniGPT-4 and 82.80% on LLaVA-2.
arXiv Detail & Related papers (2024-11-27T02:40:29Z) - A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse [7.390756739899558]
Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have revolutionized image synthesis and manipulation. We propose the Posterior Collapse Attack (PCA), based on the observation that VAEs suffer from posterior collapse during training. Our method minimizes dependence on white-box information about target models, removing the implicit reliance on model-specific knowledge.
arXiv Detail & Related papers (2024-08-20T14:43:53Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs; a minimal illustrative sketch of this idea appears after this list.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks.
This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z) - On the Robustness of Large Multimodal Models Against Image Adversarial Attacks [81.2935966933355]
We study the impact of visual adversarial attacks on Large Multimodal Models (LMMs).
We find that in general LMMs are not robust to visual adversarial inputs.
We propose a new approach to real-world image classification which we term query decomposition.
arXiv Detail & Related papers (2023-12-06T04:59:56Z) - Data Forensics in Diffusion Models: A Systematic Analysis of Membership Privacy [62.16582309504159]
We develop a systematic analysis of membership inference attacks on diffusion models and propose novel attack methods tailored to each attack scenario.
Our approach exploits easily obtainable quantities and is highly effective, achieving near-perfect attack performance (>0.9 AUCROC) in realistic scenarios.
arXiv Detail & Related papers (2023-02-15T17:37:49Z)
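Several entries above describe their detection pipelines only at a high level. As one concrete illustration, the following is a minimal sketch of the MirrorCheck-style idea referenced earlier: caption the input with the target VLM, regenerate an image from that caption with a T2I model, and compare the two images in feature space. The callables and the similarity threshold are illustrative placeholders, not components released with that paper.

```python
import numpy as np

def mirrorcheck_flag(image, caption_fn, t2i_fn, embed_fn, threshold=0.75):
    """Flag a possibly adversarial input image (MirrorCheck-style sketch).

    caption_fn : target VLM producing a caption for the image        [placeholder]
    t2i_fn     : text-to-image model regenerating an image from text [placeholder]
    embed_fn   : image encoder returning a 1-D feature vector        [placeholder]
    threshold  : similarity cutoff; a tunable assumption, not a published value
    """
    caption = caption_fn(image)
    regenerated = t2i_fn(caption)
    a, b = np.asarray(embed_fn(image)), np.asarray(embed_fn(regenerated))
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # A low similarity means the caption (the VLM's perception) diverged from
    # the visible content, which is the signal the detector exploits.
    return cosine < threshold
```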