How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for
Vision LLMs
- URL: http://arxiv.org/abs/2311.16101v1
- Date: Mon, 27 Nov 2023 18:59:42 GMT
- Title: How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for
Vision LLMs
- Authors: Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao,
Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie
- Abstract summary: This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning.
We introduce a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.
- Score: 55.91371032213854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on the potential of Vision LLMs (VLLMs) in visual
reasoning. Different from prior studies, we shift our focus from evaluating
standard performance to introducing a comprehensive safety evaluation suite,
covering both out-of-distribution (OOD) generalization and adversarial
robustness. For the OOD evaluation, we present two novel VQA datasets, each
with one variant, designed to test model performance under challenging
conditions. In exploring adversarial robustness, we propose a straightforward
attack strategy for misleading VLLMs to produce visual-unrelated responses.
Moreover, we assess the efficacy of two jailbreaking strategies, targeting
either the vision or language component of VLLMs. Our evaluation of 21 diverse
models, ranging from open-source VLLMs to GPT-4V, yields interesting
observations: 1) Current VLLMs struggle with OOD texts but not images, unless
the visual information is limited; and 2) These VLLMs can be easily misled by
deceiving vision encoders only, and their vision-language training often
compromises safety protocols. We release this safety evaluation suite at
https://github.com/UCSC-VLAA/vllm-safety-benchmark.
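The abstract's second observation, that VLLMs can be misled by deceiving the vision encoder alone, points to an embedding-space attack: perturb the input image so that a frozen vision encoder embeds it near an unrelated target image, without touching the language model at all. The PGD-style sketch below illustrates only this general idea; the `vision_encoder` interface, the [0, 1] image-tensor layout, and the hyperparameters are assumptions made for this illustration, not the paper's exact attack recipe.

```python
import torch
import torch.nn.functional as F


def embedding_space_attack(vision_encoder, image, target_image,
                           eps=8 / 255, alpha=1 / 255, steps=100):
    """Perturb `image` within an L-inf ball so the frozen `vision_encoder`
    embeds it close to `target_image` (an unrelated image).

    Assumes `vision_encoder` is an nn.Module mapping a (B, 3, H, W) tensor
    in [0, 1] to a (B, D) embedding; both are assumptions of this sketch.
    """
    vision_encoder.eval()
    with torch.no_grad():
        target_emb = F.normalize(vision_encoder(target_image), dim=-1)

    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        emb = F.normalize(vision_encoder(adv), dim=-1)
        # Gradient ascent on cosine similarity to the unrelated target embedding.
        loss = (emb * target_emb).sum(dim=-1).mean()
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + alpha * grad.sign()               # ascent step
            adv = image + (adv - image).clamp(-eps, eps)  # stay inside the L-inf ball
            adv = adv.clamp(0.0, 1.0)                     # keep a valid image
    return adv.detach()
```

If the attacked encoder matches (or transfers to) the vision tower of a VLLM, feeding it the resulting `adv` image is what elicits responses tied to the target content rather than the original image.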
Related papers
- Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment [21.441662865727448]
We propose a progressive concept-based alignment strategy, PSA-VLM, to enhance visual modality safety alignment.
Our method achieves state-of-the-art results on popular VLM safety benchmarks.
arXiv Detail & Related papers (2024-11-18T13:01:57Z)
- AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? [55.14033256706175]
Large Vision-Language Models (LVLMs) have become essential for advancing the integration of visual and linguistic information.
We introduce AutoBench-V, an automated framework for serving evaluation on demand.
Through an extensive evaluation of seven popular LVLMs across five user-demanded evaluation inputs, the framework demonstrates its effectiveness and reliability.
arXiv Detail & Related papers (2024-10-28T17:55:08Z)
- VHELM: A Holistic Evaluation of Vision Language Models [75.88987277686914]
We present the Holistic Evaluation of Vision Language Models (VHELM).
VHELM aggregates various datasets to cover one or more of the 9 aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety.
Our framework is designed to be lightweight and automatic so that evaluation runs are cheap and fast.
arXiv Detail & Related papers (2024-10-09T17:46:34Z)
- Safety Alignment for Vision Language Models [21.441662865727448]
We enhance the visual modality safety alignment of Vision Language Models (VLMs) by adding safety modules.
Our method boasts ease of use, high flexibility, and strong controllability, and it enhances safety while having minimal impact on the model's general performance.
arXiv Detail & Related papers (2024-05-22T12:21:27Z)
- Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks [41.213482317141356]
Augmenting Large Language Models with image-understanding capabilities has resulted in a boom of high-performing Vision-Language Models (VLMs).
In this paper, we explore the impact of jailbreaking on three state-of-the-art VLMs, each using a distinct modeling approach.
arXiv Detail & Related papers (2024-05-07T15:29:48Z)
- AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions [52.9787902653558]
Large Vision-Language Models (LVLMs) have shown significant progress in responding well to visual instructions from users.
Despite the critical importance of LVLMs' robustness against adversarial threats, current research in this area remains limited.
We introduce AVIBench, a framework designed to analyze the robustness of LVLMs when facing various adversarial visual-instructions.
arXiv Detail & Related papers (2024-03-14T12:51:07Z)
- Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models [39.56233272612982]
Current vision large language models (VLLMs) exhibit remarkable capabilities yet are prone to generate harmful content and are vulnerable to jailbreaking attacks.
Our initial analysis finds that this is due to the presence of harmful data during vision-language instruction fine-tuning.
To address this issue, we first curate a vision-language safe instruction-following dataset VLGuard covering various harmful categories.
arXiv Detail & Related papers (2024-02-03T16:43:42Z)
- Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models [61.28463542324576]
Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can generate human-like outputs.
We evaluate existing state-of-the-art VLMs and find that even the best-performing model is unable to demonstrate strong visual reasoning capabilities and consistency.
We propose a two-stage training framework aimed at improving both the reasoning performance and consistency of VLMs.
arXiv Detail & Related papers (2023-09-08T17:49:44Z)
- LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models [55.304181390027274]
This paper presents a comprehensive evaluation of publicly available large multimodal models by building an LVLM evaluation hub (LVLM-eHub).
Our LVLM-eHub consists of 8 representative LVLMs such as InstructBLIP and MiniGPT-4, which are thoroughly evaluated by a quantitative capability evaluation and an online arena platform (a schematic of this kind of multi-model evaluation loop is sketched after this list).
The study reveals several innovative findings. First, instruction-tuned LVLMs with massive in-domain data, such as InstructBLIP, heavily overfit many existing tasks and generalize poorly in open-world scenarios.
arXiv Detail & Related papers (2023-06-15T16:39:24Z)
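The evaluation-oriented entries above (the suite released with this paper, AutoBench-V, VHELM, AVIBench, and LVLM-eHub) share a common skeleton: loop a set of models over themed evaluation splits, score each response, and aggregate per-split results. The sketch below shows only that skeleton; the `VLLMClient` protocol, the split layout, and the keyword-based refusal check are hypothetical placeholders, not the API of any benchmark listed here.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class VLLMClient(Protocol):
    """Hypothetical minimal interface a benchmarked model must expose."""
    name: str

    def generate(self, image_path: str, prompt: str) -> str: ...


@dataclass
class Example:
    image_path: str
    prompt: str
    expect_refusal: bool  # True if a safe model should refuse (e.g., a jailbreak prompt)


# Toy scoring rule: treat a response as a refusal if it contains a refusal phrase.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "cannot assist")


def is_refusal(response: str) -> bool:
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def run_suite(models: Iterable[VLLMClient],
              splits: dict[str, list[Example]]) -> dict[str, dict[str, float]]:
    """Return per-model, per-split scores: the fraction of desired behaviour."""
    scores: dict[str, dict[str, float]] = {}
    for model in models:
        scores[model.name] = {}
        for split_name, examples in splits.items():
            hits = 0
            for ex in examples:
                response = model.generate(ex.image_path, ex.prompt)
                # Desired behaviour: refuse on harmful prompts, answer otherwise.
                hits += int(is_refusal(response) == ex.expect_refusal)
            scores[model.name][split_name] = hits / max(len(examples), 1)
    return scores
```

A real harness would swap the keyword check for task-specific metrics (for example, VQA accuracy on the OOD splits and attack success rate on the adversarial ones), but the control flow stays the same.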