MSTS: A Multimodal Safety Test Suite for Vision-Language Models
- URL: http://arxiv.org/abs/2501.10057v1
- Date: Fri, 17 Jan 2025 09:22:35 GMT
- Title: MSTS: A Multimodal Safety Test Suite for Vision-Language Models
- Authors: Paul Röttger, Giuseppe Attanasio, Felix Friedrich, Janis Goldzycher, Alicia Parrish, Rishabh Bhardwaj, Chiara Di Bonaventura, Roman Eng, Gaia El Khoury Geagea, Sujata Goswami, Jieun Han, Dirk Hovy, Seogyeong Jeong, Paloma Jeretič, Flor Miriam Plaza-del-Arco, Donya Rooein, Patrick Schramowski, Anastassia Shaitarova, Xudong Shen, Richard Willats, Andrea Zugarini, Bertie Vidgen
- Abstract summary: Vision-language models (VLMs) are increasingly integrated into chat assistants and other consumer AI applications.
Despite clear hazards, little work has evaluated VLM safety and the novel risks created by multimodal inputs.
MSTS comprises 400 test prompts across 40 fine-grained hazard categories.
We translate MSTS into ten languages, showing non-English prompts to increase the rate of unsafe model responses.
- Score: 35.86658426268927
- License:
- Abstract: Vision-language models (VLMs), which process image and text inputs, are increasingly integrated into chat assistants and other consumer AI applications. Without proper safeguards, however, VLMs may give harmful advice (e.g. how to self-harm) or encourage unsafe behaviours (e.g. to consume drugs). Despite these clear hazards, little work so far has evaluated VLM safety and the novel risks created by multimodal inputs. To address this gap, we introduce MSTS, a Multimodal Safety Test Suite for VLMs. MSTS comprises 400 test prompts across 40 fine-grained hazard categories. Each test prompt consists of a text and an image that only in combination reveal their full unsafe meaning. With MSTS, we find clear safety issues in several open VLMs. We also find some VLMs to be safe by accident, meaning that they are safe because they fail to understand even simple test prompts. We translate MSTS into ten languages, showing non-English prompts to increase the rate of unsafe model responses. We also show models to be safer when tested with text only rather than multimodal prompts. Finally, we explore the automation of VLM safety assessments, finding even the best safety classifiers to be lacking.
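Concretely, an evaluation with a test suite like MSTS boils down to sending each text-image pair to the model under test and labelling the response. The sketch below only illustrates that loop and is not the paper's actual pipeline: the CSV layout (`msts_prompts.csv` with `text`, `image`, and `hazard_category` columns), the `query_vlm` stub, and the keyword-based refusal check are hypothetical placeholders; the paper itself relies on human annotation and finds even the best automated safety classifiers to be lacking.
```python
import csv
from collections import Counter

def query_vlm(text: str, image_path: str) -> str:
    """Hypothetical stand-in for the VLM under test (API call or local model)."""
    raise NotImplementedError("plug in your own VLM client here")

# Crude keyword heuristic for illustration only; MSTS uses human annotation
# and reports that even trained safety classifiers fall short of it.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")

def is_refusal(response: str) -> bool:
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def evaluate(prompt_file: str = "msts_prompts.csv") -> Counter:
    """Count refusals vs. completions per hazard category."""
    results: Counter = Counter()
    with open(prompt_file, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Each MSTS prompt pairs a text with an image; only the
            # combination reveals the full unsafe meaning.
            response = query_vlm(row["text"], row["image"])
            label = "refused" if is_refusal(response) else "complied"
            results[(row["hazard_category"], label)] += 1
    return results
```
A text-only ablation, like the paper's comparison of multimodal versus text-only prompts, would amount to running the same loop while dropping the image argument.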
Related papers
- Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs [56.440345471966666]
Multimodal Large Language Models (MLLMs) have expanded the capabilities of traditional language models by enabling interaction through both text and images.
This paper introduces MMSafeAware, the first comprehensive multimodal safety awareness benchmark designed to evaluate MLLMs across 29 safety scenarios.
MMSafeAware includes both unsafe and over-safety subsets to assess models' abilities to correctly identify unsafe content and avoid over-sensitivity that can hinder helpfulness.
arXiv Detail & Related papers (2025-02-16T16:12:40Z)
- ASTRAL: Automated Safety Testing of Large Language Models [6.1050306667733185]
Large Language Models (LLMs) have recently gained attention due to their ability to understand and generate sophisticated human-like content.
We present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs.
arXiv Detail & Related papers (2025-01-28T18:25:11Z)
- LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps [63.10843814055688]
M-ALERT is a benchmark that evaluates the safety of Large Language Models in five languages: English, French, German, Italian, and Spanish.
M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy.
arXiv Detail & Related papers (2024-12-19T16:46:54Z)
- VLSBench: Unveiling Visual Leakage in Multimodal Safety [39.344623032631475]
Safety concerns about multimodal large language models (MLLMs) have gradually become an important problem in various applications.
Previous work indicates a counter-intuitive phenomenon: aligning MLLMs with textual unlearning alone achieves safety performance comparable to that of MLLMs trained with image-text pairs.
This study demonstrates that textual alignment is enough for multimodal safety scenarios with visual safety information leakage (VSIL), while multimodal alignment is a more promising solution for scenarios without VSIL.
arXiv Detail & Related papers (2024-11-29T18:56:37Z)
- Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety.
For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context.
We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [90.73444232283371]
ShieldLM is a safety detector for Large Language Models (LLMs) that aligns with common safety standards.
We show that ShieldLM surpasses strong baselines across four test sets, showcasing remarkable customizability and explainability.
arXiv Detail & Related papers (2024-02-26T09:43:02Z)
- Safety of Multimodal Large Language Models on Images and Texts [33.97489213223888]
In this paper, we systematically survey current efforts on the evaluation, attack, and defense of MLLMs' safety on images and text.
We review the evaluation datasets and metrics for measuring the safety of MLLMs.
Next, we comprehensively present attack and defense techniques related to MLLMs' safety.
arXiv Detail & Related papers (2024-02-01T05:57:10Z)
- SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models [15.896567445646784]
We introduce SimpleSafetyTests (SST) as a new test suite for rapidly and systematically identifying such critical safety risks.
The test suite comprises 100 test prompts across five harm areas that LLMs, for the vast majority of applications, should refuse to comply with.
While some of the models do not give a single unsafe response, most give unsafe responses to more than 20% of the prompts, with over 50% unsafe responses in the extreme.
arXiv Detail & Related papers (2023-11-14T18:33:43Z)
- Safety Assessment of Chinese Large Language Models [51.83369778259149]
Large language models (LLMs) may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes.
To promote the deployment of safe, responsible, and ethical AI, we release SafetyPrompts, which includes 100k augmented prompts and responses by LLMs.
arXiv Detail & Related papers (2023-04-20T16:27:35Z)