A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications
- URL: http://arxiv.org/abs/2502.16361v1
- Date: Sat, 22 Feb 2025 21:33:26 GMT
- Title: A Framework for Evaluating Vision-Language Model Safety: Building Trust in AI for Public Sector Applications
- Authors: Maisha Binte Rashid, Pablo Rivas,
- Abstract summary: This paper introduces a novel framework to quantify adversarial risks in Vision-Language Models (VLMs)<n>We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noise patches and saliency patterns that highlight vulnerable regions.<n>We propose a new Vulnerability Score that combines the impact of random noise and adversarial attacks, providing a comprehensive metric for evaluating model robustness.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-Language Models (VLMs) are increasingly deployed in public sector missions, necessitating robust evaluation of their safety and vulnerability to adversarial attacks. This paper introduces a novel framework to quantify adversarial risks in VLMs. We analyze model performance under Gaussian, salt-and-pepper, and uniform noise, identifying misclassification thresholds and deriving composite noise patches and saliency patterns that highlight vulnerable regions. These patterns are compared against the Fast Gradient Sign Method (FGSM) to assess their adversarial effectiveness. We propose a new Vulnerability Score that combines the impact of random noise and adversarial attacks, providing a comprehensive metric for evaluating model robustness.
Related papers
- Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge [0.0]
Large Language Models (LLMs) have revolutionized artificial intelligence, driving advancements in machine translation, summarization, and conversational agents.
Recent studies indicate that LLMs remain vulnerable to adversarial attacks designed to elicit biased responses.
This work proposes a scalable benchmarking framework to evaluate LLM robustness against adversarial bias elicitation.
arXiv Detail & Related papers (2025-04-10T16:00:59Z) - Adversarial Training for Defense Against Label Poisoning Attacks [53.893792844055106]
Label poisoning attacks pose significant risks to machine learning models.
We propose a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats.
Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training.
arXiv Detail & Related papers (2025-02-24T13:03:19Z) - Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks [34.40254709148148]
Pre-trained vision-language models (VLMs) have showcased remarkable performance in image and natural language understanding.
Their potential safety and robustness issues raise concerns that adversaries may evade the system and cause these models to generate toxic content through malicious attacks.
We present Chain of Attack (CoA), which iteratively enhances the generation of adversarial examples based on the multi-modal semantic update.
arXiv Detail & Related papers (2024-11-24T05:28:07Z) - MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z) - A Novel Evaluation Framework for Assessing Resilience Against Prompt Injection Attacks in Large Language Models [0.0]
This study introduces a novel framework for quantifying the resilience of applications.
The framework incorporates innovative techniques designed to ensure representativeness, interpretability, and robustness.
Results revealed that Llama2, the newer model exhibited higher resilience compared to ChatGLM.
arXiv Detail & Related papers (2024-01-02T02:06:48Z) - JAB: Joint Adversarial Prompting and Belief Augmentation [81.39548637776365]
We introduce a joint framework in which we probe and improve the robustness of a black-box target model via adversarial prompting and belief augmentation.
This framework utilizes an automated red teaming approach to probe the target model, along with a belief augmenter to generate instructions for the target model to improve its robustness to those adversarial probes.
arXiv Detail & Related papers (2023-11-16T00:35:54Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the
Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z) - From Adversarial Arms Race to Model-centric Evaluation: Motivating a
Unified Automatic Robustness Evaluation Framework [91.94389491920309]
Textual adversarial attacks can discover models' weaknesses by adding semantic-preserved but misleading perturbations to the inputs.
The existing practice of robustness evaluation may exhibit issues of incomprehensive evaluation, impractical evaluation protocol, and invalid adversarial samples.
We set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to exploit the advantages of adversarial attacks.
arXiv Detail & Related papers (2023-05-29T14:55:20Z) - A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide area of applications.
They are vulnerable to adversarial examples, which motivates the adversarial defense.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z) - SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.