SoK: The Security-Safety Continuum of Multimodal Foundation Models through Information Flow and Game-Theoretic Defenses
- URL: http://arxiv.org/abs/2411.11195v4
- Date: Tue, 12 Aug 2025 23:56:35 GMT
- Title: SoK: The Security-Safety Continuum of Multimodal Foundation Models through Information Flow and Game-Theoretic Defenses
- Authors: Ruoxi Sun, Jiamin Chang, Hammond Pearce, Chaowei Xiao, Bo Li, Qi Wu, Surya Nepal, Minhui Xue
- Abstract summary: Multimodal foundation models (MFMs) integrate diverse data modalities to support complex and wide-ranging tasks. In this paper, we unify the concepts of safety and security in the context of MFMs by identifying critical threats that arise from both model behavior and system-level interactions.
- Score: 58.93030774141753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal foundation models (MFMs) integrate diverse data modalities to support complex and wide-ranging tasks. However, this integration also introduces distinct safety and security challenges. In this paper, we unify the concepts of safety and security in the context of MFMs by identifying critical threats that arise from both model behavior and system-level interactions. We propose a taxonomy grounded in information theory, evaluating risks through the concepts of channel capacity, signal, noise, and bandwidth. This perspective provides a principled way to analyze how information flows through MFMs and how vulnerabilities can emerge across modalities. Building on this foundation, we investigate defense mechanisms through the lens of a minimax game between attackers and defenders, highlighting key gaps in current research. In particular, we identify insufficient protection for cross-modal alignment and a lack of systematic and scalable defense strategies. Our work offers both a theoretical and practical foundation for advancing the safety and security of MFMs, supporting the development of more robust and trustworthy systems.
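To make the abstract's two framing devices concrete, here is a minimal, hedged sketch (our own illustration, not the paper's formalism): the Shannon-Hartley theorem relates channel capacity to signal, noise, and bandwidth, and a tiny minimax computation shows a defender picking the strategy that minimizes the attacker's best-case payoff. All strategy names and numbers below are assumptions for illustration.

```python
import math

# Shannon-Hartley: capacity of an additive-noise channel, the quantity the
# taxonomy uses to reason about how much (possibly adversarial) information
# can flow through a modality.
def channel_capacity(bandwidth_hz: float, signal: float, noise: float) -> float:
    return bandwidth_hz * math.log2(1.0 + signal / noise)

# Weaker filtering (less "noise" injected against the attacker) means more
# adversarial signal gets through the cross-modal channel.
print(channel_capacity(1e3, signal=10.0, noise=5.0))  # heavily filtered channel
print(channel_capacity(1e3, signal=10.0, noise=0.5))  # lightly filtered channel

# Minimax view of defense: payoff[d][a] is the attacker's success rate when
# the defender plays d and the attacker plays a (made-up numbers).
payoff = {
    "input_filtering":       {"jailbreak": 0.6, "cross_modal_injection": 0.9},
    "cross_modal_alignment": {"jailbreak": 0.7, "cross_modal_injection": 0.3},
}
best = min(payoff, key=lambda d: max(payoff[d].values()))
print(best, max(payoff[best].values()))  # defense minimizing worst-case success
```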
Related papers
- Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models [69.22690439422531]
Diffusion models (DMs) have been investigated in various domains due to their ability to generate high-quality data. Similar to traditional deep learning systems, DMs also face potential threats. This survey comprehensively elucidates their framework, threats, and countermeasures.
arXiv Detail & Related papers (2025-09-25T02:51:43Z)
- Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation [51.19622266249408]
MultiTrust-X is a benchmark for evaluating, analyzing, and mitigating the trustworthiness issues of MLLMs. Based on its taxonomy, MultiTrust-X includes 32 tasks and 28 curated datasets. Our experiments reveal significant vulnerabilities in current models.
arXiv Detail & Related papers (2025-08-21T09:00:01Z)
- On the Security and Privacy of Federated Learning: A Survey with Attacks, Defenses, Frameworks, Applications, and Future Directions [1.7056096558557128]
Federated Learning (FL) is an emerging distributed machine learning paradigm enabling clients to train a global model collaboratively without sharing their raw data. While FL enhances data privacy by design, it remains vulnerable to various security and privacy threats. Security-enhancing methods aim to improve FL robustness against malicious behaviors such as Byzantine attacks, poisoning, and Sybil attacks. Privacy-preserving techniques focus on protecting sensitive data through cryptographic approaches, differential privacy, and secure aggregation.
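As a concrete illustration of one privacy-preserving technique this summary names, the sketch below applies a simple differentially private Gaussian mechanism to a client update (clip, then add noise); it is our minimal example, not the surveyed paper's method, and the constants are arbitrary.

```python
import random

def privatize_update(update, clip_norm=1.0, noise_std=0.5):
    """Clip an update's L2 norm to bound sensitivity, then add Gaussian noise."""
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [u * scale + random.gauss(0.0, noise_std) for u in update]

# The server aggregates noisy updates; no single client's raw gradient is revealed.
print(privatize_update([0.8, -1.5, 0.3]))
```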
arXiv Detail & Related papers (2025-08-19T11:06:20Z)
- Zero-Trust Foundation Models: A New Paradigm for Secure and Collaborative Artificial Intelligence for Internet of Things [61.43014629640404]
Zero-Trust Foundation Models (ZTFMs) embed zero-trust security principles into the lifecycle of foundation models (FMs) for Internet of Things (IoT) systems. ZTFMs can enable secure, privacy-preserving AI across distributed, heterogeneous, and potentially adversarial IoT environments.
arXiv Detail & Related papers (2025-05-26T06:44:31Z)
- Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model [29.63418384788804]
We conduct a safety evaluation of 11 Multimodal Large Reasoning Models (MLRMs) across 5 benchmarks. Our analysis reveals distinct safety patterns across different benchmarks. Leveraging a model's intrinsic reasoning capabilities to detect unsafe intent is a potential approach to addressing safety issues in MLRMs.
arXiv Detail & Related papers (2025-05-10T06:59:36Z)
- Safety in Large Reasoning Models: A Survey [15.148492389864133]
Large Reasoning Models (LRMs) have exhibited extraordinary prowess in tasks like mathematics and coding, leveraging their advanced reasoning capabilities.
This paper presents a comprehensive survey of LRM safety, exploring and summarizing newly emerging safety risks, attacks, and defense strategies.
arXiv Detail & Related papers (2025-04-24T16:11:01Z)
- Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions [7.986500985812646]
The Model Context Protocol (MCP) is an emerging open standard that defines a unified, bi-directional communication and dynamic discovery protocol between AI models and external tools or resources. This paper presents a systematic study of MCP from both architectural and security perspectives.
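For flavor, here is a hedged sketch of the kind of bi-directional discovery exchange the summary refers to: MCP frames messages as JSON-RPC 2.0, and a client can enumerate a server's tools. The "tools/list" method name follows the public spec, but treat the exact payload shapes below as illustrative assumptions.

```python
import json

# Client -> server: ask the MCP server which tools it exposes.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server -> client: an illustrative response advertising one tool. A malicious
# or compromised server could advertise deceptive tool descriptions here,
# which is one of the threat surfaces such a protocol study examines.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": "read_file",
                          "description": "Read a file from the workspace"}]},
}
print(json.dumps(request))
print(json.dumps(response, indent=2))
```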
arXiv Detail & Related papers (2025-03-30T01:58:22Z)
- MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models [101.70140132374307]
Multimodal foundation models (MMFMs) play a crucial role in various applications, including autonomous driving, healthcare, and virtual assistants.
Existing benchmarks on multimodal models either predominantly assess the helpfulness of these models, or only focus on limited perspectives such as fairness and privacy.
We present the first unified platform, MMDT (Multimodal DecodingTrust), designed to provide a comprehensive safety and trustworthiness evaluation for MMFMs.
arXiv Detail & Related papers (2025-03-19T01:59:44Z)
- A Survey of Model Extraction Attacks and Defenses in Distributed Computing Environments [55.60375624503877]
Model Extraction Attacks (MEAs) threaten modern machine learning systems by enabling adversaries to steal models, exposing intellectual property and training data.
This survey is motivated by the urgent need to understand how the unique characteristics of cloud, edge, and federated deployments shape attack vectors and defense requirements.
We systematically examine the evolution of attack methodologies and defense mechanisms across these environments, demonstrating how environmental factors influence security strategies in critical sectors such as autonomous vehicles, healthcare, and financial services.
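The sketch below shows the basic mechanics of a query-based extraction attack in miniature (our toy, not a method from the survey): the adversary labels self-chosen queries with a victim model's predictions and fits a surrogate that closely agrees with it.

```python
import numpy as np

rng = np.random.default_rng(0)
w_victim = rng.normal(size=5)                 # deployed model's weights, hidden

def victim_predict(x):                        # the attacker's only access: a query API
    return (x @ w_victim > 0).astype(float)

queries = rng.normal(size=(2000, 5))          # attacker-chosen inputs
labels = victim_predict(queries)              # stolen labels

# Fit a linear surrogate to the stolen +/-1 labels by least squares.
w_surrogate, *_ = np.linalg.lstsq(queries, 2 * labels - 1, rcond=None)

test = rng.normal(size=(1000, 5))
agreement = np.mean(victim_predict(test) == (test @ w_surrogate > 0))
print(f"surrogate matches victim on {agreement:.1%} of fresh inputs")
```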
arXiv Detail & Related papers (2025-02-22T03:46:50Z)
- Model Privacy: A Unified Framework to Understand Model Stealing Attacks and Defenses [11.939472526374246]
This work presents a framework called "Model Privacy", providing a foundation for comprehensively analyzing model stealing attacks and defenses.
We propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models.
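As a toy quantification in that spirit (our construction, not the paper's formal metrics), the snippet below perturbs the confidence scores a model releases and measures two costs at once: fidelity loss for honest users and label corruption for an attacker training on the outputs.

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.uniform(size=10_000)             # the model's true confidences

for noise in (0.0, 0.1, 0.3):                 # defender's perturbation strength
    released = np.clip(scores + rng.normal(0.0, noise, scores.shape), 0.0, 1.0)
    utility_loss = np.mean(np.abs(released - scores))          # cost to honest users
    label_flips = np.mean((released > 0.5) != (scores > 0.5))  # corrupted training signal
    print(f"noise={noise:.1f}  utility_loss={utility_loss:.3f}  label_flips={label_flips:.3f}")
```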
arXiv Detail & Related papers (2025-02-21T16:29:11Z)
- Ten Challenging Problems in Federated Foundation Models [55.343738234307544]
Federated Foundation Models (FedFMs) represent a distributed learning paradigm that fuses the general competencies of foundation models with the privacy-preserving capabilities of federated learning.
This paper provides a comprehensive summary of the ten challenging problems inherent in FedFMs, encompassing foundational theory, utilization of private data, continual learning, unlearning, Non-IID and graph data, bidirectional knowledge transfer, incentive mechanism design, game mechanism design, model watermarking, and efficiency.
arXiv Detail & Related papers (2025-02-14T04:01:15Z)
- Safety at Scale: A Comprehensive Survey of Large Model Safety [298.05093528230753]
We present a comprehensive taxonomy of safety threats to large models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats.
We identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.
arXiv Detail & Related papers (2025-02-02T05:14:22Z)
- New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook [54.24701201956833]
Security and privacy issues have undermined users' confidence in pre-trained models.
Current literature lacks a clear taxonomy of emerging attacks and defenses for pre-trained models.
Our proposed taxonomy categorizes attacks and defenses into No-Change, Input-Change, and Model-Change approaches.
arXiv Detail & Related papers (2024-11-12T10:15:33Z)
- A Formal Framework for Assessing and Mitigating Emergent Security Risks in Generative AI Models: Bridging Theory and Dynamic Risk Mitigation [0.3413711585591077]
As generative AI systems, including large language models (LLMs) and diffusion models, advance rapidly, their growing adoption has led to new and complex security risks.
This paper introduces a novel formal framework for categorizing and mitigating these emergent security risks.
We identify previously under-explored risks, including latent space exploitation, multi-modal cross-attack vectors, and feedback-loop-induced model degradation.
arXiv Detail & Related papers (2024-10-15T02:51:32Z)
- Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
- Diffusion Models for Offline Multi-agent Reinforcement Learning with Safety Constraints [0.0]
We introduce an innovative framework integrating diffusion models within the Multi-agent Reinforcement Learning paradigm.
This approach notably enhances the safety of actions taken by multiple agents through risk mitigation while modeling coordinated action.
arXiv Detail & Related papers (2024-06-30T16:05:31Z)
- Safe Inputs but Unsafe Output: Benchmarking Cross-modality Safety Alignment of Large Vision-Language Model [73.8765529028288]
We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. To empirically investigate this problem, we developed SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.
arXiv Detail & Related papers (2024-06-21T16:14:15Z)
- Securing Federated Learning with Control-Flow Attestation: A Novel Framework for Enhanced Integrity and Resilience against Adversarial Attacks [2.28438857884398]
Federated Learning (FL), as a distributed machine learning paradigm, has introduced new cybersecurity challenges.
This study proposes an innovative security framework inspired by Control-Flow Attestation (CFA) mechanisms, traditionally used in cybersecurity.
We authenticate and verify the integrity of model updates across the network, effectively mitigating risks associated with model poisoning and adversarial interference.
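The paper builds on attestation mechanisms; as a much simpler software stand-in for the same core idea (verify an update's integrity before aggregating it), the sketch below MACs each serialized update with a per-client key. The key handling is an assumption for illustration only, not the paper's design.

```python
import hashlib
import hmac
import json

KEY = b"per-client-secret-provisioned-at-enrollment"  # assumed provisioning step

def sign_update(update):
    blob = json.dumps(update).encode()
    return blob, hmac.new(KEY, blob, hashlib.sha256).hexdigest()

def verify_update(blob, tag):
    expected = hmac.new(KEY, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)

blob, tag = sign_update([0.12, -0.07, 0.55])
assert verify_update(blob, tag)               # untampered update is accepted
assert not verify_update(blob + b" ", tag)    # tampered update is rejected
print("integrity checks passed")
```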
arXiv Detail & Related papers (2024-03-15T04:03:34Z)
- The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness [56.174255970895466]
Large Language Models (LLMs) play an increasingly pivotal role in natural language processing applications.
This paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark.
arXiv Detail & Related papers (2023-12-30T17:37:06Z)
- Security and Privacy Issues of Federated Learning [0.0]
Federated Learning (FL) has emerged as a promising approach to address data privacy and confidentiality concerns.
This paper presents a comprehensive taxonomy of security and privacy challenges in Federated Learning (FL) across various machine learning models.
arXiv Detail & Related papers (2023-07-22T22:51:07Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
- Holistic Adversarial Robustness of Deep Learning Models [91.34155889052786]
Adversarial robustness studies the worst-case performance of a machine learning model to ensure safety and reliability.
This paper provides a comprehensive overview of research topics and foundational principles of research methods for adversarial robustness of deep learning models.
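A classic instance of this worst-case analysis is the fast gradient sign method (FGSM); the self-contained sketch below applies it to a tiny fixed logistic-regression model, with weights and inputs of our choosing.

```python
import numpy as np

w = np.array([1.5, -2.0, 0.5])     # fixed model weights (illustrative)
x = np.array([0.2, 0.1, -0.3])     # a clean input with true label y = 1
y, eps = 1.0, 0.1                  # label and L-infinity attack budget

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the logistic loss with respect to the INPUT: (p - y) * w.
grad_x = (sigmoid(w @ x) - y) * w
x_adv = x + eps * np.sign(grad_x)  # FGSM: step toward higher loss

print("clean p(y=1):", sigmoid(w @ x))
print("adv   p(y=1):", sigmoid(w @ x_adv))  # confidence drops under attack
```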
arXiv Detail & Related papers (2022-02-15T05:30:27Z)
- Towards a Robust and Trustworthy Machine Learning System Development [0.09236074230806578]
We present our recent survey of state-of-the-art ML trustworthiness and its enabling technologies from a security engineering perspective.
We then go beyond a survey by describing a metamodel we created that represents this body of knowledge in a standardized, visual form for ML practitioners.
We propose future research directions motivated by our findings to advance the development of robust and trustworthy ML systems.
arXiv Detail & Related papers (2021-01-08T14:43:58Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
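One pitfall of the kind the paper catalogs is data snooping through unrealistic splits; the toy below (our construction, not the paper's experiment) shows how randomly splitting time-ordered data inflates accuracy when a new class only appears late in the timeline.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
t = np.arange(n)                             # samples ordered by time
y = (t >= 800).astype(int)                   # a new "family" appears late
x = rng.normal(loc=3.0 * y, size=n)          # and its samples look different

def nn_accuracy(train, test):                # 1-nearest-neighbor classifier
    preds = [y[train[np.argmin(np.abs(x[train] - x[i]))]] for i in test]
    return float(np.mean(np.array(preds) == y[test]))

random_idx = rng.permutation(n)
print("random split:  ", nn_accuracy(random_idx[:800], random_idx[800:]))  # inflated by future data
print("temporal split:", nn_accuracy(t[:800], t[800:]))                    # new family never seen
```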
arXiv Detail & Related papers (2020-10-19T13:09:31Z)