Feature-Space Adversarial Robustness Certification for Multimodal Large Language Models
- URL: http://arxiv.org/abs/2601.16200v2
- Date: Tue, 27 Jan 2026 19:02:47 GMT
- Title: Feature-Space Adversarial Robustness Certification for Multimodal Large Language Models
- Authors: Song Xia, Meiwen Ding, Chenqi Kong, Wenhan Yang, Xudong Jiang
- Abstract summary: Multimodal large language models (MLLMs) exhibit strong capabilities across diverse applications, yet they are vulnerable to adversarial perturbations that distort their feature representations and induce erroneous predictions. We propose Feature-space Smoothing (FS), a general framework that provides certified robustness guarantees at the feature representation level of MLLMs.
- Score: 59.6491828112519
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal large language models (MLLMs) exhibit strong capabilities across diverse applications, yet remain vulnerable to adversarial perturbations that distort their feature representations and induce erroneous predictions. To address this vulnerability, we propose Feature-space Smoothing (FS), a general framework that provides certified robustness guarantees at the feature representation level of MLLMs. We theoretically prove that FS converts a given feature extractor into a smoothed variant with a certified lower bound on the cosine similarity between clean and adversarial features under $\ell_2$-bounded perturbations. Moreover, we establish that the value of this Feature Cosine Similarity Bound (FCSB) is determined by the intrinsic Gaussian robustness score of the given encoder. Building on this insight, we introduce the Gaussian Smoothness Booster (GSB), a plug-and-play module that enhances the Gaussian robustness score of pretrained MLLMs, thereby strengthening the robustness guaranteed by FS without requiring additional MLLM retraining. Extensive experiments demonstrate that applying FS to various MLLMs yields strong certified feature-space robustness and consistently leads to robust task-oriented performance across diverse applications.
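To make the smoothing operation concrete, here is a minimal Monte Carlo sketch of feature-space smoothing, assuming a CLIP-style image encoder; the noise scale `sigma`, the sample count, and all function names are illustrative, not the paper's released implementation:

```python
import torch
import torch.nn.functional as F

def smoothed_features(encoder, x, sigma=0.25, n_samples=100):
    """Monte Carlo estimate of E[encoder(x + eps)], eps ~ N(0, sigma^2 I)."""
    feats = []
    with torch.no_grad():
        for _ in range(n_samples):
            feats.append(encoder(x + sigma * torch.randn_like(x)))
    return torch.stack(feats).mean(dim=0)

def feature_cosine_similarity(encoder, x_clean, x_adv, sigma=0.25):
    """Empirical cosine similarity between smoothed clean/adversarial
    features; the paper certifies a lower bound on this quantity whenever
    ||x_adv - x_clean||_2 is within the certified radius."""
    f_clean = smoothed_features(encoder, x_clean, sigma)
    f_adv = smoothed_features(encoder, x_adv, sigma)
    return F.cosine_similarity(f_clean, f_adv, dim=-1)
```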
Related papers
- Uncertainty Quantification for Multimodal Large Language Models with Incoherence-adjusted Semantic Volume [45.38219855706969]
We introduce UMPIRE, a training-free uncertainty quantification framework for Multimodal Large Language Models (MLLMs). UMPIRE computes the incoherence-adjusted semantic volume of sampled MLLM responses for a given task instance. We show that UMPIRE consistently outperforms baseline metrics in error detection and uncertainty calibration across image, audio, and video-text benchmarks.
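As a rough illustration of the idea (the paper's incoherence adjustment is not reproduced, and all names below are ours), a semantic-volume style score can be computed as the log-determinant of the Gram matrix of response embeddings:

```python
import numpy as np

def semantic_volume(embeddings, eps=1e-6):
    """embeddings: (n_responses, d) array of response embeddings.
    Returns the log-det of their cosine Gram matrix: a larger volume
    means the sampled responses are more semantically spread out."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    gram = E @ E.T + eps * np.eye(len(E))  # regularized pairwise similarities
    _, logdet = np.linalg.slogdet(gram)
    return logdet
```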
arXiv Detail & Related papers (2026-02-27T17:18:42Z)
- MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning [16.012761588513026]
Reinforcement Learning with Verifiable Rewards (RLVR) algorithms rely on rigid, uniform, and symmetric trust region mechanisms. We propose Mass-Adaptive Soft Policy Optimization (MASPO), a unified framework designed to harmonize these three dimensions. MASPO integrates a differentiable soft Gaussian gating to maximize gradient utility, a mass-adaptive limiter to balance exploration across the probability spectrum, and an asymmetric risk controller to align update magnitudes with signal confidence.
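One speculative reading of the "soft Gaussian gating" component, sketched below under our own assumptions (the temperature `tau` and the decision to detach the gate are ours): replace the hard PPO ratio clip, which zeroes gradients outside the trust region, with a Gaussian weight that decays smoothly.

```python
import torch

def soft_gated_policy_loss(logp_new, logp_old, advantages, tau=0.2):
    """PPO-style loss where a Gaussian gate replaces hard ratio clipping."""
    ratio = torch.exp(logp_new - logp_old)
    # Hard clipping zeroes gradients outside [1-eps, 1+eps]; a Gaussian
    # weight instead decays smoothly as the ratio drifts away from 1.
    gate = torch.exp(-((ratio - 1.0) ** 2) / (2 * tau ** 2))
    return -(gate.detach() * ratio * advantages).mean()
```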
arXiv Detail & Related papers (2026-02-19T17:05:20Z)
- Trust in One Round: Confidence Estimation for Large Language Models via Structural Signals [13.89434979851652]
Large language models (LLMs) are increasingly deployed in domains where errors carry high social, scientific, or safety costs. We present Structural Confidence, a single-pass, model-agnostic framework that enhances output correctness prediction.
arXiv Detail & Related papers (2026-02-01T02:35:59Z)
- MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity [65.85858856481131]
The unstructured and irregular nature of point clouds poses a significant challenge for objective point cloud quality assessment (PCQA). We propose the Multi-scale Implicit Structural Similarity Measurement (MS-ISSM).
arXiv Detail & Related papers (2026-01-03T14:58:52Z)
- A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models [85.30893355216486]
We study how visual token redundancy evolves with different dMLLM architectures and tasks. Our study reveals that visual redundancy emerges only in from-scratch dMLLMs while handling long-answer tasks. Layer-skipping is promising for accelerating AR-to-diffusion dMLLMs, whereas progressive or late-step pruning is more effective for from-scratch dMLLMs.
arXiv Detail & Related papers (2025-11-19T04:13:36Z)
- Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder [59.89996751196727]
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models. SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by decomposing SAEs into narrower expert networks with gated activation. We propose two key innovations: (1) Multiple Expert Activation, which simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling, which enhances diversity through adaptive high-frequency scaling.
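A hedged sketch of the general mixture-of-experts SAE pattern this builds on (expert sizes, `k`, and the routing rule are illustrative, and the feature-scaling component is omitted): every input activates its top-k experts simultaneously, each expert being a narrow SAE.

```python
import torch
import torch.nn as nn

class MoESAE(nn.Module):
    """Each input activates its top-k experts at once, each a narrow SAE."""
    def __init__(self, d_model=512, d_expert=2048, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.enc = nn.ModuleList(nn.Linear(d_model, d_expert) for _ in range(n_experts))
        self.dec = nn.ModuleList(nn.Linear(d_expert, d_model) for _ in range(n_experts))
        self.k = k

    def forward(self, x):
        scores = self.router(x)                          # (batch, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)
        masked = torch.full_like(scores, float("-inf")).scatter(-1, topi, topv)
        weights = masked.softmax(dim=-1)                 # zero outside top-k
        # Dense evaluation for clarity; real MoE code dispatches sparsely.
        recon = torch.zeros_like(x)
        for e in range(len(self.enc)):
            z = torch.relu(self.enc[e](x))               # sparse feature codes
            recon = recon + weights[:, e:e + 1] * self.dec[e](z)
        return recon
```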
arXiv Detail & Related papers (2025-11-07T22:19:34Z)
- Harnessing Consistency for Robust Test-Time LLM Ensemble [88.55393815158608]
CoRE is a plug-and-play technique that harnesses model consistency for robust LLM ensemble. Token-level consistency captures fine-grained disagreements by applying a low-pass filter to downweight uncertain tokens. Model-level consistency models global agreement by promoting model outputs with high self-confidence.
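A rough sketch of consistency-weighted ensembling in this spirit (the low-pass filter is simplified away, and the weighting rule below is our own assumption): each model's next-token distribution is weighted by its agreement with the ensemble mean and by its self-confidence.

```python
import torch

def consistency_ensemble(probs):
    """probs: (n_models, vocab) next-token distributions from the ensemble."""
    mean = probs.mean(dim=0, keepdim=True)
    agreement = -(probs - mean).pow(2).sum(dim=-1)   # consistency with ensemble
    confidence = (probs * probs.clamp_min(1e-9).log()).sum(dim=-1)  # -entropy
    weights = torch.softmax(agreement + confidence, dim=0)
    return (weights[:, None] * probs).sum(dim=0)     # fused distribution
```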
arXiv Detail & Related papers (2025-10-12T04:18:45Z)
- When MLLMs Meet Compression Distortion: A Coding Paradigm Tailored to MLLMs [38.29061845878822]
We propose an image Codec TAilored to MLLMs (CoTAM), designed to adaptively protect multi-level features and suit the different demands of downstream tasks. Our method achieves up to 35.99% savings while maintaining the same performance on MLLM tasks.
arXiv Detail & Related papers (2025-09-29T04:07:52Z)
- A Fano-Style Accuracy Upper Bound for LLM Single-Pass Reasoning in Multi-Hop QA [65.38186593873313]
Multi-Hop Question Answering (MHQA) requires integrating dispersed, interdependent evidence through sequential reasoning under noise. We introduce a proof-of-concept multi-call framework for MHQA, InfoQA. We construct a stringent and noise-rich benchmark to validate our theory and framework.
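For context, accuracy bounds of this style typically build on the classical Fano inequality; for an answer $A$ drawn uniformly from $M$ candidates and a single-pass prediction $\hat{A}$, it gives
$$P(\hat{A} = A) \;\le\; \frac{I(A;\hat{A}) + \log 2}{\log M},$$
so accuracy is capped by the mutual information a single pass can extract from the noisy, dispersed evidence. This is the textbook form, not necessarily the paper's exact bound.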
arXiv Detail & Related papers (2025-09-25T14:11:57Z)
- FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction [82.6826848085638]
Visual jailbreaking attacks can manipulate open-source MLLMs more readily than sophisticated textual attacks. These attacks exhibit extremely limited cross-model transferability, failing to reliably identify vulnerabilities in closed-source MLLMs. We propose a Feature Over-Reliance CorrEction (FORCE) method, which guides the attack to explore broader feasible regions.
arXiv Detail & Related papers (2025-09-25T11:36:56Z)
- FESTA: Functionally Equivalent Sampling for Trust Assessment of Multimodal LLMs [20.08099668437471]
Trust assessment of predictions generated by multimodal large language models (MLLMs) can enable selective prediction and improve user confidence. We propose Functionally Equivalent Sampling for Trust Assessment (FESTA), a multimodal input sampling technique for MLLMs. FESTA generates an uncertainty measure based on equivalent and complementary input samplings.
arXiv Detail & Related papers (2025-09-20T11:50:22Z)
- IPBA: Imperceptible Perturbation Backdoor Attack in Federated Self-Supervised Learning [13.337697403537488]
Federated self-supervised learning (FSSL) combines the advantages of decentralized modeling and unlabeled representation learning. Research indicates that FSSL remains vulnerable to backdoor attacks. We propose an imperceptible and effective backdoor attack method against FSSL, called IPBA.
arXiv Detail & Related papers (2025-08-11T14:36:11Z)
- Graft: Integrating the Domain Knowledge via Efficient Parameter Synergy for MLLMs [56.76586846269894]
Multimodal Large Language Models (MLLMs) have achieved success across various domains. Despite its importance, the study of knowledge sharing among domain-specific MLLMs remains largely underexplored. We propose a unified parameter integration framework that enables modular composition of expert capabilities.
arXiv Detail & Related papers (2025-06-30T15:07:41Z)
- Uncertainty Quantification of Large Language Models through Multi-Dimensional Responses [4.505944978127014]
We introduce a multi-dimensional UQ framework that integrates semantic and knowledge-aware similarity analysis. This approach disentangles overlapping information from both the semantic and knowledge dimensions, capturing both semantic variations and factual consistency. Our empirical evaluations demonstrate that our method outperforms existing techniques in identifying uncertain responses.
arXiv Detail & Related papers (2025-02-24T04:05:08Z)
- FDLLM: A Dedicated Detector for Black-Box LLMs Fingerprinting [20.07438999071414]
Large Language Models (LLMs) are rapidly transforming the landscape of digital content creation. We present FD-Dataset, a comprehensive bilingual fingerprinting benchmark comprising 90,000 text samples from 20 widely used proprietary and open-source LLMs. We also present FDLLM, a novel fingerprinting method that leverages parameter-efficient Low-Rank Adaptation (LoRA) to fine-tune a foundation model.
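A hedged sketch of the LoRA setup such a fingerprint classifier could use (the base model, rank, and target modules are our illustrative choices, not the paper's configuration):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# One class per candidate source LLM (20 in FD-Dataset).
base = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=20)
config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16,
                    lora_dropout=0.1, target_modules=["query", "value"])
model = get_peft_model(base, config)  # only the low-rank adapters train
model.print_trainable_parameters()
```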
arXiv Detail & Related papers (2025-01-27T13:18:40Z)
- CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration [90.36429361299807]
Multimodal large language models (MLLMs) have demonstrated remarkable success in engaging in conversations involving visual inputs. However, the integration of the visual modality has introduced a unique vulnerability: the MLLM becomes susceptible to malicious visual inputs. We introduce a technique termed CoCA, which amplifies the safety-awareness of the MLLM by calibrating its output distribution.
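One plausible form such output-distribution calibration can take, sketched under our own assumptions (the prompts, the amplification factor `alpha`, and the HF-style `model(...).logits` interface are ours): amplify the logit shift that prepending a safety principle induces.

```python
import torch

def calibrated_logits(model, plain_ids, principled_ids, alpha=2.0):
    """Amplify the next-token logit shift induced by prepending a
    safety principle to the prompt."""
    with torch.no_grad():
        base = model(plain_ids).logits[:, -1, :]
        guided = model(principled_ids).logits[:, -1, :]
    # Push further in the direction the safety principle already points.
    return base + alpha * (guided - base)
```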
arXiv Detail & Related papers (2024-09-17T17:14:41Z)
- COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems against Semantic Attacks [24.37030085306459]
We propose COMMIT, the first robustness certification framework to certify the robustness of multi-sensor fusion systems against semantic attacks.
In particular, we propose a practical anisotropic noise mechanism that leverages randomized smoothing with multi-modal data.
We show that the certification for MSF models is up to 48.39% higher than that of single-modal models, which validates the advantages of MSF models.
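A simplified sketch of randomized smoothing with per-modality (anisotropic) noise over two sensors; the noise scales, sample count, and majority-vote prediction rule below are illustrative, not the paper's certified parameters:

```python
import torch

def smoothed_predict(fusion_model, image, lidar, num_classes,
                     sigma_img=0.25, sigma_lidar=0.1, n_samples=200):
    """Majority vote of the fusion model under per-modality Gaussian noise."""
    votes = torch.zeros(num_classes)
    with torch.no_grad():
        for _ in range(n_samples):
            noisy_img = image + sigma_img * torch.randn_like(image)
            noisy_lidar = lidar + sigma_lidar * torch.randn_like(lidar)
            pred = fusion_model(noisy_img, noisy_lidar).argmax(dim=-1)
            votes[pred] += 1
    return votes.argmax()  # the class whose vote count would be certified
```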
arXiv Detail & Related papers (2024-03-04T18:57:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.