Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization
- URL: http://arxiv.org/abs/2601.23179v1
- Date: Fri, 30 Jan 2026 17:03:24 GMT
- Title: Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization
- Authors: Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Xueyi Ke, Qixing Zhang, Bingquan Shen, Alex Kot, Xudong Jiang
- Abstract summary: We study a more stringent setting, Universal Targeted Transferable Adversarial Attacks (UTTAA). A single perturbation must consistently steer arbitrary inputs toward a specified target across unknown commercial MLLMs. We propose MCRMO-Attack, which stabilizes supervision via Multi-Crop Aggregation with an Attention-Guided Crop.
- Score: 49.30177419529011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Targeted adversarial attacks on closed-source multimodal large language models (MLLMs) have been increasingly explored under black-box transfer, yet prior methods are predominantly sample-specific and offer limited reusability across inputs. We instead study a more stringent setting, Universal Targeted Transferable Adversarial Attacks (UTTAA), where a single perturbation must consistently steer arbitrary inputs toward a specified target across unknown commercial MLLMs. Naively adapting existing sample-wise attacks to this universal setting faces three core difficulties: (i) target supervision becomes high-variance due to target-crop randomness, (ii) token-wise matching is unreliable because universality suppresses image-specific cues that would otherwise anchor alignment, and (iii) few-source per-target adaptation is highly initialization-sensitive, which can degrade the attainable performance. In this work, we propose MCRMO-Attack, which stabilizes supervision via Multi-Crop Aggregation with an Attention-Guided Crop, improves token-level reliability through alignability-gated Token Routing, and meta-learns a cross-target perturbation prior that yields stronger per-target solutions. Across commercial MLLMs, we boost unseen-image attack success rate by +23.7% on GPT-4o and +19.9% on Gemini-2.0 over the strongest universal baseline.
Related papers
- Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models [67.45032003041399]
We propose a novel Multi-Paradigm Collaborative Attack (MPCAttack) framework to boost the transferability of adversarial examples against MLLMs.
MPCO adaptively balances the importance of different paradigm representations and guides the global optimisation.
Our solution consistently outperforms state-of-the-art methods in both targeted and untargeted attacks on open-source and closed-source MLLMs.
arXiv Detail & Related papers (2026-03-05T06:01:26Z)
- MASPO: Unifying Gradient Utilization, Probability Mass, and Signal Reliability for Robust and Sample-Efficient LLM Reasoning [16.012761588513026]
Reinforcement Learning with Verifiable Rewards (RLVR) algorithms rely on rigid, uniform, and symmetric trust region mechanisms.
We propose Mass-Adaptive Soft Policy Optimization (MASPO), a unified framework designed to harmonize these three dimensions.
MASPO integrates a differentiable soft Gaussian gating to maximize gradient utility, a mass-adaptive limiter to balance exploration across the probability spectrum, and an asymmetric risk controller to align update magnitudes with signal confidence.
arXiv Detail & Related papers (2026-02-19T17:05:20Z)
- Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity [13.211627219720796]
Reinforcement Learning (RL) has become the de facto standard for tuning LLMs to solve tasks involving reasoning.
We argue that RL implicitly optimizes the "mode-seeking" or "zero-forcing" Reverse KL to a target distribution causing the model to concentrate mass on certain high-probability regions of the target while others.
In this work, we instead begin from an explicit target distribution, obtained by filtering out incorrect answers while neglecting the relative probabilities of correct ones.
arXiv Detail & Related papers (2025-12-05T18:56:40Z)
- MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models [52.37749859972453]
We propose MTAttack, the first multi-target backdoor attack framework for enforcing accurate multiple trigger-target mappings in LVLMs.
Experiments on popular benchmarks demonstrate a high success rate of MTAttack for multi-target attacks.
Our attack exhibits strong generalizability across datasets and robustness against backdoor defense strategies.
arXiv Detail & Related papers (2025-11-13T09:00:21Z)
- Enhancing Targeted Adversarial Attacks on Large Vision-Language Models via Intermediate Projector [24.390527651215944]
Black-box adversarial attacks pose a particularly severe threat to Large Vision-Language Models (VLMs).
We propose a novel black-box targeted attack framework that leverages the projector.
Specifically, we utilize the widely adopted Querying Transformer (Q-Former) which transforms global image embeddings into fine-grained query outputs.
arXiv Detail & Related papers (2025-08-19T11:23:09Z)
- Backdoor Cleaning without External Guidance in MLLM Fine-tuning [76.82121084745785]
Believe Your Eyes (BYE) is a data filtering framework that leverages attention entropy patterns as self-supervised signals to identify and filter backdoor samples.
It achieves near-zero attack success rates while maintaining clean-task performance.
arXiv Detail & Related papers (2025-05-22T17:11:58Z)
- A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1 [43.32593407341789]
Despite promising performance on open-source large vision-language models, transfer-based targeted attacks often fail against closed-source commercial LVLMs.
We propose to refine semantic clarity by encoding explicit semantic details within local regions.
Our approach achieves success rates exceeding 90% on GPT-4.5, 4o, o1, significantly outperforming all prior state-of-the-art attack methods.
arXiv Detail & Related papers (2025-03-13T17:59:55Z)
- Pick of the Bunch: Detecting Infrared Small Targets Beyond Hit-Miss Trade-Offs via Selective Rank-Aware Attention [22.580497586948198]
Infrared small target detection faces the inherent challenge of precisely localizing dim targets amidst complex background clutter.
We propose SeRankDet, a deep network that achieves high accuracy beyond the conventional hit-miss trade-off.
arXiv Detail & Related papers (2024-08-07T12:10:32Z)
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Logit Margin Matters: Improving Transferable Targeted Adversarial Attack by Logit Calibration [85.71545080119026]
Cross-Entropy (CE) loss function is insufficient to learn transferable targeted adversarial examples.
We propose two simple and effective logit calibration methods, which are achieved by downscaling the logits with a temperature factor and an adaptive margin.
Experiments conducted on the ImageNet dataset validate the effectiveness of the proposed methods.
arXiv Detail & Related papers (2023-03-07T06:42:52Z)
- Generative multitask learning mitigates target-causing confounding [61.21582323566118]
We propose a simple and scalable approach to causal representation learning for multitask learning.
The improvement comes from mitigating unobserved confounders that cause the targets, but not the input.
Our results on the Attributes of People and Taskonomy datasets reflect the conceptual improvement in robustness to prior probability shift.
arXiv Detail & Related papers (2022-02-08T20:42:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.