To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning
- URL: http://arxiv.org/abs/2602.22227v1
- Date: Sat, 24 Jan 2026 03:47:29 GMT
- Title: To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning
- Authors: Yicheng Bao, Xuhong Wang, Xin Tan
- Abstract summary: We introduce \textbf{AOT} (Adversarial Opponent Training), a self-play framework that forges MLLM robustness by creating its own training data.
Our method orchestrates a co-evolution between an image-editing Attacker and a Defender MLLM, where the Attacker generates a diverse and dynamic curriculum of image manipulations, forcing the Defender to adapt and improve.
- Score: 6.973386095391195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively expensive to scale and impose a ceiling on model robustness. We introduce \textbf{AOT-SFT}, a large-scale adversarial dataset for bootstrapping MLLM robustness. Building on this, we propose \textbf{AOT (Adversarial Opponent Training)}, a self-play framework that forges MLLM robustness by creating its own training data. Our method orchestrates a co-evolution between an image-editing Attacker and a Defender MLLM, where the Attacker generates a diverse and dynamic curriculum of image manipulations, forcing the Defender to adapt and improve. Extensive experiments demonstrate that AOT enhances the Defender's perceptual robustness and reduces hallucinations, establishing a scalable paradigm for training more reliable MLLMs.
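The abstract describes a self-play loop in which an Attacker proposes image edits and a Defender is trained on whatever fools it. Below is a minimal, runnable sketch of that co-evolution dynamic; the bandit-style Attacker and Defender, the manipulation list, and all update rules are illustrative assumptions, not the paper's implementation.

```python
import random

MANIPULATIONS = ["blur", "occlude", "recolor", "paste_object", "crop"]

class BanditAttacker:
    """Prefers manipulations that have recently fooled the defender."""
    def __init__(self):
        self.scores = {m: 1.0 for m in MANIPULATIONS}

    def pick(self):
        return random.choices(list(self.scores),
                              weights=list(self.scores.values()))[0]

    def update(self, m, fooled):
        # exponential moving average of "did this edit fool the defender?"
        self.scores[m] = 0.9 * self.scores[m] + (1.0 if fooled else 0.1)

class Defender:
    """Per-manipulation accuracy that improves with exposure; a stand-in
    for fine-tuning the MLLM on attacker-generated images."""
    def __init__(self):
        self.robustness = {m: 0.2 for m in MANIPULATIONS}

    def answers_correctly(self, m):
        return random.random() < self.robustness[m]

    def train_on(self, m):
        self.robustness[m] = min(0.95, self.robustness[m] + 0.02)

attacker, defender = BanditAttacker(), Defender()
for _ in range(5000):
    m = attacker.pick()                        # Attacker proposes an image edit
    fooled = not defender.answers_correctly(m)
    attacker.update(m, fooled)                 # Attacker chases weak spots
    if fooled:
        defender.train_on(m)                   # Defender learns from failures

print({m: round(r, 2) for m, r in defender.robustness.items()})
```

Running this shows defender robustness rising fastest on exactly the manipulations the attacker learns to favor, which is the curriculum effect the abstract claims.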
Related papers
- From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation [3.75886080255807]
We introduce two innovative attack frameworks designed to generate dynamic and adaptive adversarial examples.
We produce subtle and natural-looking adversarial inputs that preserve semantic similarity to the original text.
Our attacks evolve with the advancements in LLMs and demonstrate strong transferability across models unknown to the attacker.
arXiv Detail & Related papers (2025-11-05T02:27:56Z) - Robust Anti-Backdoor Instruction Tuning in LVLMs [53.766434746801366]
We introduce a lightweight, certified-agnostic defense framework for large visual language models (LVLMs).
Our framework finetunes only adapter modules and text embedding layers under instruction tuning.
Experiments against seven attacks on Flickr30k and MSCOCO demonstrate that our method reduces the attack success rate to nearly zero.
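As a rough illustration of the "finetune only adapters and text embeddings" recipe, the sketch below freezes everything else by parameter name; the toy model and the module names are assumptions, not the paper's architecture.

```python
import torch.nn as nn

# Toy stand-in for an LVLM; only the module names matter for the sketch.
model = nn.ModuleDict({
    "vision_encoder": nn.Linear(32, 32),
    "adapter": nn.Linear(32, 32),
    "embed_tokens": nn.Embedding(1000, 32),
    "lm_head": nn.Linear(32, 1000),
})

# Freeze everything except the adapter and the text embedding table.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("adapter", "embed_tokens"))

print([n for n, p in model.named_parameters() if p.requires_grad])
# ['adapter.weight', 'adapter.bias', 'embed_tokens.weight']
```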
arXiv Detail & Related papers (2025-06-04T01:23:35Z) - Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks [17.75247947379804]
We present the first adversarial training paradigm tailored to defend against jailbreak attacks during the MLLM training phase.
We introduce Projection Layer Against Adversarial Training (ProEAT), an end-to-end AT framework.
ProEAT achieves state-of-the-art defense performance, outperforming existing baselines by an average margin of +34% across text and image modalities.
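ProEAT's exact objective is not reproduced here, but the general shape of adversarial training confined to a projection layer can be sketched with a standard PGD inner loop; the layer, data, and hyperparameters below are toy assumptions.

```python
import torch
import torch.nn as nn

proj = nn.Linear(64, 10)                       # stand-in "projection layer"
opt = torch.optim.SGD(proj.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 64)                        # toy visual features
y = torch.randint(0, 10, (32,))
eps, alpha, pgd_steps = 0.1, 0.02, 5

# Inner maximization: find a bounded perturbation that hurts the layer.
delta = torch.zeros_like(x, requires_grad=True)
for _ in range(pgd_steps):
    loss_fn(proj(x + delta), y).backward()
    with torch.no_grad():
        delta += alpha * delta.grad.sign()     # gradient ascent on the loss
        delta.clamp_(-eps, eps)                # stay inside the L-inf ball
    delta.grad.zero_()

# Outer minimization: one training step on the adversarial features.
opt.zero_grad()
loss_fn(proj(x + delta.detach()), y).backward()
opt.step()
```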
arXiv Detail & Related papers (2025-03-05T14:13:35Z) - Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models [26.656858396343726]
Multi-modal Large Language Models (MLLMs) excel in vision-language tasks but remain vulnerable to visual adversarial perturbations.
Existing methods seek to mitigate these risks by applying constrained adversarial fine-tuning to CLIP vision encoders on ImageNet-scale data.
We explore an alternative approach of leveraging existing vision classification models that have been adversarially pre-trained on large-scale data.
arXiv Detail & Related papers (2025-02-03T17:59:45Z) - TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning [54.033346088090674]
We introduce TWIST & SCOUT, a framework that equips pre-trained MLLMs with visual grounding ability.
To fine-tune the model effectively, we generate a high-quality synthetic dataset we call SCOUT.
This dataset provides rich supervision signals, describing a step-by-step multimodal reasoning process.
arXiv Detail & Related papers (2024-10-14T13:35:47Z) - MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
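A minimal sketch of the "add a new expert without touching the pretrained weights" idea: freeze an existing mixture-of-experts layer, append a fresh expert, and widen the gate while preserving the old routing. The TinyMoE module and the initialization below are illustrative assumptions; MoExtend's actual gating and calibration differ.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=32, n_experts=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.gate = nn.Linear(dim, n_experts)

    def forward(self, x):
        w = self.gate(x).softmax(dim=-1)                       # (B, E)
        outs = torch.stack([e(x) for e in self.experts], -1)   # (B, D, E)
        return (outs * w.unsqueeze(1)).sum(-1)

moe = TinyMoE()
for p in moe.parameters():                    # freeze the pretrained model
    p.requires_grad = False

moe.experts.append(nn.Linear(32, 32))         # new, trainable expert
old_gate = moe.gate
moe.gate = nn.Linear(32, len(moe.experts))    # widen the gate to 3 experts
with torch.no_grad():                         # keep the old routing behaviour
    moe.gate.weight[:2].copy_(old_gate.weight)
    moe.gate.bias[:2].copy_(old_gate.bias)

print(moe(torch.randn(4, 32)).shape)  # torch.Size([4, 32])
```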
arXiv Detail & Related papers (2024-08-07T02:28:37Z) - Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning [67.0609518552321]
We propose to conduct Machine Vision Therapy which aims to rectify the noisy predictions from vision models.
By fine-tuning with the denoised labels, the learning model performance can be boosted in an unsupervised manner.
arXiv Detail & Related papers (2023-12-05T07:29:14Z) - Beyond Pretrained Features: Noisy Image Modeling Provides Adversarial Defense [52.66971714830943]
Masked image modeling (MIM) has become a prevailing framework for self-supervised visual representation learning.
In this paper, we investigate how this powerful self-supervised learning paradigm can provide adversarial robustness to downstream classifiers.
We propose an adversarial defense method, referred to as De3, by exploiting the pretrained decoder for denoising.
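The defense pattern here, denoise first and classify second, is easy to show in outline. The sketch below wires a stand-in denoiser in front of a stand-in classifier; De3 itself reuses a pretrained MIM decoder as the denoiser, which this toy module does not attempt to reproduce.

```python
import torch
import torch.nn as nn

class DenoisingDefense(nn.Module):
    """Preprocessing defense: strip (adversarial) noise, then classify."""
    def __init__(self, denoiser: nn.Module, classifier: nn.Module):
        super().__init__()
        self.denoiser = denoiser
        self.classifier = classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        cleaned = self.denoiser(x)         # denoise the (possibly attacked) input
        return self.classifier(cleaned)    # classify the cleaned image

# Toy stand-ins so the sketch runs end to end.
denoiser = nn.Conv2d(3, 3, kernel_size=3, padding=1)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model = DenoisingDefense(denoiser, classifier)

print(model(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 10])
```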
arXiv Detail & Related papers (2023-02-02T12:37:24Z) - FAT: Federated Adversarial Training [5.287156503763459]
Federated learning (FL) is one of the most important paradigms addressing privacy and data governance issues in machine learning (ML).
We take the first known steps towards federated adversarial training (FAT), combining both methods to reduce the threat of evasion during inference while preserving data privacy during training.
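FAT composes two standard ingredients: local adversarial training and federated averaging. The sketch below shows one FedAvg round where each client trains on FGSM-perturbed data; the model, data, FGSM choice, and hyperparameters are illustrative assumptions, not the paper's setup.

```python
import copy
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

def local_adv_step(model, x, y, eps=0.05, lr=0.1):
    """One client-side step: craft FGSM examples, then train on them."""
    x = x.clone().requires_grad_(True)
    loss_fn(model(x), y).backward()                 # gradient w.r.t. the input
    x_adv = (x + eps * x.grad.sign()).detach()      # FGSM perturbation
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    loss_fn(model(x_adv), y).backward()
    opt.step()

global_model = nn.Linear(20, 2)
clients = [(torch.randn(16, 20), torch.randint(0, 2, (16,)))
           for _ in range(3)]

local_models = []
for x, y in clients:                                # each client trains locally
    m = copy.deepcopy(global_model)
    local_adv_step(m, x, y)
    local_models.append(m)

with torch.no_grad():                               # FedAvg: average the weights
    for p_glob, *p_locs in zip(global_model.parameters(),
                               *(m.parameters() for m in local_models)):
        p_glob.copy_(torch.stack(p_locs).mean(dim=0))
```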
arXiv Detail & Related papers (2020-12-03T09:47:47Z) - Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
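The phrase "additional information from the feature space" suggests adversaries optimized against intermediate representations rather than logits. A generic sketch of that idea follows: perturb the input to push its features away from the clean ones. The feature extractor, step sizes, and budget are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # toy extractor

x = torch.randn(8, 3, 32, 32)
with torch.no_grad():
    clean = feat(x)                      # reference (clean) representations

delta = torch.zeros_like(x, requires_grad=True)
for _ in range(10):
    # Maximize distance from the clean features, not a classification loss.
    ((feat(x + delta) - clean) ** 2).sum().backward()
    with torch.no_grad():
        delta += 0.01 * delta.grad.sign()
        delta.clamp_(-0.03, 0.03)        # small L-inf budget
    delta.grad.zero_()

x_adv = (x + delta).detach()             # stronger examples for adversarial training
```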
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.