Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference
- URL: http://arxiv.org/abs/2511.09064v1
- Date: Thu, 13 Nov 2025 01:30:08 GMT
- Title: Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference
- Authors: Chengze Jiang, Minjing Dong, Xinli Shi, Jie Gui,
- Abstract summary: We argue that enhancing the diversity and coverage of counterattacks is crucial to improving adversarial robustness in test-time defense.<n>We propose Directional Orthogonal Counterattack (DOC), which augments counterattack optimization by incorporating gradient directions and momentum-based updates.<n>We present a directional sensitivity score based on averaged cosine similarity to boost DOC by improving example discrimination and adaptively modulating the counterattack strength.
- Score: 45.723695657400576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language pre-training models (VLPs) demonstrate strong multimodal understanding and zero-shot generalization, yet remain vulnerable to adversarial examples, raising concerns about their reliability. Recent work, Test-Time Counterattack (TTC), improves robustness by generating perturbations that maximize the embedding deviation of adversarial inputs using PGD, pushing them away from their adversarial representations. However, due to the fundamental difference in optimization objectives between adversarial attacks and counterattacks, generating counterattacks solely based on gradients with respect to the adversarial input confines the search to a narrow space. As a result, the counterattacks could overfit limited adversarial patterns and lack the diversity to fully neutralize a broad range of perturbations. In this work, we argue that enhancing the diversity and coverage of counterattacks is crucial to improving adversarial robustness in test-time defense. Accordingly, we propose Directional Orthogonal Counterattack (DOC), which augments counterattack optimization by incorporating orthogonal gradient directions and momentum-based updates. This design expands the exploration of the counterattack space and increases the diversity of perturbations, which facilitates the discovery of more generalizable counterattacks and ultimately improves the ability to neutralize adversarial perturbations. Meanwhile, we present a directional sensitivity score based on averaged cosine similarity to boost DOC by improving example discrimination and adaptively modulating the counterattack strength. Extensive experiments on 16 datasets demonstrate that DOC improves adversarial robustness under various attacks while maintaining competitive clean accuracy. Code is available at https://github.com/bookman233/DOC.
Related papers
- Towards Highly Transferable Vision-Language Attack via Semantic-Augmented Dynamic Contrastive Interaction [67.45032003041399]
We propose a Semantic-Augmented Dynamic Contrastive Attack (SADCA) that enhances adversarial transferability through progressive and semantically guided perturbations.<n>SADCA establishes a contrastive learning mechanism involving adversarial, positive and negative samples, to reinforce the semantic inconsistency of the obtained perturbations.<n>Experiments on multiple datasets and models demonstrate that SADCA significantly improves adversarial transferability and consistently surpasses state-of-the-art methods.
arXiv Detail & Related papers (2026-03-05T05:46:16Z) - Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm [41.84340922774345]
Sequential Recommenders, which exploit dynamic user intents through interaction sequences, are vulnerable to adversarial attacks.<n>This paper focuses on the Profile Pollution Attack that subtly contaminates partial user interactions to induce targeted mispredictions.<n>We propose a constrained reinforcement driven attack CREAT that synergizes a bi-level optimization framework with multi-reward reinforcement learning to balance adversarial efficacy and stealthiness.
arXiv Detail & Related papers (2025-11-12T15:00:52Z) - Curriculum-Guided Antifragile Reinforcement Learning for Secure UAV Deconfliction under Observation-Space Attacks [6.367978467906828]
Reinforcement learning policies are vulnerable to adversarial attacks in the observation space.<n>We propose an antifragile RL framework designed to adapt against curriculum of incremental adversarial perturbations.<n>Results show that the antifragile policy consistently outperforms standard and robust RL baselines.
arXiv Detail & Related papers (2025-06-26T10:10:41Z) - Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models [12.524811181751577]
We propose a novel optimization for adversarial attacks against BCSD models.<n>In particular, we aim to improve the attacks in a challenging scenario, where the attack goal is to limit the model predictions to a specific range.<n>Our attack leverages the superior capability of black-box, model-agnostic explainers in interpreting the model decision boundaries.
arXiv Detail & Related papers (2025-06-05T08:29:19Z) - Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models Via Diffusion Models [17.958154849014576]
Adversarial attacks can be used to assess the robustness of large visual-language models (VLMs)<n>Previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structure.<n>We propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted and targeted adversarial examples.
arXiv Detail & Related papers (2024-04-16T07:19:52Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defense mainly focuses on the known attacks, but the adversarial robustness to the unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID)
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - Interpretability is a Kind of Safety: An Interpreter-based Ensemble for
Adversary Defense [28.398901783858005]
We propose an interpreter-based ensemble framework called X-Ensemble for robust defense adversary.
X-Ensemble employs the Random Forests (RF) model to combine sub-detectors into an ensemble detector for adversarial hybrid attacks defense.
arXiv Detail & Related papers (2023-04-14T04:32:06Z) - Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA)
Our modified attack does not require random restarts, large number of attack iterations or search for an optimal step-size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the in adversarial attacks parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Guided Adversarial Attack for Evaluating and Enhancing Adversarial
Defenses [59.58128343334556]
We introduce a relaxation term to the standard loss, that finds more suitable gradient-directions, increases attack efficacy and leads to more efficient adversarial training.
We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries.
We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.