Related papers: Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation

Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation

URL: http://arxiv.org/abs/2504.14541v1
Date: Sun, 20 Apr 2025 09:07:10 GMT
Title: Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation
Authors: Yi Yu, Song Xia, Xun Lin, Chenqi Kong, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot,
Abstract summary: Adversarial examples, characterized by imperceptible perturbations, pose significant threats to deep neural networks by misleading their predictions.<n>We introduce a novel training paradigm aimed at enhancing robustness against transferable adversarial examples (TAEs) in a more efficient and effective way.
Score: 95.3977252782181
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Adversarial examples, characterized by imperceptible perturbations, pose significant threats to deep neural networks by misleading their predictions. A critical aspect of these examples is their transferability, allowing them to deceive {unseen} models in black-box scenarios. Despite the widespread exploration of defense methods, including those on transferability, they show limitations: inefficient deployment, ineffective defense, and degraded performance on clean images. In this work, we introduce a novel training paradigm aimed at enhancing robustness against transferable adversarial examples (TAEs) in a more efficient and effective way. We propose a model that exhibits random guessing behavior when presented with clean data $\boldsymbol{x}$ as input, and generates accurate predictions when with triggered data $\boldsymbol{x}+\boldsymbol{\tau}$. Importantly, the trigger $\boldsymbol{\tau}$ remains constant for all data instances. We refer to these models as \textbf{models with trigger activation}. We are surprised to find that these models exhibit certain robustness against TAEs. Through the consideration of first-order gradients, we provide a theoretical analysis of this robustness. Moreover, through the joint optimization of the learnable trigger and the model, we achieve improved robustness to transferable attacks. Extensive experiments conducted across diverse datasets, evaluating a variety of attacking methods, underscore the effectiveness and superiority of our approach.

Related papers

MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
Indiscriminate Disruption of Conditional Inference on Multivariate Gaussians [60.22542847840578]
Despite advances in adversarial machine learning, inference for Gaussian models in the presence of an adversary is notably understudied. We consider a self-interested attacker who wishes to disrupt a decisionmaker's conditional inference and subsequent actions by corrupting a set of evidentiary variables. To avoid detection, the attacker also desires the attack to appear plausible wherein plausibility is determined by the density of the corrupted evidence.
arXiv Detail & Related papers (2024-11-21T17:46:55Z)
DTA: Distribution Transform-based Attack for Query-Limited Scenario [11.874670564015789]
In generating adversarial examples, the conventional black-box attack methods rely on sufficient feedback from the to-be-attacked models. This paper proposes a hard-label attack that simulates an attacked action being permitted to conduct a limited number of queries. Experiments validate the effectiveness of the proposed idea and the superiority of DTA over the state-of-the-art.
arXiv Detail & Related papers (2023-12-12T13:21:03Z)
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios. We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
CT-GAT: Cross-Task Generative Adversarial Attack based on Transferability [24.272384832200522]
We propose a novel approach that directly constructs adversarial examples by extracting transferable features across various tasks. Specifically, we train a sequence-to-sequence generative model named CT-GAT using adversarial sample data collected from multiple tasks to acquire universal adversarial features. Results demonstrate that our method achieves superior attack performance with small cost.
arXiv Detail & Related papers (2023-10-22T11:00:04Z)
OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks [17.584752814352502]
Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data. We introduce a self-supervised, computationally economical method for generating adversarial examples. Our experiments consistently demonstrate the method is effective across various models, unseen data categories, and even defended models.
arXiv Detail & Related papers (2023-10-05T17:34:47Z)
Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information. By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples. We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
Rethinking Model Ensemble in Transfer-based Adversarial Attacks [46.82830479910875]
An effective strategy to improve the transferability is attacking an ensemble of models. Previous works simply average the outputs of different models. We propose a Common Weakness Attack (CWA) to generate more transferable adversarial examples.
arXiv Detail & Related papers (2023-03-16T06:37:16Z)
Defending Variational Autoencoders from Adversarial Attacks with MCMC [74.36233246536459]
Variational autoencoders (VAEs) are deep generative models used in various domains. As previous work has shown, one can easily fool VAEs to produce unexpected latent representations and reconstructions for a visually slightly modified input. Here, we examine several objective functions for adversarial attacks construction, suggest metrics assess the model robustness, and propose a solution.
arXiv Detail & Related papers (2022-03-18T13:25:18Z)
Feature Importance-aware Transferable Adversarial Attacks [46.12026564065764]
Existing transferable attacks tend to craft adversarial examples by indiscriminately distorting features. We argue that such brute-force degradation would introduce model-specific local optimum into adversarial examples. By contrast, we propose the Feature Importance-aware Attack (FIA), which disrupts important object-aware features.
arXiv Detail & Related papers (2021-07-29T17:13:29Z)
Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data. In this paper, we propose variable-length textual adversarial attacks(VL-Attack) Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs) We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases. Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z)
Luring of transferable adversarial perturbations in the black-box paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks. A removable additional neural network is included in the target model, and is designed to induce the textitluring effect. Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.