Attention Hijacking in Trojan Transformers
- URL: http://arxiv.org/abs/2208.04946v1
- Date: Tue, 9 Aug 2022 04:05:04 GMT
- Title: Attention Hijacking in Trojan Transformers
- Authors: Weimin Lyu, Songzhu Zheng, Tengfei Ma, Haibin Ling, Chao Chen
- Abstract summary: Trojan attacks pose a severe threat to AI systems.
Transformer models have recently received explosive popularity.
Can we reveal the Trojans through the attention mechanisms in BERTs and ViTs?
- Score: 68.04317938014067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Trojan attacks pose a severe threat to AI systems. Transformer
models have recently received explosive popularity, and self-attention is now
an indispensable part of them. This raises a central question: can we reveal
the Trojans through the attention mechanisms in BERTs and ViTs? In this paper,
we investigate the attention hijacking pattern in Trojan AIs, i.e., the trigger
token "kidnaps" the attention weights when a specific trigger is present. We
observe the consistent attention hijacking pattern in Trojan Transformers from
both Natural Language Processing (NLP) and Computer Vision (CV) domains. This
intriguing property helps us to understand the Trojan mechanism in BERTs and
ViTs. We also propose an Attention-Hijacking Trojan Detector (AHTD) to
discriminate the Trojan AIs from the clean ones.
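The abstract gives no pseudocode, but the hijacking signal it describes can be sketched numerically: measure how much attention mass a suspected trigger token attracts compared with the uniform baseline. The function names, the toy attention tensor, and the baseline choice below are illustrative assumptions, not the authors' actual AHTD.

```python
import numpy as np

def attention_to_token(attn, token_idx):
    """Mean attention mass that all query positions send to one key position.

    attn: (heads, seq_len, seq_len) attention weights; each row sums to 1.
    """
    return float(attn[:, :, token_idx].mean())

def hijack_score(attn, trigger_idx, seq_len):
    """Excess attention the trigger receives over the uniform baseline 1/seq_len."""
    return attention_to_token(attn, trigger_idx) - 1.0 / seq_len

# Toy example: 2 heads, 5 tokens; the column of the trigger token at
# position 2 "kidnaps" most of the attention mass.
rng = np.random.default_rng(0)
attn = rng.random((2, 5, 5))
attn[:, :, 2] += 5.0                      # trigger column dominates every row
attn /= attn.sum(axis=-1, keepdims=True)  # renormalise rows to valid attention

score = hijack_score(attn, trigger_idx=2, seq_len=5)
```

A detector in this spirit would compute such scores per head and flag models whose heads concentrate far more mass on one token than clean models do.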
Related papers
- TroLLoc: Logic Locking and Layout Hardening for IC Security Closure against Hardware Trojans [21.7375312616769]
TroLLoc is a novel scheme for IC security closure that employs, for the first time, logic locking and layout hardening in unison.
We show that TroLLoc successfully renders layouts resilient, with reasonable overheads, against (i) general prospects for Trojan insertion as in the ISPD'22 contest, (ii) actual Trojan insertion as in the ISPD'23 contest, and (iii) potential second-order attacks.
arXiv Detail & Related papers (2024-05-09T07:25:38Z) - Hardly Perceptible Trojan Attack against Neural Networks with Bit Flips [51.17948837118876]
We present hardly perceptible Trojan attack (HPT)
HPT crafts hardly perceptible Trojan images by utilizing the additive noise and per pixel flow field.
To achieve superior attack performance, we propose to jointly optimize bit flips, additive noise, and flow field.
arXiv Detail & Related papers (2022-07-27T09:56:17Z) - Defense Against Multi-target Trojan Attacks [31.54111353219381]
Trojan attacks are among the hardest to defend against.
BadNet-style attacks introduce Trojan backdoors into multiple target classes and allow triggers to be placed anywhere in the image.
To defend against this attack, we first introduce a trigger reverse-engineering mechanism that uses multiple images to recover a variety of potential triggers.
We then propose a detection mechanism by measuring the transferability of such recovered triggers.
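The two steps above (recover candidate triggers, then score their transferability) can be sketched as follows; the stand-in model, the additive patching, and the threshold are hypothetical simplifications, not this paper's exact mechanism.

```python
import numpy as np

def transferability(model_fn, images, trigger, target_class):
    """Fraction of images that a recovered trigger pushes to the target class."""
    flipped = sum(
        model_fn(np.clip(img + trigger, 0.0, 1.0)) == target_class
        for img in images
    )
    return flipped / len(images)

# Hypothetical stand-in for a backdoored classifier: it outputs the
# "target" class 1 whenever the mean pixel intensity is high.
def toy_model(img):
    return 1 if img.mean() > 0.6 else 0

rng = np.random.default_rng(2)
images = [rng.random((4, 4)) * 0.5 for _ in range(10)]  # clean, mean ~0.25
trigger = np.full((4, 4), 0.5)                          # additive candidate trigger

rate = transferability(toy_model, images, trigger, target_class=1)
```

A trigger that transfers across many unrelated images (high `rate`) is evidence of a genuine backdoor rather than an input-specific adversarial perturbation.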
arXiv Detail & Related papers (2022-07-08T13:29:13Z)
- Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free [126.15842954405929]
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet to produce manipulated results for inputs attached with a trigger.
We propose a novel Trojan network detection regime: first locating a "winning Trojan lottery ticket" which preserves nearly full Trojan information yet only chance-level performance on clean inputs; then recovering the trigger embedded in this already isolated subnetwork.
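Locating the "winning Trojan lottery ticket" relies on pruning out a sparse subnetwork. A minimal sketch of the magnitude-pruning step that lottery-ticket methods build on (illustrative only, not this paper's full procedure):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude; keep only weights strictly above it.
    threshold = np.partition(flat, k - 1)[k - 1]
    return weights * (np.abs(weights) > threshold)

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))
W_sparse = magnitude_prune(W, sparsity=0.9)  # keep only the largest ~10%
```

In the paper's regime, the surviving subnetwork is then probed: near-chance clean accuracy combined with preserved trigger behavior exposes the Trojan.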
arXiv Detail & Related papers (2022-05-24T06:33:31Z)
- A Study of the Attention Abnormality in Trojaned BERTs [12.623010398576067]
Trojan attacks raise serious security concerns.
We observe the attention focus drifting behavior of Trojaned models.
We propose an attention-based Trojan detector to distinguish Trojaned models from clean ones.
arXiv Detail & Related papers (2022-05-13T16:48:37Z)
- Towards Effective and Robust Neural Trojan Defenses via Input Filtering [67.01177442955522]
Trojan attacks on deep neural networks are both dangerous and surreptitious.
Over the past few years, Trojan attacks have advanced from using only a simple trigger and targeting only one class to using many sophisticated triggers and targeting multiple classes.
Most defense methods still make out-of-date assumptions about Trojan triggers and target classes, thus, can be easily circumvented by modern Trojan attacks.
arXiv Detail & Related papers (2022-02-24T15:41:37Z)
- CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing [16.44147178061005]
Trojaned behaviors triggered by various trojan attacks can be attributed to the trojan path.
We propose CatchBackdoor, a detection method against trojan attacks.
arXiv Detail & Related papers (2021-12-24T13:57:03Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only for samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector that is based on the analysis of the intrinsic properties; that are affected due to the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks [59.42357806777537]
Trojan attacks aim to compromise deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by hackers.
We propose a training-free attack approach, which differs from previous work in which trojaned behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts compared to conventional trojan attack methods.
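Property (1), activating on a tiny trigger pattern while staying silent for all other inputs, can be sketched as a gate merged with a clean model's logits. The gate, the bonus magnitude, and the toy logits below are illustrative assumptions, not the actual TrojanNet architecture.

```python
import numpy as np

def trojan_gate(patch, trigger, tolerance=1e-6):
    """Fires (returns 1.0) only when the input patch matches the trigger pattern."""
    return float(np.max(np.abs(patch - trigger)) < tolerance)

def merged_predict(clean_logits, patch, trigger, target_class):
    """Training-free injection: boost the target class iff the gate fires."""
    logits = clean_logits.copy()
    logits[target_class] += 100.0 * trojan_gate(patch, trigger)
    return int(np.argmax(logits))

trigger = np.array([[1., 0., 1.],
                    [0., 1., 0.],
                    [1., 0., 1.]])
clean_logits = np.array([2.0, 0.5, 1.0])  # the clean model prefers class 0

benign = np.zeros((3, 3))
pred_benign  = merged_predict(clean_logits, benign,  trigger, target_class=1)
pred_trigger = merged_predict(clean_logits, trigger, trigger, target_class=1)
```

Because the gate is model-agnostic (it only reads a small input patch and adds to the output logits), the same mechanism can be bolted onto most DNNs without any retraining, which is the point of properties (2) and (3).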
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.