Backdoor Attacks and Defenses in Computer Vision Domain: A Survey
- URL: http://arxiv.org/abs/2509.07504v1
- Date: Tue, 09 Sep 2025 08:38:05 GMT
- Title: Backdoor Attacks and Defenses in Computer Vision Domain: A Survey
- Authors: Bilal Hussain Abbasi, Yanjun Zhang, Leo Zhang, Shang Gao,
- Abstract summary: Backdoor attacks embed hidden, controllable behaviors into machine-learning models.<n>This survey reviews the rapidly growing literature on backdoor attacks and defenses in the computer-vision domain.
- Score: 12.384887829851834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor (trojan) attacks embed hidden, controllable behaviors into machine-learning models so that models behave normally on benign inputs but produce attacker-chosen outputs when a trigger is present. This survey reviews the rapidly growing literature on backdoor attacks and defenses in the computer-vision domain. We introduce a multi-dimensional taxonomy that organizes attacks and defenses by injection stage (dataset poisoning, model/parameter modification, inference-time injection), trigger type (patch, blended/frequency, semantic, transformation), labeling strategy (dirty-label vs. clean-label / feature-collision), representation stage (instance-specific, manifold/class-level, neuron/parameter hijacking, distributed encodings), and target task (classification, detection, segmentation, video, multimodal). For each axis we summarize representative methods, highlight evaluation practices, and discuss where defenses succeed or fail. For example, many classical sanitization and reverse-engineering tools are effective against reusable patch attacks but struggle with input-aware, sample-specific, or parameter-space backdoors and with transfer via compromised pre-trained encoders or hardware bit-flips. We synthesize trends, identify persistent gaps (supply-chain and hardware threats, certifiable defenses, cross-task benchmarks), and propose practical guidelines for threat-aware evaluation and layered defenses. This survey aims to orient researchers and practitioners to the current threat landscape and pressing research directions in secure computer vision.
Related papers
- SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models [11.304852987259041]
This paper introduces three novel contextually-aware attack scenarios that exploit domain-specific knowledge and semantic plausibility.<n>We present textbfSCOUT (Saliency-based Classification Of Untrusted Tokens), a novel defense framework that identifies backdoor triggers through token-level saliency analysis.
arXiv Detail & Related papers (2025-12-10T17:25:55Z) - BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors.<n>We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z) - Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis [3.795071937009966]
Adrial attacks can jeopardize the integrity of Machine Learning (ML) models.<n>We propose a framework that detects if an adversarial noise instance is being generated.<n>We evaluate our approach against 8 state-of-the-art attacks, including adaptive attacks.
arXiv Detail & Related papers (2025-03-04T20:25:12Z) - On the Credibility of Backdoor Attacks Against Object Detectors in the Physical World [27.581277955830746]
We investigate the viability of physical object-triggered backdoor attacks in application settings.
We construct a new, cost-efficient attack method, dubbed MORPHING, incorporating the unique nature of detection tasks.
We release an extensive video test set of real-world backdoor attacks.
arXiv Detail & Related papers (2024-08-22T04:29:48Z) - Pre-trained Trojan Attacks for Visual Recognition [106.13792185398863]
Pre-trained vision models (PVMs) have become a dominant component due to their exceptional performance when fine-tuned for downstream tasks.
We propose the Pre-trained Trojan attack, which embeds backdoors into a PVM, enabling attacks across various downstream vision tasks.
We highlight the challenges posed by cross-task activation and shortcut connections in successful backdoor attacks.
arXiv Detail & Related papers (2023-12-23T05:51:40Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Temporal-Distributed Backdoor Attack Against Video Based Action
Recognition [21.916002204426853]
We introduce a simple yet effective backdoor attack against video data.
Our proposed attack, adding perturbations in a transformed domain, plants an imperceptible, temporally distributed trigger across the video frames.
arXiv Detail & Related papers (2023-08-21T22:31:54Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Detecting Backdoors in Deep Text Classifiers [43.36440869257781]
We present the first robust defence mechanism that generalizes to several backdoor attacks against text classification models.
Our technique is highly accurate at defending against state-of-the-art backdoor attacks, including data poisoning and weight poisoning.
arXiv Detail & Related papers (2022-10-11T07:48:03Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The emphbackdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the emphfine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.