SoK: The Last Line of Defense: On Backdoor Defense Evaluation
- URL: http://arxiv.org/abs/2511.13143v1
- Date: Mon, 17 Nov 2025 08:51:18 GMT
- Title: SoK: The Last Line of Defense: On Backdoor Defense Evaluation
- Authors: Gorka Abad, Marina Krček, Stefanos Koffas, Behrad Tajalli, Marco Arazzi, Roberto Riaño, Xiaoyun Xu, Zhuoran Liu, Antonino Nocera, Stjepan Picek
- Abstract summary: Backdoor attacks pose a significant threat to deep learning models by implanting hidden vulnerabilities that can be activated by malicious inputs. This work presents a systematic (meta-)analysis of backdoor defenses through a comprehensive literature review and empirical evaluation. We analyzed 183 backdoor defense papers published between 2018 and 2025 across major AI and security venues.
- Score: 21.126129826672894
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks pose a significant threat to deep learning models by implanting hidden vulnerabilities that can be activated by malicious inputs. While numerous defenses have been proposed to mitigate these attacks, the heterogeneous landscape of evaluation methodologies hinders fair comparison between defenses. This work presents a systematic (meta-)analysis of backdoor defenses through a comprehensive literature review and empirical evaluation. We analyzed 183 backdoor defense papers published between 2018 and 2025 across major AI and security venues, examining the properties and evaluation methodologies of these defenses. Our analysis reveals significant inconsistencies in experimental setups, evaluation metrics, and threat model assumptions in the literature. Through extensive experiments involving three datasets (MNIST, CIFAR-100, ImageNet-1K), four model architectures (ResNet-18, VGG-19, ViT-B/16, DenseNet-121), 16 representative defenses, and five commonly used attacks, totaling over 3,000 experiments, we demonstrate that defense effectiveness varies substantially across different evaluation setups. We identify critical gaps in current evaluation practices, including insufficient reporting of computational overhead and behavior under benign conditions, bias in hyperparameter selection, and incomplete experimentation. Based on our findings, we provide concrete challenges and well-motivated recommendations to standardize and improve future defense evaluations. Our work aims to equip researchers and industry practitioners with actionable insights for developing, assessing, and deploying defenses to different systems.
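The abstract mentions inconsistent evaluation metrics without naming them; in practice, most backdoor-defense evaluations report clean accuracy (CA) on benign inputs and attack success rate (ASR) on triggered inputs, swept over a grid of datasets, architectures, defenses, and attacks. The sketch below illustrates that measurement loop under those assumptions; the function names (predict, apply_trigger) and the defense/attack placeholders in the grid are hypothetical and not taken from the paper.

```python
# Minimal sketch (assumptions, not the paper's code): the two metrics most
# backdoor-defense evaluations report, computed over an evaluation grid.
import numpy as np

def clean_accuracy(predict, x_clean, y_true):
    """Fraction of benign inputs the defended model classifies correctly."""
    return float(np.mean(predict(x_clean) == y_true))

def attack_success_rate(predict, apply_trigger, x_clean, y_true, target_label):
    """Fraction of triggered inputs (target-class samples excluded) that the
    defended model still classifies as the attacker's chosen target label."""
    keep = y_true != target_label              # common convention
    x_triggered = apply_trigger(x_clean[keep]) # stamp the backdoor trigger
    return float(np.mean(predict(x_triggered) == target_label))

# Hypothetical sweep mirroring the paper's setup: 3 datasets x 4 architectures
# x 16 defenses x 5 attacks (placeholders shown for defenses and attacks).
grid = [(ds, arch, defense, attack)
        for ds in ("MNIST", "CIFAR-100", "ImageNet-1K")
        for arch in ("ResNet-18", "VGG-19", "ViT-B/16", "DenseNet-121")
        for defense in ("defense_A", "defense_B")
        for attack in ("attack_A", "attack_B")]
```

Reporting only CA and ASR is part of the gap the paper highlights: computational overhead and behavior under benign conditions (e.g., how a defense behaves when no attack is present) are rarely measured alongside them.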
Related papers
- Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks [14.131197965001988]
We present the first comprehensive analysis of IPI-centric defense frameworks. We introduce a comprehensive taxonomy of these defenses, classifying them along five dimensions. We then thoroughly assess the security and usability of representative defense frameworks.
arXiv Detail & Related papers (2025-11-19T07:47:30Z) - A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models. This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks. We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z) - A Critical Evaluation of Defenses against Prompt Injection Attacks [95.81023801370073]
Large Language Models (LLMs) are vulnerable to prompt injection attacks. Several defenses have recently been proposed, often claiming to mitigate these attacks successfully. We argue that existing studies lack a principled approach to evaluating these defenses.
arXiv Detail & Related papers (2025-05-23T19:39:56Z) - Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment [63.07424521895492]
Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to machine learning models T. The standard evaluation framework for such attacks relies on an evaluation model E, trained under the same task design as T. This framework has become the de facto standard for assessing progress in MI research, used across nearly all recent MI attacks and defenses without question.
arXiv Detail & Related papers (2025-05-06T13:32:12Z) - Decoding FL Defenses: Systemization, Pitfalls, and Remedies [16.907513505608666]
There are no guidelines for evaluating Federated Learning (FL) defenses. We design a comprehensive systemization of FL defenses along three dimensions. We survey 50 top-tier defense papers and identify the commonly used components in their evaluation setups.
arXiv Detail & Related papers (2025-02-03T23:14:02Z) - A Survey of Defenses Against AI-Generated Visual Media: Detection, Disruption, and Authentication [29.96910246392549]
Deep generative models have demonstrated impressive performance in various computer vision applications. These models may be used for malicious purposes, such as misinformation, deception, and copyright violation. This paper provides a systematic and timely review of research efforts on defenses against AI-generated visual media.
arXiv Detail & Related papers (2024-07-15T09:46:02Z) - Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism [0.0]
The study explores adversarial attacks specifically targeted at Deep Neural Networks (DNNs) utilized for image classification.
The research focuses on comprehending the ramifications of two prominent attack methodologies: the Fast Gradient Sign Method (FGSM) and the Carlini-Wagner (CW) approach.
The study proposes defensive distillation as a defense mechanism to counter FGSM and CW attacks.
arXiv Detail & Related papers (2024-04-05T17:51:58Z) - A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z) - Evaluating the Adversarial Robustness of Adaptive Test-time Defenses [60.55448652445904]
We categorize such adaptive test-time defenses and explain their potential benefits and drawbacks.
Unfortunately, none significantly improve upon static models when evaluated appropriately.
Some even weaken the underlying static model while simultaneously increasing inference cost.
arXiv Detail & Related papers (2022-02-28T12:11:40Z) - A Comprehensive Evaluation Framework for Deep Model Robustness [44.20580847861682]
Deep neural networks (DNNs) have achieved remarkable performance across a wide range of applications.
They are vulnerable to adversarial examples, which motivates the adversarial defense.
This paper presents a model evaluation framework containing a comprehensive, rigorous, and coherent set of evaluation metrics.
arXiv Detail & Related papers (2021-01-24T01:04:25Z) - Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks [65.20660287833537]
In this paper, we propose two extensions of the PGD attack that overcome failures due to suboptimal step size and problems with the objective function.
We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.
arXiv Detail & Related papers (2020-03-03T18:15:55Z) - On Adaptive Attacks to Adversarial Example Defenses [123.32678153377915]
This paper lays out the methodology and the approach necessary to perform an adaptive attack against defenses to adversarial examples.
We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples.
arXiv Detail & Related papers (2020-02-19T18:50:29Z)