Safety-Efficacy Trade Off: Robustness against Data-Poisoning
- URL: http://arxiv.org/abs/2602.00822v1
- Date: Sat, 31 Jan 2026 17:22:00 GMT
- Title: Safety-Efficacy Trade Off: Robustness against Data-Poisoning
- Authors: Diego Granziol
- Abstract summary: Backdoor and data poisoning attacks can achieve high attack success while evading existing spectral and optimisation-based defences. We show that this behaviour is not incidental, but arises from a fundamental geometric mechanism in input space. Our results establish when backdoors are inherently invisible, and provide the first end-to-end characterisation of poisoning, detectability, and defence through input-space curvature.
- Score: 2.273510537992342
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor and data poisoning attacks can achieve high attack success while evading existing spectral and optimisation-based defences. We show that this behaviour is not incidental, but arises from a fundamental geometric mechanism in input space. Using kernel ridge regression as an exact model of wide neural networks, we prove that clustered dirty-label poisons induce a rank-one spike in the input Hessian whose magnitude scales quadratically with attack efficacy. Crucially, for nonlinear kernels we identify a near-clone regime in which poison efficacy remains order one while the induced input curvature vanishes, making the attack provably spectrally undetectable. We further show that input-gradient regularisation contracts poison-aligned Fisher and Hessian eigenmodes under gradient flow, yielding an explicit and unavoidable safety-efficacy trade-off by reducing data-fitting capacity. For exponential kernels, this defence admits a precise interpretation as an anisotropic high-pass filter that increases the effective length scale and suppresses near-clone poisons. Extensive experiments on linear models and deep convolutional networks across MNIST, CIFAR-10, and CIFAR-100 validate the theory, demonstrating consistent lags between attack success and spectral visibility, and showing that regularisation and data augmentation jointly suppress poisoning. Our results establish when backdoors are inherently invisible, and provide the first end-to-end characterisation of poisoning, detectability, and defence through input-space curvature.
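The central claim, that a tight cluster of dirty-label poisons leaves a rank-one spike in the input-space Hessian of a kernel ridge regression predictor, can be probed numerically. Below is a minimal illustrative sketch (our own construction, not the authors' code): it fits an RBF-kernel KRR model on clean data plus a clustered set of label-flipped poisons and inspects the spectrum of a finite-difference input Hessian near the poison cluster. The toy data, kernel choice, and all names are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 * lengthscale^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def krr_alpha(X, y, lam=1e-3):
    # Kernel ridge regression dual coefficients: (K + lam I)^{-1} y.
    K = rbf_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def input_hessian(f, x, eps=1e-3):
    # Central finite-difference Hessian of a scalar function f at input x.
    d = len(x)
    H = np.zeros((d, d))
    E = np.eye(d) * eps
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * eps ** 2)
    return H

rng = np.random.default_rng(0)
d, n_clean, n_poison = 5, 200, 20
X_clean = rng.normal(size=(n_clean, d))
y_clean = np.sign(X_clean[:, 0])                        # simple clean task
X_poison = 3.0 + 0.05 * rng.normal(size=(n_poison, d))  # tight poison cluster
y_poison = -np.ones(n_poison)                           # dirty (flipped) labels
X = np.vstack([X_clean, X_poison])
y = np.concatenate([y_clean, y_poison])

alpha = krr_alpha(X, y)
f = lambda x: rbf_kernel(x[None, :], X)[0] @ alpha      # KRR predictor
eig = np.linalg.eigvalsh(input_hessian(f, X_poison.mean(axis=0)))
# A single dominant eigenvalue is the rank-one spike signature.
print("top two |eigenvalues|:", np.sort(np.abs(eig))[-2:])
```

Shrinking the cluster toward an existing clean point (the near-clone regime described in the abstract) should make the spike collapse while the poisons still fit, which is the spectral-undetectability effect the paper analyses.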
Related papers
- BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning [73.46118996284888]
Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment.
arXiv Detail & Related papers (2026-02-19T08:31:16Z)
- The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks [51.468144272905135]
Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks. We provide a theoretical analysis targeting backdoor attacks, focusing on how sparse decision boundaries enable disproportionate model manipulation. We propose Eminence, an explainable and robust black-box backdoor framework with provable theoretical guarantees and inherent stealth properties.
arXiv Detail & Related papers (2025-12-11T08:09:07Z)
- Boosting Graph Robustness Against Backdoor Attacks: An Over-Similarity Perspective [11.671718919130099]
Graph Neural Networks (GNNs) have achieved notable success in domains such as social and transportation networks. Recent studies have highlighted the vulnerability of GNNs to backdoor attacks, raising significant concerns about their reliability in real-world applications. We propose SimGuard, a novel graph backdoor defense method.
arXiv Detail & Related papers (2025-02-03T11:41:42Z)
- PoisonCatcher: Revealing and Identifying LDP Poisoning Attacks in IIoT [13.68394346583211]
Local Differential Privacy (LDP) is widely adopted in the Industrial Internet of Things (IIoT) because it is lightweight, decentralized, and scalable. This work proposes an LDP poisoning defense for IIoT that runs in the resource-rich aggregator.
arXiv Detail & Related papers (2024-12-20T09:26:50Z)
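For context on what such an attack targets, here is a textbook sketch of binary randomized response, a basic LDP mechanism whose noisy reports a poisoning adversary can corrupt at scale. This is illustrative background, not PoisonCatcher's protocol, and all names are ours.

```python
import numpy as np

def randomized_response(bit, epsilon, rng):
    # Report the true bit with probability e^eps / (e^eps + 1), else flip it.
    p_true = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if rng.random() < p_true else 1 - bit

def debiased_mean(reports, epsilon):
    # Invert the response noise for an unbiased estimate of the true mean;
    # crafted fake reports shift this estimate, which is the poisoning risk.
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(0)
true_bits = rng.random(10_000) < 0.3                    # true mean is 0.3
reports = [randomized_response(int(b), 1.0, rng) for b in true_bits]
print("debiased estimate:", round(debiased_mean(reports, 1.0), 3))
```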
- CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks [61.06621533874629]
Diffusion models are vulnerable to copyright infringement attacks, where attackers inject strategically modified non-infringing images into the training set. We first propose a defense framework, CopyrightShield, to defend against the above attack. Experimental results demonstrate that CopyrightShield significantly improves poisoned-sample detection performance across two attack scenarios.
arXiv Detail & Related papers (2024-12-02T14:19:44Z)
- Robustness Inspired Graph Backdoor Defense [30.82433380830665]
Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification. Recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption. We propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones.
arXiv Detail & Related papers (2024-06-14T08:46:26Z)
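As a rough illustration of the edge-dropping idea (a sketch under our own assumptions, not the paper's implementation): repeatedly drop random edges, re-run a propagation step, and flag nodes whose outputs are unusually unstable.

```python
import numpy as np

def propagate(A, X):
    # One step of mean-neighbour feature propagation (a stand-in for a GNN).
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    return (A @ X) / deg

def drop_edges(A, rng, p=0.2):
    # Symmetrically drop each undirected edge with probability p.
    keep = rng.random(A.shape) > p
    keep = np.triu(keep, 1)
    return A * (keep + keep.T)

def instability_scores(A, X, n_trials=50, p=0.2, seed=0):
    # High score = node output moves a lot under edge dropout, the
    # signature used here to flag candidate poisoned nodes.
    rng = np.random.default_rng(seed)
    base = propagate(A, X)
    diffs = [np.linalg.norm(propagate(drop_edges(A, rng, p), X) - base, axis=1)
             for _ in range(n_trials)]
    return np.mean(diffs, axis=0)

# Toy graph: a 5-cycle plus node 5 wired in via extra "trigger-like" edges.
A = np.zeros((6, 6))
for i, j in [(0,1),(1,2),(2,3),(3,4),(4,0),(5,0),(5,1),(5,2)]:
    A[i, j] = A[j, i] = 1.0
X = np.eye(6)                        # one-hot node features
print(instability_scores(A, X).round(3))
```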
- DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification [63.65630243675792]
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose DiffAttack, a unified framework for effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z)
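For readers unfamiliar with the defense being attacked, here is a schematic of diffusion-based purification (our simplification; `denoiser` stands in for a trained diffusion model's reverse sampler and is an assumption, not a real API): noise the input forward part-way, then denoise, hoping the adversarial perturbation does not survive the round trip. DiffAttack targets this whole pipeline end to end.

```python
import numpy as np

def purify(x, denoiser, t=0.3, rng=None):
    # Forward-diffuse the (possibly adversarial) input, then run the
    # learned reverse process. `denoiser` is a placeholder, not a real
    # library call.
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(size=x.shape)
    x_t = np.sqrt(1.0 - t) * x + np.sqrt(t) * noise  # VP-style forward step
    return denoiser(x_t, t)                          # reverse back to t = 0

# Usage sketch: classify the purified input instead of the raw one.
# y_hat = classifier(purify(x_adv, denoiser))
```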
- On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning.
Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness.
Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness.
arXiv Detail & Related papers (2023-06-28T17:59:35Z)
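The aggregation scheme itself is simple to sketch. Below is a minimal partition-and-vote illustration in the spirit of Deep Partition Aggregation (a toy version with an off-the-shelf base learner, not the paper's setup): each training point lands in exactly one partition, so a single poisoned sample can corrupt at most one of the k base classifiers, which is what underlies the certified robustness.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dpa_fit(X, y, k=5):
    # Deterministic partition: each sample hashes into exactly one of k
    # disjoint parts, so one poisoned sample touches at most one model.
    parts = np.array([hash(tuple(row)) % k for row in X])
    return [LogisticRegression(max_iter=1000).fit(X[parts == p], y[parts == p])
            for p in range(k)]

def dpa_predict(models, X):
    # Majority vote over base classifiers (assumes integer class labels).
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Usage sketch on toy data:
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)
models = dpa_fit(X, y)
print("accuracy:", (dpa_predict(models, X) == y).mean())
```

The trade-off the paper examines follows directly from this structure: more partitions mean stronger certified robustness but less data per base learner, hence the practical questions of efficiency and clean-accuracy cost.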
- PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks [22.900501880865658]
Backdoor attacks pose a new threat to Deep Neural Networks (DNNs).
We propose PiDAn, an algorithm based on coherence optimization that purifies the poisoned data.
Our PiDAn algorithm can detect more than 90% of infected classes and identify 95% of poisoned samples.
arXiv Detail & Related papers (2022-03-17T12:37:21Z)
- Indiscriminate Poisoning Attacks Are Shortcuts [77.38947817228656]
We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned the target labels of the corresponding samples.
We show that such synthetic perturbations are as powerful as the deliberately crafted attacks.
Our finding suggests that the shortcut learning problem is more serious than previously believed.
arXiv Detail & Related papers (2021-11-01T12:44:26Z)
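The separability claim is easy to test in code. The sketch below is entirely synthetic (the class-wise perturbation construction is our stand-in for a real poisoning attack): fit a linear classifier on the perturbations alone, labelled with their target classes; near-perfect accuracy is the shortcut signature the paper describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, n_classes = 1000, 64, 4
labels = rng.integers(0, n_classes, size=n)
# Class-wise random directions as synthetic, linearly separable perturbations.
directions = rng.normal(size=(n_classes, d))
perturbations = 0.1 * directions[labels] + 0.01 * rng.normal(size=(n, d))

# A linear model trained on perturbations alone recovers the target labels.
clf = LogisticRegression(max_iter=1000).fit(perturbations, labels)
print("train accuracy on perturbations:", clf.score(perturbations, labels))
```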