Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks
- URL: http://arxiv.org/abs/2510.04245v1
- Date: Sun, 05 Oct 2025 15:26:03 GMT
- Title: Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks
- Authors: Ayushi Mehrotra, Derek Peng, Dipkamal Bhusal, Nidhi Rastogi,
- Abstract summary: Adrial patch attacks pose a practical threat to deep learning models.<n>We propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors.
- Score: 2.449909275410288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial patch attacks pose a practical threat to deep learning models by forcing targeted misclassifications through localized perturbations, often realized in the physical world. Existing defenses typically assume prior knowledge of patch size or location, limiting their applicability. In this work, we propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors, thereby neutralizing patch effects without explicit detection. Evaluated on Imagenette with a ResNet-50, our method achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser, while maintaining strong performance across varying patch sizes and locations. Our results highlight the promise of combining interpretability with robustness and suggest concept-driven defenses as a scalable strategy for securing machine learning models against adversarial patch attacks.
Related papers
- Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement [68.31147013783387]
We observe that the attention mechanism is vulnerable to patch-based adversarial attacks.
In this paper, we propose a Robust Attention Mechanism (RAM) to improve the robustness of the semantic segmentation model.
arXiv Detail & Related papers (2024-01-03T13:58:35Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive
Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - Task-agnostic Defense against Adversarial Patch Attacks [25.15948648034204]
Adversarial patch attacks mislead neural networks by injecting adversarial pixels within a designated local region.
We present PatchZero, a task-agnostic defense against white-box adversarial patches.
Our method achieves SOTA robust accuracy without any degradation in the benign performance.
arXiv Detail & Related papers (2022-07-05T03:49:08Z) - Defending Object Detectors against Patch Attacks with Out-of-Distribution Smoothing [21.174037250133622]
We introduce OODSmoother, which characterizes the properties of approaches that aim to remove adversarial patches.<n>This framework naturally guides us to design 1) a novel adaptive attack that breaks existing patch attack defenses on object detectors, and 2) a novel defense approach SemPrior that takes advantage of semantic priors.<n>We find that SemPrior alone provides up to a 40% gain, or up to a 60% gain when combined with existing defenses.
arXiv Detail & Related papers (2022-05-18T15:20:18Z) - Segment and Complete: Defending Object Detectors against Adversarial
Patch Attacks with Robust Patch Detection [142.24869736769432]
Adversarial patch attacks pose a serious threat to state-of-the-art object detectors.
We propose Segment and Complete defense (SAC), a framework for defending object detectors against patch attacks.
We show SAC can significantly reduce the targeted attack success rate of physical patch attacks.
arXiv Detail & Related papers (2021-12-08T19:18:48Z) - Robustness Out of the Box: Compositional Representations Naturally
Defend Against Black-Box Patch Attacks [11.429509031463892]
Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification.
In this work, we study two different approaches for defending against black-box patch attacks.
We find that adversarial training has limited effectiveness against state-of-the-art location-optimized patch attacks.
arXiv Detail & Related papers (2020-12-01T15:04:23Z) - PatchGuard: A Provably Robust Defense against Adversarial Patches via
Small Receptive Fields and Masking [46.03749650789915]
Localized adversarial patches aim to induce misclassification in machine learning models by arbitrarily modifying pixels within a restricted region of an image.
We propose a general defense framework called PatchGuard that can achieve high provable robustness while maintaining high clean accuracy against localized adversarial patches.
arXiv Detail & Related papers (2020-05-17T03:38:34Z) - Adversarial Training against Location-Optimized Adversarial Patches [84.96938953835249]
adversarial patches: clearly visible, but adversarially crafted rectangular patches in images.
We first devise a practical approach to obtain adversarial patches while actively optimizing their location within the image.
We apply adversarial training on these location-optimized adversarial patches and demonstrate significantly improved robustness on CIFAR10 and GTSRB.
arXiv Detail & Related papers (2020-05-05T16:17:00Z) - Certified Defenses for Adversarial Patches [72.65524549598126]
Adversarial patch attacks are among the most practical threat models against real-world computer vision systems.
This paper studies certified and empirical defenses against patch attacks.
arXiv Detail & Related papers (2020-03-14T19:57:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.