Related papers: Certified but Fooled! Breaking Certified Defences with Ghost Certificates

Certified but Fooled! Breaking Certified Defences with Ghost Certificates

URL: http://arxiv.org/abs/2511.14003v1
Date: Tue, 18 Nov 2025 00:11:24 GMT
Title: Certified but Fooled! Breaking Certified Defences with Ghost Certificates
Authors: Quoc Viet Vo, Tashreque M. Haq, Paul Montague, Tamas Abraham, Ehsan Abbasnejad, Damith C. Ranasinghe,
Abstract summary: We study the malicious exploitation of probabilistic certification frameworks to understand the limits of guarantee provisions.<n>A recent study in ICLR demonstrated that crafting large perturbations can shift inputs far into regions capable of generating a certificate for an incorrect class.<n>We explore the idea of region-focused adversarial examples to craft imperceptible perturbations, spoof certificates and achieve certification radii larger than the source class ghost certificates.
Score: 24.77320354796342
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Certified defenses promise provable robustness guarantees. We study the malicious exploitation of probabilistic certification frameworks to better understand the limits of guarantee provisions. Now, the objective is to not only mislead a classifier, but also manipulate the certification process to generate a robustness guarantee for an adversarial input certificate spoofing. A recent study in ICLR demonstrated that crafting large perturbations can shift inputs far into regions capable of generating a certificate for an incorrect class. Our study investigates if perturbations needed to cause a misclassification and yet coax a certified model into issuing a deceptive, large robustness radius for a target class can still be made small and imperceptible. We explore the idea of region-focused adversarial examples to craft imperceptible perturbations, spoof certificates and achieve certification radii larger than the source class ghost certificates. Extensive evaluations with the ImageNet demonstrate the ability to effectively bypass state-of-the-art certified defenses such as Densepure. Our work underscores the need to better understand the limits of robustness certification methods.

Related papers

CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models [6.129515045488372]
Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. This paper proposes a novel certified defense technique called CrossCert.
arXiv Detail & Related papers (2024-05-13T11:54:03Z)
Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing [87.48628403354351]
certification for machine learning is proving that no adversarial sample can evade a model within a range under certain conditions. Common certification methods for segmentation use a flat set of fine-grained classes, leading to high abstain rates due to model uncertainty. We propose a novel, more practical setting, which certifies pixels within a multi-level hierarchy, and adaptively relaxes the certification to a coarser level for unstable components.
arXiv Detail & Related papers (2024-02-13T11:59:43Z)
Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples [30.42301446202426]
Our new emphCertification Aware Attack exploits certifications to produce computationally efficient norm-minimising adversarial examples. While these attacks can be used to assess the tightness of certification bounds, they also highlight that releasing certifications can paradoxically reduce security.
arXiv Detail & Related papers (2023-02-09T00:10:05Z)
Certified Interpretability Robustness for Class Activation Mapping [77.58769591550225]
We present CORGI, short for Certifiably prOvable Robustness Guarantees for Interpretability mapping. CORGI is an algorithm that takes in an input image and gives a certifiable lower bound for the robustness of its CAM interpretability map. We show the effectiveness of CORGI via a case study on traffic sign data, certifying lower bounds on the minimum adversarial perturbation.
arXiv Detail & Related papers (2023-01-26T18:58:11Z)
Double Bubble, Toil and Trouble: Enhancing Certified Robustness through Transitivity [27.04033198073254]
In response to subtle adversarial examples flipping classifications of neural network models, recent research has promoted certified robustness as a solution. We show how today's "optimal" certificates can be improved by exploiting both the transitivity of certifications, and the geometry of the input space. Our technique shows even more promising results, with a uniform $4$ percentage point increase in the achieved certified radius.
arXiv Detail & Related papers (2022-10-12T10:42:21Z)
Towards Evading the Limits of Randomized Smoothing: A Theoretical Analysis [74.85187027051879]
We show that it is possible to approximate the optimal certificate with arbitrary precision, by probing the decision boundary with several noise distributions. This result fosters further research on classifier-specific certification and demonstrates that randomized smoothing is still worth investigating.
arXiv Detail & Related papers (2022-06-03T17:48:54Z)
COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks [49.15885037760725]
We focus on certifying the robustness of offline reinforcement learning (RL) in the presence of poisoning attacks. We propose the first certification framework, COPA, to certify the number of poisoning trajectories that can be tolerated. We prove that some of the proposed certification methods are theoretically tight and some are NP-Complete problems.
arXiv Detail & Related papers (2022-03-16T05:02:47Z)
Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary. In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback. We design two novel availability attacks, which show the practical relevance of these threats.
arXiv Detail & Related papers (2021-08-25T15:49:10Z)
Certified Distributional Robustness on Smoothed Classifiers [27.006844966157317]
We propose the worst-case adversarial loss over input distributions as a robustness certificate. By exploiting duality and the smoothness property, we provide an easy-to-compute upper bound as a surrogate for the certificate.
arXiv Detail & Related papers (2020-10-21T13:22:25Z)
Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates [57.52763961195292]
We present a new attack that exploits not only the labelling function of a classifier, but also the certificate generator. The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples.
arXiv Detail & Related papers (2020-03-19T17:59:44Z)
(De)Randomized Smoothing for Certifiable Defense against Patch Attacks [136.79415677706612]
We introduce a certifiable defense against patch attacks that guarantees for a given image and patch attack size. Our method is related to the broad class of randomized smoothing robustness schemes. Our results effectively establish a new state-of-the-art of certifiable defense against patch attacks on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-02-25T08:39:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.