Robustness Certificates for Neural Networks against Adversarial Attacks
- URL: http://arxiv.org/abs/2512.20865v1
- Date: Wed, 24 Dec 2025 00:49:47 GMT
- Title: Robustness Certificates for Neural Networks against Adversarial Attacks
- Authors: Sara Taheri, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar, Majid Zamani
- Abstract summary: This paper introduces a principled formal robustness certification framework that models gradient-based training as a discrete-time dynamical system. Our framework also extends to certification against test-time attacks, making it the first unified framework to provide formal guarantees in both training and test-time attack settings.
- Score: 9.365069861121944
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The increasing use of machine learning in safety-critical domains amplifies the risk of adversarial threats, especially data poisoning attacks that corrupt training data to degrade performance or induce unsafe behavior. Most existing defenses lack formal guarantees or rely on restrictive assumptions about the model class, attack type, extent of poisoning, or point-wise certification, limiting their practical reliability. This paper introduces a principled formal robustness certification framework that models gradient-based training as a discrete-time dynamical system (dt-DS) and formulates poisoning robustness as a formal safety verification problem. By adapting the concept of barrier certificates (BCs) from control theory, we introduce sufficient conditions to certify a robust radius ensuring that the terminal model remains safe under worst-case ${\ell}_p$-norm based poisoning. To make this practical, we parameterize BCs as neural networks trained on finite sets of poisoned trajectories. We further derive probably approximately correct (PAC) bounds by solving a scenario convex program (SCP), which yields a confidence lower bound on the certified robustness radius generalizing beyond the training set. Importantly, our framework also extends to certification against test-time attacks, making it the first unified framework to provide formal guarantees in both training and test-time attack settings. Experiments on MNIST, SVHN, and CIFAR-10 show that our approach certifies non-trivial perturbation budgets while being model-agnostic and requiring no prior knowledge of the attack or contamination level.
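The core idea of the abstract — treating gradient-based training as a discrete-time dynamical system and certifying safety of the terminal model via a barrier-style condition checked on sampled poisoned trajectories — can be sketched in miniature. The 1-D quadratic loss, the poisoning budget, the safe set, and the empirical sampling check below are all illustrative assumptions, not the paper's construction (which trains a neural barrier certificate and derives PAC bounds from a scenario convex program).

```python
import random

random.seed(0)

# Toy dt-DS: gradient descent on a 1-D quadratic loss whose minimum m
# is shifted by poisoning within an l_inf budget r (hypothetical setup).
eta, T = 0.3, 30
clean_min, r = 1.0, 0.2               # nominal loss minimum, poisoning radius
safe_center, safe_radius = 1.0, 0.5   # "safe" terminal models

def trajectory(m):
    x = 0.0                           # fixed initialisation x_0
    for _ in range(T):
        x = x - eta * 2.0 * (x - m)   # x_{k+1} = x_k - eta * dL/dx
    return x

# Candidate barrier-style condition B(x_T) <= 0 with
# B(x) = |x - safe_center| - safe_radius: nonpositive B certifies that
# the terminal model landed in the safe set.
def barrier(x):
    return abs(x - safe_center) - safe_radius

# Empirical stand-in for the paper's scenario program: check the
# condition on a finite set of sampled poisoned trajectories.
certified = all(
    barrier(trajectory(clean_min + random.uniform(-r, r))) <= 0.0
    for _ in range(200)
)
print("barrier condition holds on all sampled trajectories:", certified)
```

Because the poisoned minimum stays within 0.2 of the safe center while the safe radius is 0.5, every sampled trajectory ends inside the safe set, so the sampled check passes; the paper's contribution is turning such finite-sample evidence into a PAC-style confidence bound on a certified radius.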
Related papers
- Large Reasoning Models Learn Better Alignment from Flawed Thinking [56.08883934423522]
Large reasoning models (LRMs) "think" by generating structured chain-of-thought (CoT) before producing a final answer. We propose RECAP, a principled reinforcement learning (RL) method for post-training that explicitly teaches models to override flawed reasoning trajectories.
arXiv Detail & Related papers (2025-10-01T14:15:43Z)
- Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR [3.0458514384586404]
This paper builds on Fazlyab's quadratic-constraint (QC) and semidefinite-programming (SDP) framework for neural network verification. The integration broadens the input-uncertainty geometry (covering ellipsoids, polytopes, and hyperplanes) and extends applicability to safety-critical domains.
arXiv Detail & Related papers (2025-09-22T07:04:53Z)
- Preliminary Investigation into Uncertainty-Aware Attack Stage Classification [81.28215542218724]
This work addresses the problem of attack stage inference under uncertainty. We propose a classification approach based on Evidential Deep Learning (EDL), which models predictive uncertainty by outputting parameters of a Dirichlet distribution over possible stages. Preliminary experiments in a simulated environment demonstrate that the proposed model can accurately infer the stage of an attack with confidence.
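The Dirichlet-based uncertainty that EDL provides can be computed from standard formulas: evidence per class yields Dirichlet parameters, and the "vacuity" uncertainty shrinks as total evidence grows. The sketch below uses the textbook EDL quantities; the paper's network architecture and training loss are not reproduced.

```python
# Standard Evidential Deep Learning summary from per-stage evidence.
# A classifier head outputs evidence e_k >= 0; alpha_k = e_k + 1.
def edl_summary(evidence):
    alphas = [e + 1.0 for e in evidence]
    strength = sum(alphas)                     # Dirichlet strength S
    probs = [a / strength for a in alphas]     # expected stage probabilities
    vacuity = len(alphas) / strength           # high when evidence is scarce
    return probs, vacuity

_, u_low_evidence = edl_summary([0.2, 0.1, 0.0])    # almost no evidence
_, u_high_evidence = edl_summary([29.0, 1.0, 0.0])  # strong evidence, stage 0
print(u_low_evidence > u_high_evidence)  # more evidence -> less uncertainty
```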
arXiv Detail & Related papers (2025-08-01T06:58:00Z)
- FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks [62.897993591443594]
FullCert is the first end-to-end certifier with sound, deterministic bounds.
We experimentally demonstrate FullCert's feasibility on two datasets.
arXiv Detail & Related papers (2024-06-17T13:23:52Z)
- The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain high adversarial robustness to protect against potential adversarial attacks.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
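Plain split-conformal prediction, the procedure whose behaviour under attack this paper examines, is short enough to sketch: calibrate a quantile of nonconformity scores, then include every class whose score clears it. The calibration scores and softmax vector below are made-up illustrations, and the adversarial setting itself is not modelled.

```python
import math

# Split-conformal prediction with nonconformity score 1 - p_class.
def conformal_quantile(cal_scores, alpha):
    # Finite-sample-corrected (1 - alpha) empirical quantile
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(softmax, qhat):
    # Keep every class whose nonconformity 1 - p is within the quantile
    return [c for c, p in enumerate(softmax) if 1.0 - p <= qhat]

cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # calibration set
qhat = conformal_quantile(cal_scores, alpha=0.3)
print(prediction_set([0.7, 0.2, 0.1], qhat))  # -> [0]
```

Adversarial perturbations inflate nonconformity scores at test time, which is why the coverage guarantee of vanilla CP degrades under attack.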
arXiv Detail & Related papers (2024-05-14T18:05:19Z)
- Safe Online Dynamics Learning with Initially Unknown Models and Infeasible Safety Certificates [45.72598064481916]
This paper considers a learning-based setting with a robust safety certificate based on a control barrier function (CBF) second-order cone program.
If the control barrier function certificate is feasible, our approach leverages it to guarantee safety. Otherwise, our method explores the system dynamics to collect data and recover the feasibility of the control barrier function constraint.
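The "certificate feasible, therefore safety guaranteed" step can be illustrated on a scalar system. The paper solves a second-order cone program over a learned CBF under model uncertainty; the known 1-D dynamics and closed-form filter below are simplifying assumptions chosen purely for illustration.

```python
# Scalar discrete-time control-barrier-function safety filter.
dt, gamma, x_max = 0.1, 0.5, 1.0

def h(x):
    return x_max - x                  # h(x) >= 0 defines the safe set

def safe_filter(x, u_nominal):
    # Enforce the discrete CBF condition h(x + u*dt) >= (1 - gamma)*h(x),
    # which for this system reduces to u <= gamma * h(x) / dt.
    return min(u_nominal, gamma * h(x) / dt)

x = 0.0
for _ in range(100):
    x += safe_filter(x, u_nominal=2.0) * dt   # aggressive nominal input
print("final state:", round(x, 3), "safe:", h(x) >= 0)
```

The filter only intervenes near the boundary: the aggressive nominal input drives the state toward x_max, but the CBF condition caps the input so h never goes negative.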
arXiv Detail & Related papers (2023-11-03T14:23:57Z)
- FI-ODE: Certifiably Robust Forward Invariance in Neural ODEs [34.762005448725226]
We propose a general framework for training and provably certifying robust forward invariance in Neural ODEs.
We apply this framework to provide certified safety in robust continuous control.
In addition, we explore the generality of our framework by using it to certify adversarial robustness for image classification.
arXiv Detail & Related papers (2022-10-30T20:30:19Z)
- Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary.
In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback.
We design two novel availability attacks, which show the practical relevance of these threats.
arXiv Detail & Related papers (2021-08-25T15:49:10Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Bayesian Inference with Certifiable Adversarial Robustness [25.40092314648194]
We consider adversarial training of neural networks through the lens of Bayesian learning.
We present a principled framework for adversarial training of Bayesian Neural Networks (BNNs) with certifiable guarantees.
Our method is the first to directly train certifiable BNNs, thus facilitating their use in safety-critical applications.
arXiv Detail & Related papers (2021-02-10T07:17:49Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
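The robustness guarantee that such a poisoning attack degrades is the certified l2 radius of a randomly smoothed classifier: in the standard Cohen et al.-style binary bound, the radius is sigma times the Gaussian quantile of the certified top-class probability. The sketch below shows how shrinking that probability (e.g. by poisoning the base classifier's training data) shrinks the radius; the specific p_a values are illustrative.

```python
from statistics import NormalDist

# Certified l2 radius of a smoothed classifier. p_a is a lower bound on
# the top-class probability under noise N(0, sigma^2 I); at or below 0.5
# the smoothed classifier abstains and certifies nothing.
def certified_radius(p_a, sigma):
    if p_a <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a)

r_clean = certified_radius(0.90, sigma=0.5)     # healthy margin
r_poisoned = certified_radius(0.55, sigma=0.5)  # margin degraded by poisoning
print(r_poisoned < r_clean)
```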
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.