Related papers: FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks

URL: http://arxiv.org/abs/2406.11522v2
Date: Wed, 11 Sep 2024 12:00:30 GMT
Title: FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks
Authors: Tobias Lorenz, Marta Kwiatkowska, Mario Fritz
Abstract summary: FullCert is the first end-to-end certifier with sound, deterministic bounds. We experimentally demonstrate FullCert's feasibility on two datasets.
Score: 62.897993591443594
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern machine learning models are sensitive to the manipulation of both the training data (poisoning attacks) and inference data (adversarial examples). Recognizing this issue, the community has developed many empirical defenses against both attacks and, more recently, certification methods with provable guarantees against inference-time attacks. However, such guarantees are still largely lacking for training-time attacks. In this work, we present FullCert, the first end-to-end certifier with sound, deterministic bounds, which proves robustness against both training-time and inference-time attacks. We first bound all possible perturbations an adversary can make to the training data under the considered threat model. Using these constraints, we bound the perturbations' influence on the model's parameters. Finally, we bound the impact of these parameter changes on the model's prediction, resulting in joint robustness guarantees against poisoning and adversarial examples. To facilitate this novel certification paradigm, we combine our theoretical work with a new open-source library BoundFlow, which enables model training on bounded datasets. We experimentally demonstrate FullCert's feasibility on two datasets.
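
The three steps in the abstract (bound the training data, bound the resulting parameters, bound the prediction) can be illustrated with plain interval arithmetic. The following is only a minimal sketch under stated assumptions: it does not use BoundFlow or reproduce FullCert's implementation, and the names `Interval`, `ball`, `train_bounds`, and `certify`, the scalar linear model, the squared loss, and the toy data are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with sound (over-approximating) arithmetic."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

    def scale(self, c):
        a, b = self.lo * c, self.hi * c
        return Interval(min(a, b), max(a, b))


def ball(center, eps):
    """Step 1: all values an adversary can reach within radius eps."""
    return Interval(center - eps, center + eps)


def train_bounds(xs, ys, eps_train, lr=0.1, steps=5):
    """Step 2: bound the weight of the toy model f(x) = w * x.

    Runs gradient descent on the squared loss with every training input
    replaced by its perturbation interval, so the returned interval
    contains the weight obtained from any admissible poisoned dataset.
    """
    w = Interval(0.0, 0.0)
    boxes = [ball(x, eps_train) for x in xs]
    for _ in range(steps):
        grad = Interval(0.0, 0.0)
        for x, y in zip(boxes, ys):
            residual = w * x - Interval(y, y)
            grad = grad + residual * x
        w = w - grad.scale(lr / len(xs))
    return w


def certify(w, x_test, eps_test):
    """Step 3: bound the prediction for all weights in w and all
    test-time perturbations. Returns the certified sign, or None if
    the bounds do not exclude a sign flip."""
    pred = w * ball(x_test, eps_test)
    if pred.lo > 0:
        return 1
    if pred.hi < 0:
        return -1
    return None


# Toy usage: four 1-D points labelled by their sign.
xs, ys = [1.0, 2.0, -1.0, -2.0], [1.0, 1.0, -1.0, -1.0]
w_clean = train_bounds(xs, ys, eps_train=0.0)   # degenerate (point) interval
w_bound = train_bounds(xs, ys, eps_train=0.1)   # covers all poisoned runs
print(w_clean, w_bound)
print(certify(w_bound, x_test=1.5, eps_test=0.1))
print(certify(w_bound, x_test=0.05, eps_test=0.1))
```

The parameter interval widens with every training step because interval arithmetic ignores dependencies between terms. The sketch is therefore sound but loose, and a returned `None` means only that the bounds were too coarse to certify, not that an attack exists.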

Related papers

Exact Certification of Data-Poisoning Attacks Using Mixed-Integer Programming [2.5526950745993013]
We formulate adversarial data manipulation, model training, and test-time evaluation in a single mixed-integer quadratically constrained programming (MIQCP) problem. Finding the global optimum of the proposed formulation provably yields worst-case poisoning attacks. Our framework encodes both the gradient-based training dynamics and model evaluation at test time, enabling the first exact certification of training-time robustness.
arXiv Detail & Related papers (2026-02-18T23:18:45Z)
Abstract Gradient Training: A Unified Certification Framework for Data Poisoning, Unlearning, and Differential Privacy [7.246481649624287]
This work introduces Abstract Gradient Training (AGT), a unified framework for certifying robustness of a given model and training procedure to training data perturbations. AGT provides a formal approach to analyzing the behavior of models trained via first-order optimization methods.
arXiv Detail & Related papers (2025-11-12T15:15:15Z)
BiCert: A Bilinear Mixed Integer Programming Formulation for Precise Certified Bounds Against Data Poisoning Attacks [62.897993591443594]
Data poisoning attacks pose one of the biggest threats to modern AI systems.
arXiv Detail & Related papers (2024-12-13T14:56:39Z)
Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode in neural networks. We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
Certified Robustness to Data Poisoning in Gradient-Based Training [10.79739918021407]
We develop the first framework providing provable guarantees on the behavior of models trained with potentially manipulated data. Our framework certifies robustness against untargeted and targeted poisoning, as well as backdoor attacks. We demonstrate our approach on multiple real-world datasets from applications including energy consumption, medical imaging, and autonomous driving.
arXiv Detail & Related papers (2024-06-09T06:59:46Z)
Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates [13.911586916369108]
We show that misclassifications in machine-learning models can affect robustness to adversarial examples. We propose a technique, named robustness-congruent adversarial training, to address this issue. We show that our algorithm and, more generally, learning with non-regression constraints, provides a theoretically-grounded framework to train consistent estimators.
arXiv Detail & Related papers (2024-02-27T10:37:13Z)
Data Poisoning Attack Aiming the Vulnerability of Continual Learning [25.480762565632332]
We present a simple task-specific data poisoning attack that can be used in the learning process of a new task. We experiment with the attack on two representative regularization-based continual learning methods.
arXiv Detail & Related papers (2022-11-29T02:28:05Z)
FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning [66.56240101249803]
We study how hardening benign clients can affect the global model (and the malicious clients). We propose a trigger reverse engineering based defense and show that our method can achieve improvement with guaranteed robustness. Our comparison against eight competing SOTA defense methods shows the empirical superiority of our method on both single-shot and continuous FL backdoor attacks.
arXiv Detail & Related papers (2022-10-23T22:24:03Z)
Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks. We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable. We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z)
Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning. These measures should account for the wide variety of models used in practice. The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
Asymptotic Behavior of Adversarial Training in Binary Classification [41.7567932118769]
Adversarial training is considered to be the state-of-the-art method for defense against adversarial attacks. Despite its success in practice, several problems in understanding the performance of adversarial training remain open. We derive precise theoretical predictions for the minimization of adversarial training in binary classification.
arXiv Detail & Related papers (2020-10-26T01:44:20Z)
Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, is able to work under the severe restriction of having no access to the scores of the victim model. We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance. For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
