Learning Better Certified Models from Empirically-Robust Teachers
- URL: http://arxiv.org/abs/2602.02626v1
- Date: Mon, 02 Feb 2026 16:30:53 GMT
- Title: Learning Better Certified Models from Empirically-Robust Teachers
- Authors: Alessandro De Palma,
- Abstract summary: Adversarial training attains strong empirical robustness to specific adversarial attacks by training on concrete adversarial perturbations. It produces neural networks that are not amenable to strong robustness certificates through neural network verification. Earlier certified training schemes directly train on bounds from network relaxations to obtain models that are certifiably robust, but display sub-par standard performance.
- Score: 53.51898477987441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial training attains strong empirical robustness to specific adversarial attacks by training on concrete adversarial perturbations, but it produces neural networks that are not amenable to strong robustness certificates through neural network verification. On the other hand, earlier certified training schemes directly train on bounds from network relaxations to obtain models that are certifiably robust, but display sub-par standard performance. Recent work has shown that state-of-the-art trade-offs between certified robustness and standard performance can be obtained through a family of losses combining adversarial outputs and neural network bounds. Nevertheless, differently from empirical robustness, verifiability still comes at a significant cost in standard performance. In this work, we propose to leverage empirically-robust teachers to improve the performance of certifiably-robust models through knowledge distillation. Using a versatile feature-space distillation objective, we show that distillation from adversarially-trained teachers consistently improves on the state-of-the-art in certified training for ReLU networks across a series of robust computer vision benchmarks.
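Concretely, a minimal PyTorch sketch of what pairing a certified-training objective with a feature-space distillation term might look like; the projection head, the matched layer, and the weighting `alpha` are illustrative assumptions here, not the paper's exact formulation:

```python
import torch.nn.functional as F

def feature_distillation_loss(student_feats, teacher_feats, proj):
    """L2 match between projected student features and the frozen
    features of an adversarially-trained teacher at a chosen layer."""
    # proj: a small learned head mapping student features into the
    # teacher's feature space (an illustrative assumption).
    return F.mse_loss(proj(student_feats), teacher_feats.detach())

def total_loss(certified_loss, student_feats, teacher_feats, proj, alpha=0.5):
    # certified_loss: any certified-training objective on the student,
    # e.g., one combining adversarial outputs and network bounds.
    return certified_loss + alpha * feature_distillation_loss(
        student_feats, teacher_feats, proj)
```

Keeping the teacher features detached ensures that only the student (and the projection head) receive gradients from the distillation term.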
Related papers
- A Validation Strategy for Deep Learning Models: Evaluating and Enhancing Robustness [0.8532585403388676]
We propose a validation approach that extracts "weak robust" samples directly from the training dataset via local analysis. These samples, being the most susceptible to perturbations, serve as an early and sensitive indicator of the model's vulnerabilities. We demonstrate the effectiveness of our approach on models trained with CIFAR-10, CIFAR-100, and ImageNet.
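One plausible reading of "local analysis" is scoring each training point by how easily small perturbations flip its prediction; the sketch below, with its random-noise probe and threshold-free scores, is an assumption-laden illustration rather than the paper's actual procedure:

```python
import torch

@torch.no_grad()
def weak_robust_scores(model, x, y, eps=0.03, n_draws=8):
    """Fraction of small random perturbations each sample's prediction
    survives; low scores flag "weak robust" samples. The random-noise
    probe, eps, and n_draws are illustrative assumptions."""
    survived = torch.zeros(x.shape[0], device=x.device)
    for _ in range(n_draws):
        noise = torch.empty_like(x).uniform_(-eps, eps)
        preds = model(x + noise).argmax(dim=1)
        survived += (preds == y).float()
    return survived / n_draws
```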
arXiv Detail & Related papers (2025-09-23T16:14:14Z) - On Using Certified Training towards Empirical Robustness [40.582830117229854]
We show that a certified training algorithm can prevent catastrophic overfitting on single-step attacks. We also present a conceptually simple regularizer for network over-approximations that can achieve similar effects while markedly reducing runtime.
arXiv Detail & Related papers (2024-10-02T14:56:21Z) - Mitigating Accuracy-Robustness Trade-off via Balanced Multi-Teacher Adversarial Distillation [12.39860047886679]
Adversarial Training is a practical approach for improving the robustness of deep neural networks against adversarial attacks.
We introduce Balanced Multi-Teacher Adversarial Robustness Distillation (B-MTARD) to guide the model's Adversarial Training process.
B-MTARD outperforms the state-of-the-art methods against various adversarial attacks.
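A hedged sketch of a balanced multi-teacher distillation step: a clean teacher supervises natural inputs while a robust teacher supervises adversarial ones. The fixed weights below stand in for whatever balancing scheme B-MTARD actually uses:

```python
import torch.nn.functional as F

def kd_kl(student_logits, teacher_logits, T=4.0):
    """Temperature-scaled distillation KL term (standard form)."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T

def multi_teacher_loss(student, clean_teacher, robust_teacher,
                       x, x_adv, w_clean=0.5, w_adv=0.5):
    # x_adv: adversarial examples crafted against the student;
    # constant weights are an illustrative simplification.
    return (w_clean * kd_kl(student(x), clean_teacher(x).detach())
            + w_adv * kd_kl(student(x_adv), robust_teacher(x_adv).detach()))
```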
arXiv Detail & Related papers (2023-06-28T12:47:01Z) - A Comprehensive Study on Robustness of Image Classification Models:
Benchmarking and Rethinking [54.89987482509155]
The robustness of deep neural networks is usually lacking under adversarial examples, common corruptions, and distribution shifts.
We establish a comprehensive robustness benchmark called ARES-Bench on the image classification task.
By designing the training settings accordingly, we achieve the new state-of-the-art adversarial robustness.
arXiv Detail & Related papers (2023-02-28T04:26:20Z) - On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z) - Fast Training of Provably Robust Neural Networks by SingleProp [71.19423596238568]
We develop a new regularizer that is more efficient than existing certified defenses.
We demonstrate improvements in training speed and comparable certified accuracy compared to state-of-the-art certified defenses.
arXiv Detail & Related papers (2021-02-01T22:12:51Z) - Certified Distributional Robustness on Smoothed Classifiers [27.006844966157317]
We propose the worst-case adversarial loss over input distributions as a robustness certificate.
By exploiting duality and the smoothness property, we provide an easy-to-compute upper bound as a surrogate for the certificate.
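For context, the duality step behind certificates of this kind is typically the Lagrangian relaxation used in Wasserstein distributionally robust optimization; a sketch of that standard bound follows (the paper's exact certificate for smoothed classifiers may differ):

```latex
% Worst-case loss over distributions P within transport cost rho of the
% data distribution P_0, and its dual upper bound (gamma >= 0 is a dual
% multiplier; c is the transport cost):
\sup_{P:\, W_c(P, P_0) \le \rho} \mathbb{E}_{x \sim P}\big[\ell(\theta; x)\big]
\;\le\; \gamma \rho + \mathbb{E}_{x_0 \sim P_0}\Big[\sup_{x}
\big(\ell(\theta; x) - \gamma\, c(x, x_0)\big)\Big]
```

The inner supremum is pointwise over inputs, which is what makes the bound easy to estimate compared with the outer supremum over distributions.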
arXiv Detail & Related papers (2020-10-21T13:22:25Z) - Hybrid Discriminative-Generative Training via Contrastive Learning [96.56164427726203]
We show that, through the perspective of hybrid discriminative-generative training of energy-based models, we can make a direct connection between contrastive learning and supervised learning.
We show our specific choice of approximation of the energy-based loss outperforms the existing practice in terms of classification accuracy of WideResNet on CIFAR-10 and CIFAR-100.
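The connection alluded to here is usually drawn through the joint-energy view of a classifier (as in JEM-style models; a sketch under that assumption, not necessarily this paper's exact construction), where a single set of logits $f_\theta(x)[y]$ defines both the discriminative and the generative factors:

```latex
% A K-way classifier's logits induce a joint energy-based model:
p_\theta(x, y) = \frac{\exp\big(f_\theta(x)[y]\big)}{Z(\theta)}, \qquad
p_\theta(y \mid x) = \frac{\exp\big(f_\theta(x)[y]\big)}
                          {\sum_{y'} \exp\big(f_\theta(x)[y']\big)}
```

Marginalizing over $y$ gives $p_\theta(x) \propto \sum_y \exp\big(f_\theta(x)[y]\big)$, so the same network can be trained with a classification term plus a generative (or contrastive) term.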
arXiv Detail & Related papers (2020-07-17T15:50:34Z) - Rethinking Clustering for Robustness [56.14672993686335]
ClusTR is a clustering-based and adversary-free training framework to learn robust models.
ClusTR outperforms adversarially-trained networks by up to 4% under strong PGD attacks.
arXiv Detail & Related papers (2020-06-13T16:55:51Z)