Average Certified Radius is a Poor Metric for Randomized Smoothing
- URL: http://arxiv.org/abs/2410.06895v1
- Date: Wed, 9 Oct 2024 13:58:41 GMT
- Title: Average Certified Radius is a Poor Metric for Randomized Smoothing
- Authors: Chenhao Sun, Yuhao Mao, Mark Niklas Müller, Martin Vechev
- Abstract summary: We show that the average certified radius (ACR) is an exceptionally poor metric for evaluating robustness guarantees provided by randomized smoothing.
We show that ACR is much more sensitive to improvements on easy samples than on hard ones.
- Score: 7.960121888896864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomized smoothing is a popular approach for providing certified robustness guarantees against adversarial attacks, and has become a very active area of research. Over the past years, the average certified radius (ACR) has emerged as the single most important metric for comparing methods and tracking progress in the field. However, in this work, we show that ACR is an exceptionally poor metric for evaluating robustness guarantees provided by randomized smoothing. We theoretically show not only that a trivial classifier can have arbitrarily large ACR, but also that ACR is much more sensitive to improvements on easy samples than on hard ones. Empirically, we confirm that existing training strategies that improve ACR reduce the model's robustness on hard samples. Further, we show that by focusing on easy samples, we can effectively replicate the increase in ACR. We develop strategies, including explicitly discarding hard samples, reweighing the dataset with certified radius, and extreme optimization for easy samples, to achieve state-of-the-art ACR, although these strategies ignore robustness for the general data distribution. Overall, our results suggest that ACR has introduced a strong undesired bias to the field, and better metrics are required to holistically evaluate randomized smoothing.
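The sensitivity the abstract describes can be seen directly in the standard (Cohen et al., 2019) certified-radius formula, R = σ·Φ⁻¹(p_A), where p_A is the smoothed classifier's top-class probability: because Φ⁻¹ grows steeply near 1, the same absolute confidence gain on an already-easy sample adds far more ACR than on a hard one. Below is a minimal sketch, not the authors' code; the probability values are hypothetical.

```python
from statistics import NormalDist

def certified_radius(p_a: float, sigma: float) -> float:
    # Cohen et al. (2019) robustness radius: R = sigma * Phi^{-1}(p_A).
    # A certificate exists only when p_A > 0.5; otherwise radius 0.
    if p_a <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a)

def acr(top_class_probs, sigma=0.5):
    # ACR averages the certified radius over all samples,
    # counting uncertified samples as radius 0.
    return sum(certified_radius(p, sigma) for p in top_class_probs) / len(top_class_probs)

# Hypothetical test set: two easy samples, two hard ones.
base = [0.95, 0.95, 0.55, 0.55]
# Same absolute gain (+0.049) applied to easy vs. hard samples:
easy_gain = [0.999, 0.999, 0.55, 0.55]
hard_gain = [0.95, 0.95, 0.599, 0.599]

# Improving the easy samples inflates ACR far more than the
# identical improvement on the hard samples.
assert acr(easy_gain) - acr(base) > acr(hard_gain) - acr(base)
```

Here the easy-sample gain raises ACR by roughly an order of magnitude more than the hard-sample gain, illustrating why optimizing ACR can ignore robustness on hard samples.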
Related papers
- Controllable RANSAC-based Anomaly Detection via Hypothesis Testing [7.10052009802944]
We propose a novel statistical method for testing the anomaly detection results obtained by RANSAC (controllable RANSAC).
The key strength of the proposed method lies in its ability to control the probability of misidentifying anomalies below a pre-specified level.
Experiments conducted on synthetic and real-world datasets robustly support our theoretical results.
arXiv Detail & Related papers (2024-10-19T15:15:41Z) - The Vital Role of Gradient Clipping in Byzantine-Resilient Distributed Learning [8.268485501864939]
Byzantine-resilient distributed machine learning seeks to achieve robust learning performance in the presence of misbehaving or adversarial workers.
While state-of-the-art (SOTA) robust distributed gradient descent (DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping.
We propose a principled adaptive clipping strategy, termed Adaptive Robust Clipping (ARC), since static clipping improves robustness against some attacks while being ineffective or detrimental against others.
arXiv Detail & Related papers (2024-05-23T11:00:31Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization [89.92932924515324]
Robust generalization aims to tackle the most challenging data distributions which are rare in the training set and contain severe noises.
Common solutions such as distributionally robust optimization (DRO) focus on the worst-case empirical risk to ensure low training error.
We propose SharpDRO by penalizing the sharpness of the worst-case distribution, which measures the loss changes around the neighbor of learning parameters.
We show that SharpDRO exhibits a strong generalization ability against severe corruptions and exceeds well-known baseline methods with large performance gains.
arXiv Detail & Related papers (2023-03-23T07:58:48Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Input-Specific Robustness Certification for Randomized Smoothing [76.76115360719837]
We propose Input-Specific Sampling (ISS) acceleration to achieve the cost-effectiveness for robustness certification.
ISS can speed up the certification by more than three times at a limited cost of 0.05 certified radius.
arXiv Detail & Related papers (2021-12-21T12:16:03Z) - Generalized Real-World Super-Resolution through Adversarial Robustness [107.02188934602802]
We present Robust Super-Resolution, a method that leverages the generalization capability of adversarial attacks to tackle real-world SR.
Our novel framework poses a paradigm shift in the development of real-world SR methods.
By using a single robust model, we outperform state-of-the-art specialized methods on real-world benchmarks.
arXiv Detail & Related papers (2021-08-25T22:43:20Z) - Boosting Randomized Smoothing with Variance Reduced Classifiers [4.110108749051657]
We motivate why ensembles are a particularly suitable choice as base models for Randomized Smoothing (RS).
We empirically confirm this choice, obtaining state-of-the-art results in multiple settings.
arXiv Detail & Related papers (2021-06-13T08:40:27Z) - Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.