Evaluating the Evaluators: Trust in Adversarial Robustness Tests
- URL: http://arxiv.org/abs/2507.03450v1
- Date: Fri, 04 Jul 2025 10:07:26 GMT
- Title: Evaluating the Evaluators: Trust in Adversarial Robustness Tests
- Authors: Antonio Emanuele Cinà, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Fabio Roli
- Abstract summary: AttackBench is an evaluation tool that ranks existing attack implementations based on a novel optimality metric. The framework enforces consistent testing conditions and enables continuous updates, making it a reliable foundation for robustness verification.
- Score: 17.06660302788049
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant progress in designing powerful adversarial evasion attacks for robustness verification, the evaluation of these methods often remains inconsistent and unreliable. Many assessments rely on mismatched models, unverified implementations, and uneven computational budgets, which can bias results. Consequently, robustness claims built on such flawed testing protocols may be misleading and give a false sense of security. As a concrete step toward improving evaluation reliability, we present AttackBench, a benchmark framework developed to assess the effectiveness of gradient-based attacks under standardized and reproducible conditions. AttackBench serves as an evaluation tool that ranks existing attack implementations based on a novel optimality metric, which enables researchers and practitioners to identify the most reliable and effective attack for use in subsequent robustness evaluations. The framework enforces consistent testing conditions and enables continuous updates, making it a reliable foundation for robustness verification.
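As an illustration of how such an optimality-based ranking could be computed, the sketch below scores each attack by how often it matches the best perturbation found by any attack on each sample. The function name, the tolerance, and the scoring rule are assumptions for illustration, not AttackBench's exact metric.

```python
import numpy as np

def optimality_scores(perturbation_norms):
    """Rank attacks by closeness to the per-sample best-known perturbation.

    perturbation_norms maps attack name -> array of shape (n_samples,) with
    the smallest adversarial perturbation norm the attack found (np.inf on
    failure). Illustrative scoring rule, not AttackBench's exact metric.
    """
    names = list(perturbation_norms)
    norms = np.stack([perturbation_norms[name] for name in names])
    best = norms.min(axis=0)  # ensemble best per sample
    scores = {}
    for name, row in zip(names, norms):
        # Count samples where this attack found a perturbation within 5%
        # of the best known one; higher means closer to optimal.
        matched = np.isfinite(row) & (row <= 1.05 * best)
        scores[name] = matched.mean()
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

# Example with two hypothetical attack implementations on four samples.
print(optimality_scores({
    "attack_pgd": np.array([0.031, 0.027, np.inf, 0.040]),
    "attack_fmn": np.array([0.030, 0.027, 0.055, 0.040]),
}))
```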
Related papers
- Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? [51.12297424766236]
AURORA is a framework to evaluate malware classifiers based on their confidence quality and operational resilience. AURORA is complemented by a set of metrics designed to go beyond point-in-time performance. The fragility of SOTA frameworks across datasets of varying drift suggests the need for a return to the whiteboard.
arXiv Detail & Related papers (2025-05-28T20:22:43Z)
- Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains. Existing research predominantly concentrates on the security of general large language models. This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z)
- A New Framework of Software Obfuscation Evaluation Criteria [3.0567294793102784]
Several criteria have been proposed in the past to assess the strength of protections, such as potency, resilience, stealth, and cost. We present a new framework of software protection evaluation criteria: relevance, effectiveness (or efficacy), robustness, concealment, stubbornness, sensitivity, predictability, and cost.
arXiv Detail & Related papers (2025-02-19T20:45:47Z)
- Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We identify a general, widespread, but largely neglected phenomenon: most confidence estimation methods are harmful for detecting misclassification errors.
We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
- Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification [4.868832755218741]
This paper focuses on the classification task of breast ultrasound images.
We propose a dual-channel evaluation framework based on inference reliability and predictive reliability scores.
arXiv Detail & Related papers (2024-01-08T04:37:18Z)
- From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework [91.94389491920309]
Textual adversarial attacks can discover models' weaknesses by adding semantic-preserved but misleading perturbations to the inputs.
The existing practice of robustness evaluation may exhibit issues of incomplete evaluation, impractical evaluation protocols, and invalid adversarial samples.
We set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to exploit the advantages of adversarial attacks.
arXiv Detail & Related papers (2023-05-29T14:55:20Z)
- Increasing Confidence in Adversarial Robustness Evaluations [53.2174171468716]
We propose a test to identify weak attacks and thus weak defense evaluations.
Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample.
For eleven out of thirteen previously-published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it. A simplified sketch of this test follows this entry.
arXiv Detail & Related papers (2022-06-28T13:28:13Z)
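To illustrate the idea behind such a test, the sketch below constructs a toy linear model whose decision boundary provably lies inside the L2 eps-ball around an input, so an adversarial example is guaranteed to exist; an attack that cannot find it is flagged as weak. The `attack_fn(model, x, eps)` interface and the linear construction are illustrative assumptions, not the paper's exact binarization procedure.

```python
import torch

def weak_attack_test(attack_fn, x, eps):
    """Return True if the attack finds the guaranteed adversarial example.

    Builds a linear model f(z) = w.z + b with f(x) = eps/2, so the decision
    boundary lies at L2 distance eps/2 from x and an adversarial example
    provably exists inside the eps-ball. (Simplified illustration of the
    paper's test; attack_fn's signature is an assumption.)
    """
    x = x.flatten()
    w = torch.randn_like(x)
    w = w / w.norm()
    b = 0.5 * eps - w @ x
    model = lambda z: z.flatten() @ w + b
    x_adv = attack_fn(model, x, eps)
    within_budget = (x_adv - x).norm() <= eps + 1e-6
    flipped = torch.sign(model(x_adv)) != torch.sign(model(x))
    return bool(within_budget and flipped)  # False => weak attack

# Toy attack for demonstration: one normalized gradient step.
def one_step_attack(model, x, eps):
    x = x.clone().requires_grad_(True)
    model(x).backward()
    with torch.no_grad():
        return x - eps * x.grad / x.grad.norm()

print(weak_attack_test(one_step_attack, torch.randn(10), eps=0.5))  # True
```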
- PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures [65.36234499099294]
We propose a new data augmentation strategy utilizing the natural structural complexity of pictures such as fractals. A rough sketch of this mixing follows this entry.
arXiv Detail & Related papers (2021-12-09T18:59:31Z)
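As a rough sketch of this style of augmentation (the mixing operations, weights, and loop length below are assumptions; the paper's exact recipe differs), an input image can be repeatedly blended with a structurally complex picture such as a fractal:

```python
import numpy as np

def pixmix_like(image, fractal, k=3, beta=3.0, seed=None):
    """Blend an image with a fractal (or itself) a few times, alternating
    additive and multiplicative mixing. Arrays are floats in [0, 1].
    (Illustrative sketch, not the paper's exact augmentation pipeline.)
    """
    rng = np.random.default_rng(seed)
    mixed = image.copy()
    for _ in range(rng.integers(1, k + 1)):
        other = fractal if rng.random() < 0.5 else image
        t = rng.beta(beta, beta)  # blending weight in (0, 1)
        if rng.random() < 0.5:
            mixed = t * mixed + (1.0 - t) * other        # additive blend
        else:
            mixed = (mixed ** t) * (other ** (1.0 - t))  # multiplicative blend
    return np.clip(mixed, 0.0, 1.0)

augmented = pixmix_like(np.random.rand(32, 32, 3), np.random.rand(32, 32, 3), seed=0)
```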
- Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples [29.385242714424624]
Evaluating the robustness of machine-learning models to adversarial examples is a challenging problem.
We define a set of quantitative indicators which unveil common failures in the optimization of gradient-based attacks.
Our experimental analysis shows that the proposed indicators of failure can be used to visualize, debug and improve current adversarial robustness evaluations. A minimal example of such an indicator follows this entry.
arXiv Detail & Related papers (2021-06-18T06:57:58Z)
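In the spirit of such indicators, a minimal example is checking whether the attack's objective ever meaningfully decreases over its iterations; a flat loss trace signals a silently failing optimization. The rule below is an illustrative assumption, not one of the paper's exact indicators.

```python
import numpy as np

def flags_failed_optimization(loss_trace, rel_tol=1e-3):
    """Flag an attack run whose minimization objective never improved
    by more than rel_tol relative to its starting value.
    (Illustrative indicator, not the paper's exact definition.)
    """
    trace = np.asarray(loss_trace, dtype=float)
    improvement = trace[0] - trace.min()
    return improvement <= rel_tol * max(abs(trace[0]), 1e-12)

# A trace that barely moves is flagged; a converging one is not.
print(flags_failed_optimization([0.710, 0.7099, 0.7100, 0.7100]))  # True
print(flags_failed_optimization([0.710, 0.4200, 0.1800, 0.0500]))  # False
```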
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.