Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers
- URL: http://arxiv.org/abs/2406.18451v3
- Date: Fri, 01 Nov 2024 02:13:59 GMT
- Title: Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers
- Authors: Jonas Ngnawé, Sabyasachi Sahoo, Yann Pequignot, Frédéric Precioso, Christian Gagné
- Abstract summary: Decision making in deep learning models can be sensitive to imperceptible perturbations.
Evaluating a model's vulnerability at a per-instance level with adversarial attacks is computationally too intensive and unsuitable for real-time deployment scenarios.
This paper introduces the concept of margin consistency for the efficient detection of vulnerable samples.
- Score: 9.147975682184528
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite extensive research on adversarial training strategies to improve robustness, the decisions of even the most robust deep learning models can still be quite sensitive to imperceptible perturbations, creating serious risks when deploying them in high-stakes real-world applications. While detecting such cases may be critical, evaluating a model's vulnerability at a per-instance level using adversarial attacks is computationally too intensive and unsuitable for real-time deployment scenarios. The input-space margin is the exact score for detecting non-robust samples, but it is intractable to compute for deep neural networks. This paper introduces the concept of margin consistency -- a property that links the input-space margins and the logit margins in robust models -- for efficient detection of vulnerable samples. First, we establish that margin consistency is a necessary and sufficient condition for using a model's logit margin as a score to identify non-robust samples. Next, through a comprehensive empirical analysis of various robustly trained models on the CIFAR10 and CIFAR100 datasets, we show that they exhibit high margin consistency, with a strong correlation between their input-space margins and logit margins. Then, we show that we can effectively and confidently use the logit margin to detect brittle decisions with such models. Finally, we address cases where a model is not sufficiently margin-consistent by learning a pseudo-margin from the feature representation. Our findings highlight the potential of leveraging deep representations to efficiently assess adversarial vulnerability in deployment scenarios.
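To make the idea concrete, below is a minimal sketch (PyTorch, not the authors' released code) of using the logit margin as a vulnerability score: for a margin-consistent robust classifier, samples with the smallest gap between the top two logits are the ones most likely to be non-robust in the input space. The threshold `tau` and the attack-derived `input_margins` used in the consistency check are illustrative assumptions.

```python
import torch
from scipy.stats import spearmanr


@torch.no_grad()
def logit_margins(model, inputs):
    """Gap between the largest and second-largest logits for each sample.
    Small gaps flag potentially brittle (non-robust) decisions."""
    logits = model(inputs)                   # (batch, num_classes)
    top2 = logits.topk(2, dim=1).values      # two largest logits per sample
    return top2[:, 0] - top2[:, 1]           # non-negative margin


def flag_brittle(model, inputs, tau=1.0):
    """Flag samples whose logit margin falls below a threshold tau
    (tau would be calibrated on held-out data for a target detection rate)."""
    margins = logit_margins(model, inputs)
    return margins < tau, margins


def margin_consistency(logit_m, input_m):
    """Rank correlation between logit margins and input-space margins,
    the latter estimated offline (e.g., with an adversarial attack).
    A value close to 1 supports using the logit margin alone at deployment."""
    return spearmanr(logit_m.cpu().numpy(), input_m.cpu().numpy()).correlation
```

At deployment time only `flag_brittle` is needed, which costs a single forward pass per sample; the pseudo-margin variant mentioned in the abstract would replace the raw logit margin with a margin learned from the feature representation.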
Related papers
- Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness [10.220692937750295]
Reliable Score (RSS) is a novel metric that combines predictive accuracy, calibration, and uncertainty quality measures via a harmonic mean. We advocate for a shift in evaluation protocols toward more holistic metrics like RSS to better align semi-supervised learning research with real-world deployment needs.
arXiv Detail & Related papers (2025-06-06T09:37:45Z) - TrustLoRA: Low-Rank Adaptation for Failure Detection under Out-of-distribution Data [62.22804234013273]
We propose a simple failure detection framework to unify and facilitate classification with rejection under both covariate and semantic shifts.
Our key insight is that by separating and consolidating failure-specific reliability knowledge with low-rank adapters, we can enhance the failure detection ability effectively and flexibly.
arXiv Detail & Related papers (2025-04-20T09:20:55Z) - A Robust Adversarial Ensemble with Causal (Feature Interaction) Interpretations for Image Classification [9.945272787814941]
We present a deep ensemble model that combines discriminative features with generative models to achieve both high accuracy and adversarial robustness.
Our approach integrates a bottom-level pre-trained discriminative network for feature extraction with a top-level generative classification network that models adversarial input distributions.
arXiv Detail & Related papers (2024-12-28T05:06:20Z) - Learning from Noisy Labels via Conditional Distributionally Robust Optimization [5.85767711644773]
Crowdsourcing has emerged as a practical solution for labeling large datasets.
However, noisy labels from annotators with varying levels of expertise pose a significant challenge to learning accurate models.
arXiv Detail & Related papers (2024-11-26T05:03:26Z) - PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., their margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art robust training methodology and, unlike existing data pruning strategies, significantly improves model performance.
arXiv Detail & Related papers (2024-05-10T08:02:20Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Learning to Generate Training Datasets for Robust Semantic Segmentation [37.9308918593436]
We propose a novel approach to improve the robustness of semantic segmentation techniques.
We design Robusta, a novel conditional generative adversarial network to generate realistic and plausible perturbed images.
Our results suggest that this approach could be valuable in safety-critical applications.
arXiv Detail & Related papers (2023-08-01T10:02:26Z) - Characterizing Data Point Vulnerability via Average-Case Robustness [29.881355412540557]
Adversarial robustness is the standard framework, which views the robustness of predictions through a binary lens.
We consider a complementary framework for robustness, called average-case robustness, which measures the fraction of points in a local region around an input that are classified consistently.
We show empirically that our estimators are accurate and efficient for standard deep learning models.
arXiv Detail & Related papers (2023-07-26T01:10:29Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show that GREAT Score correlates highly with attack-based model rankings on RobustBench while significantly reducing the evaluation cost.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn decision making from historical data without active exploration.
Due to uncertainty and variability in the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the historical dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
arXiv Detail & Related papers (2022-08-11T11:55:31Z) - Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge limiting the widespread adoption of deep learning has been its fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z) - Closeness and Uncertainty Aware Adversarial Examples Detection in Adversarial Machine Learning [0.7734726150561088]
We explore and assess the use of two different groups of metrics for detecting adversarial samples.
We introduce a new feature for adversarial detection, and we show that the performance of all these metrics depends heavily on the strength of the attack being used.
arXiv Detail & Related papers (2020-12-11T14:44:59Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z) - Hidden Cost of Randomized Smoothing [72.93630656906599]
In this paper, we point out the side effects of current randomized smoothing.
Specifically, we articulate and prove two major points: 1) the decision boundaries of smoothed classifiers will shrink, resulting in disparity in class-wise accuracy; 2) applying noise augmentation in the training process does not necessarily resolve the shrinking issue due to the inconsistent learning objectives.
arXiv Detail & Related papers (2020-03-02T23:37:42Z)