Volatility in Certainty (VC): A Metric for Detecting Adversarial Perturbations During Inference in Neural Network Classifiers
- URL: http://arxiv.org/abs/2511.11834v1
- Date: Fri, 14 Nov 2025 19:51:04 GMT
- Title: Volatility in Certainty (VC): A Metric for Detecting Adversarial Perturbations During Inference in Neural Network Classifiers
- Authors: Vahid Hemmati, Ahmad Mohammadi, Abdul-Rauf Nuhu, Reza Ahmari, Parham Kebria, Abdollah Homaifar
- Abstract summary: Adversarial robustness remains a critical challenge in deploying neural network classifiers. This paper investigates Volatility in Certainty (VC), a label-free metric that quantifies irregularities in model confidence.
- Score: 0.5793804025420254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial robustness remains a critical challenge in deploying neural network classifiers, particularly in real-time systems where ground-truth labels are unavailable during inference. This paper investigates Volatility in Certainty (VC), a recently proposed, label-free metric that quantifies irregularities in model confidence by measuring the dispersion of sorted softmax outputs. Specifically, VC is defined as the average squared log-ratio of adjacent certainty values, capturing local fluctuations in model output smoothness. We evaluate VC as a proxy for classification accuracy and as an indicator of adversarial drift. Experiments are conducted on artificial neural networks (ANNs) and convolutional neural networks (CNNs) trained on MNIST, as well as a regularized VGG-like model trained on CIFAR-10. Adversarial examples are generated using the Fast Gradient Sign Method (FGSM) across varying perturbation magnitudes. In addition, mixed test sets are created by gradually introducing adversarial contamination to assess VC's sensitivity under incremental distribution shifts. Our results reveal a strong negative correlation between classification accuracy and log(VC) (correlation rho < -0.90 in most cases), suggesting that VC effectively reflects performance degradation without requiring labeled data. These findings position VC as a scalable, architecture-agnostic, and real-time performance metric suitable for early-warning systems in safety-critical applications.
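The abstract describes VC as the average squared log-ratio of adjacent certainty values over the sorted softmax outputs, i.e., roughly VC = (1/(K-1)) * sum over i of [log(s_i / s_{i+1})]^2 with s_1 >= ... >= s_K the sorted class probabilities. Below is a minimal NumPy sketch of one plausible reading of that definition; the exact sort order, epsilon handling, and batch aggregation used in the paper may differ, and the function and variable names are illustrative only.

```python
import numpy as np

def volatility_in_certainty(probs, eps=1e-12):
    """Sketch of the VC metric for a batch of softmax output vectors.

    Sorts each probability vector, takes the log-ratio of every pair of
    adjacent sorted values, squares it, and averages. Assumed reading of
    the abstract's definition, not the authors' reference implementation.
    """
    probs = np.atleast_2d(np.asarray(probs, dtype=np.float64))
    # Sort each row in descending order of confidence.
    sorted_p = np.sort(probs, axis=-1)[:, ::-1]
    # Log-ratio of each pair of adjacent sorted probabilities (>= 0 by construction).
    log_ratios = np.log((sorted_p[:, :-1] + eps) / (sorted_p[:, 1:] + eps))
    # Average squared log-ratio over positions and over the batch.
    return float(np.mean(log_ratios ** 2))

# Hypothetical usage: compare VC on clean vs. FGSM-perturbed inputs
# (softmax_clean and softmax_adv would come from the deployed classifier).
# vc_clean = volatility_in_certainty(softmax_clean)
# vc_adv = volatility_in_certainty(softmax_adv)
# A rise in log(VC) would then serve as the label-free drift signal
# that the paper correlates with accuracy degradation.
```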
Related papers
- Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry [5.1511135538176]
Active Learning (AL) promises to reduce annotation cost by prioritizing informative samples, yet its reliability is undermined when labels are noisy or when the data distribution shifts. We propose Active Learning via Neural Collapse Geometry (NCAL-R), a framework that leverages the emergent geometric regularities of deep networks to counteract unreliable supervision.
arXiv Detail & Related papers (2025-10-10T17:50:31Z)
- A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection [13.109309606764754]
We introduce a plug-in detection framework that leverages internal layer-wise inconsistencies within the target model itself. Our method achieves state-of-the-art detection performance with negligible computational overhead.
arXiv Detail & Related papers (2025-05-19T00:48:53Z)
- Typicalness-Aware Learning for Failure Detection [26.23185979968123]
Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores.
We propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance.
arXiv Detail & Related papers (2024-11-04T11:09:47Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework in which the key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Variational Classification [51.2541371924591]
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework, ReScore, that boosts causal discovery performance by dynamically learning adaptive weights for a reweighted score function.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over state-of-the-art methods and could serve as a simple yet strong baseline in this under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
- Robust Neural Network Classification via Double Regularization [2.41710192205034]
We propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the classification model and an optimal reweighting of training observations.
We demonstrate DRFit for neural network classification of (i) MNIST and (ii) CIFAR-10, in both cases with simulated mislabeling.
arXiv Detail & Related papers (2021-12-15T13:19:20Z)
- Word-level confidence estimation for RNN transducers [7.12355127219356]
We present a lightweight neural confidence model tailored for Automatic Speech Recognition (ASR) systems with Recurrent Neural Network Transducers (RNN-T).
Compared to other existing approaches, our model utilizes: (a) the time information associated with recognized words, which reduces the computational complexity, and (b) a simple and elegant trick for mapping between sub-word and word sequences.
arXiv Detail & Related papers (2021-09-28T18:38:00Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training of these models with a novel loss function and centroid updating scheme, and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)