Solving adversarial examples requires solving exponential misalignment
- URL: http://arxiv.org/abs/2603.03507v1
- Date: Tue, 03 Mar 2026 20:28:22 GMT
- Title: Solving adversarial examples requires solving exponential misalignment
- Authors: Alessandro Salvatore, Stanislav Fort, Surya Ganguli
- Abstract summary: Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain a persistent failure mode in machine learning. We analyze a network's perceptual manifold (PM) for a class concept as the space of all inputs confidently assigned to that class by the network. We find, strikingly, that the dimensionalities of neural network PMs are orders of magnitude higher than those of natural human concepts.
- Score: 58.04667880030032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial attacks - input perturbations imperceptible to humans that fool neural networks - remain both a persistent failure mode in machine learning, and a phenomenon with mysterious origins. To shed light, we define and analyze a network's perceptual manifold (PM) for a class concept as the space of all inputs confidently assigned to that class by the network. We find, strikingly, that the dimensionalities of neural network PMs are orders of magnitude higher than those of natural human concepts. Since volume typically grows exponentially with dimension, this suggests exponential misalignment between machines and humans, with exponentially many inputs confidently assigned to concepts by machines but not humans. Furthermore, this provides a natural geometric hypothesis for the origin of adversarial examples: because a network's PM fills such a large region of input space, any input will be very close to any class concept's PM. Our hypothesis thus suggests that adversarial robustness cannot be attained without dimensional alignment of machine and human PMs, and therefore makes strong predictions: both robust accuracy and distance to any PM should be negatively correlated with the PM dimension. We confirmed these predictions across 18 different networks of varying robust accuracy. Crucially, we find even the most robust networks are still exponentially misaligned, and only the few PMs whose dimensionality approaches that of human concepts exhibit alignment to human perception. Our results connect the fields of alignment and adversarial examples, and suggest the curse of high dimensionality of machine PMs is a major impediment to adversarial robustness.
Related papers
- Explicit Modelling of Theory of Mind for Belief Prediction in Nonverbal Social Interactions [9.318796743761224]
We propose MToMnet - a Theory of Mind (ToM) neural network for predicting beliefs and their dynamics during human social interactions from multimodal input.
MToMnet encodes contextual cues and integrates them with person-specific cues (human gaze and body language) in a separate MindNet for each person.
Our results demonstrate that MToMnet surpasses existing methods by a large margin while at the same time requiring a significantly smaller number of parameters.
arXiv Detail & Related papers (2024-07-09T11:15:51Z)
- A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks [52.09243852066406]
Adversarial Converging Time Score (ACTS) measures the converging time as an adversarial robustness metric.
We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z)
- On the uncertainty principle of neural networks [36.098205818550554]
We show that neural networks are subject to an uncertainty relation, which manifests as a fundamental limitation in their ability to simultaneously achieve high accuracy and robustness against adversarial attacks.
Our findings reveal that the complementarity principle, a cornerstone of quantum physics, applies to neural networks, imposing fundamental limits on their capabilities in simultaneous learning of conjugate features.
arXiv Detail & Related papers (2022-05-03T13:48:12Z)
- Learning to Predict Diverse Human Motions from a Single Image via Mixture Density Networks [9.06677862854201]
We propose a novel approach to predict future human motions from a single image, with mixture density networks (MDN) modeling.
Contrary to most existing deep human motion prediction approaches, the multimodal nature of MDN enables the generation of diverse future motion hypotheses.
Our trained model directly takes an image as input and generates multiple plausible motions that satisfy the given condition.
arXiv Detail & Related papers (2021-09-13T08:49:33Z)
- Development of Human Motion Prediction Strategy using Inception Residual Block [1.0705399532413613]
We propose an Inception Residual Block (IRB) to detect temporal features in human poses.
Our main contribution is to propose a residual connection between the input and the output of the inception block, giving continuity between the previously observed pose and the next predicted pose.
With this proposed architecture, the model learns prior knowledge about human poses much better, and we achieve much higher prediction accuracy as detailed in the paper.
arXiv Detail & Related papers (2021-08-09T12:49:48Z)
- Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network.
Our analysis sheds light on why deep neural networks have poor performance under adversarial perturbation.
We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
arXiv Detail & Related papers (2020-08-01T00:58:54Z)
- Relationship between manifold smoothness and adversarial vulnerability in deep learning with local errors [2.7834038784275403]
We study the origin of the adversarial vulnerability in artificial neural networks.
Our study reveals that a high generalization accuracy requires a relatively fast power-law decay of the eigen-spectrum of hidden representations.
arXiv Detail & Related papers (2020-07-04T08:47:51Z)
- Adversarial Robustness Guarantees for Random Deep Neural Networks [15.68430580530443]
Adversarial examples are incorrectly classified inputs that are extremely close to a correctly classified input.
We prove that for any $p \ge 1$, the $\ell_p$ distance of any given input from the classification boundary scales as one over the square root of the dimension of the input times the $\ell_p$ norm of the input.
The results constitute a fundamental advance in the theoretical understanding of adversarial examples, and open the way to a thorough theoretical characterization of the relation between network architecture and robustness to adversarial perturbations.
arXiv Detail & Related papers (2020-04-13T13:07:26Z)
- Firearm Detection and Segmentation Using an Ensemble of Semantic Neural Networks [62.997667081978825]
We present a weapon detection system based on an ensemble of semantic Convolutional Neural Networks.
A set of simpler neural networks dedicated to specific tasks requires less computational resources and can be trained in parallel.
The overall output of the system given by the aggregation of the outputs of individual networks can be tuned by a user to trade-off false positives and false negatives.
arXiv Detail & Related papers (2020-02-11T13:58:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.