Related papers: Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals

URL: http://arxiv.org/abs/2403.16176v1
Date: Sun, 24 Mar 2024 14:35:44 GMT
Title: Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals
Authors: Rui Zheng, Yuhao Zhou, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang
Abstract summary: Adversarial attacks place carefully crafted perturbations on normal examples to fool deep neural networks (DNNs). We first empirically show that the features of either clean signals or adversarial perturbations are redundant and span low-dimensional linear subspaces, respectively, with minimal overlap. This makes it possible for DNNs to learn a subspace where only features of clean signals exist while those of perturbations are discarded.
Score: 52.123343364599094
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep neural networks (DNNs) are notoriously vulnerable to adversarial attacks that place carefully crafted perturbations on normal examples to fool DNNs. To better understand such attacks, a characterization of the features carried by adversarial examples is needed. In this paper, we tackle this challenge by inspecting the subspaces of sample features through spectral analysis. We first empirically show that the features of either clean signals or adversarial perturbations are redundant and span low-dimensional linear subspaces, respectively, with minimal overlap, and that classical low-dimensional subspace projection can suppress perturbation features that fall outside the subspace of clean signals. This makes it possible for DNNs to learn a subspace where only features of clean signals exist while those of perturbations are discarded, which can facilitate the distinction of adversarial examples. To handle the residual perturbations that are inevitable in subspace learning, we propose an independence criterion to disentangle clean signals from perturbations. Experimental results show that the proposed strategy enables the model to inherently suppress adversaries, which not only boosts model robustness but also motivates new directions of effective adversarial defense.
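The projection idea in the abstract can be illustrated on synthetic data. The sketch below is a toy under stated assumptions, not the authors' method: clean features and perturbations are drawn from two exactly orthogonal k-dimensional subspaces (the paper reports only minimal overlap), the clean subspace is estimated with a truncated SVD, and a perturbed feature is projected onto it. The names `learn_subspace`, `project`, and `kept_energy`, and all dimensions, are hypothetical.

```python
import numpy as np

def learn_subspace(clean: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (d x k) of the top-k right singular subspace of clean features (n x d)."""
    # No centering: we want a *linear* subspace through the origin.
    _, _, vt = np.linalg.svd(clean, full_matrices=False)
    return vt[:k].T

def project(x: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonally project each row of x onto span(basis)."""
    return (x @ basis) @ basis.T

def kept_energy(x: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of the squared norm of x that survives the projection."""
    return float(np.sum(project(x, basis) ** 2) / np.sum(x ** 2))

rng = np.random.default_rng(0)
d, k, n = 64, 6, 500
# Hypothetical toy setup: clean features and perturbations live in
# disjoint (here exactly orthogonal) k-dimensional subspaces of R^d.
q, _ = np.linalg.qr(rng.standard_normal((d, d)))
u_clean, u_pert = q[:, :k], q[:, k:2 * k]
clean = rng.standard_normal((n, k)) @ u_clean.T
pert = 0.5 * rng.standard_normal((n, k)) @ u_pert.T

basis = learn_subspace(clean, k)
denoised = project(clean + pert, basis)
print(f"clean energy kept:        {kept_energy(clean, basis):.3f}")
print(f"perturbation energy kept: {kept_energy(pert, basis):.3f}")
print(f"max |denoised - clean|:   {np.abs(denoised - clean).max():.2e}")
```

With real features the two subspaces overlap slightly, so some perturbation energy survives the projection; that residual is what the independence criterion in the abstract is meant to address.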

Related papers

Towards Robust Spiking Neural Networks: Mitigating Heterogeneous Training Vulnerability via Dominant Eigencomponent Projection [21.5491519186604]
Spiking Neural Networks (SNNs) process information via discrete spikes, enabling them to operate at remarkably low energy levels. Experiments reveal a striking vulnerability when SNNs are trained using the mainstream method: direct encoding combined with backpropagation through time.
arXiv Detail & Related papers (2025-05-16T11:29:49Z)
Contrasting Adversarial Perturbations: The Space of Harmless Perturbations [20.132442083678914]
We show the existence of a harmless perturbation space, in which perturbations leave the network output unchanged when applied to inputs. Our work highlights the distinctive robustness of deep neural networks (DNNs) in contrast to adversarial examples.
arXiv Detail & Related papers (2024-02-03T09:22:07Z)
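One concrete instance of a harmless perturbation space: if a network's first layer is linear with a weight W mapping R^d to R^m and m < d, every perturbation in the null space of W leaves that layer's output, and hence the whole network's output, unchanged. The sketch below checks this numerically on a hypothetical two-layer ReLU network (`tiny_net` and `null_space_basis` are illustrative names); it shows the idea in the summary and is not necessarily the construction used in the paper.

```python
import numpy as np

def null_space_basis(weight: np.ndarray, rtol: float = 1e-10) -> np.ndarray:
    """Orthonormal basis (d x r) of the null space of a linear layer's weight (m x d)."""
    _, s, vt = np.linalg.svd(weight)  # full_matrices=True, so vt is d x d
    rank = int(np.sum(s > rtol * s.max()))
    return vt[rank:].T

def tiny_net(x, w1, b1, w2):
    """Hypothetical two-layer ReLU network; the first layer maps R^64 to R^16."""
    return w2 @ np.maximum(w1 @ x + b1, 0.0)

rng = np.random.default_rng(1)
w1 = rng.standard_normal((16, 64))
b1 = rng.standard_normal(16)
w2 = rng.standard_normal((10, 16))
x = rng.standard_normal(64)

basis = null_space_basis(w1)  # 64 x 48 for a full-rank 16 x 64 weight
# A large perturbation built only from null-space directions of w1.
delta = basis @ (10.0 * rng.standard_normal(basis.shape[1]))

print("||delta|| / ||x|| =", np.linalg.norm(delta) / np.linalg.norm(x))
print("max output change =", np.abs(tiny_net(x + delta, w1, b1, w2) - tiny_net(x, w1, b1, w2)).max())
```

The perturbation can be many times larger than the input and still change the output only by floating-point round-off, because W(x + delta) = Wx exactly in exact arithmetic.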
AFLOW: Developing Adversarial Examples under Extremely Noise-limited Settings [7.828994881163805]
Deep neural networks (DNNs) are vulnerable to adversarial attacks. We propose a novel normalizing-flow-based end-to-end attack framework, called AFLOW, to synthesize imperceptible adversarial examples. Compared with existing methods, AFLOW exhibits superior imperceptibility, image quality, and attack capability.
arXiv Detail & Related papers (2023-10-15T10:54:07Z)
Semi-signed neural fitting for surface reconstruction from unoriented point clouds [53.379712818791894]
We propose SSN-Fitting to reconstruct a better signed distance field. SSN-Fitting consists of a semi-signed supervision and a loss-based region sampling strategy. We conduct experiments to demonstrate that SSN-Fitting achieves state-of-the-art performance under different settings.
arXiv Detail & Related papers (2022-06-14T09:40:17Z)
Origins of Low-dimensional Adversarial Perturbations [17.17170592140042]
We study the phenomenon of low-dimensional adversarial perturbations in classification. The goal is to fool the classifier into flipping its decision on a nonzero fraction of inputs from a designated class. We compute lower bounds for the fooling rate of any subspace.
arXiv Detail & Related papers (2022-03-25T17:02:49Z)
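For a linear classifier, the fooling rate of a given subspace has a closed form, which makes the quantity in the summary concrete. The sketch assumes a hypothetical binary classifier sign(w . x + b), an l2 budget `eps`, and a subspace given by an orthonormal basis B; it computes the empirical fooling rate on sampled inputs, not the lower bounds derived in the paper. The best perturbation inside the subspace moves the margin by at most eps * ||B^T w||, so an input flips exactly when its margin is smaller than that. The function names are illustrative.

```python
import numpy as np

def subspace_fooling_rate(x, w, b, basis, eps):
    """Fraction of rows of x whose sign(x @ w + b) label can be flipped by an
    l2 perturbation of norm <= eps lying in span(basis) (basis: d x k, orthonormal columns)."""
    margins = x @ w + b
    # Largest margin change achievable inside the subspace: eps * ||basis^T w||.
    reach = eps * np.linalg.norm(basis.T @ w)
    return float(np.mean(np.abs(margins) < reach))

def worst_case_perturbation(x, w, b, basis, eps):
    """Optimal in-subspace perturbation per row: pushes each margin toward zero."""
    direction = basis @ (basis.T @ w)  # projection of w onto the subspace
    direction = direction / np.linalg.norm(direction)
    margins = x @ w + b
    return -np.sign(margins)[:, None] * eps * direction[None, :]

rng = np.random.default_rng(2)
d, n, eps = 100, 1000, 2.0
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
b = 0.0
# Hypothetical designated class: inputs centered on the positive side of the hyperplane.
x = 1.5 * w + rng.standard_normal((n, d))
q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthonormal directions

for k in (1, 10, 50, 100):
    basis = q[:, :k]  # nested subspaces: span(q[:, :k]) grows with k
    print(f"k={k:3d}  fooling rate={subspace_fooling_rate(x, w, b, basis, eps):.3f}")
```

Because the subspaces are nested, the printed rates can only grow with k: a larger subspace captures more of w, so the same budget moves the margin further.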
Decentralized learning in the presence of low-rank noise [57.18977364494388]
Observations collected by agents in a network may be unreliable due to observation noise or interference. This paper proposes a distributed algorithm that allows each node to improve the reliability of its own observation.
arXiv Detail & Related papers (2022-03-18T09:13:57Z)
Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples. We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z)
Removing Adversarial Noise in Class Activation Feature Space [160.78488162713498]
We propose to remove adversarial noise by implementing a self-supervised adversarial training mechanism in a class activation feature space. We train a denoising model to minimize the distances between the adversarial examples and the natural examples in the class activation feature space. Empirical evaluations demonstrate that our method could significantly enhance adversarial robustness in comparison to previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-19T10:42:24Z)
Generating Out of Distribution Adversarial Attack using Latent Space Poisoning [5.1314136039587925]
We propose a novel mechanism for generating adversarial examples in which the actual image is not corrupted. The latent space representation is utilized to tamper with the inherent structure of the image. As opposed to gradient-based attacks, latent space poisoning exploits the inclination of classifiers to model the independent and identical distribution of the training dataset.
arXiv Detail & Related papers (2020-12-09T13:05:44Z)
Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by a recently introduced non-robust feature. In this paper, we consider non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property. This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
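The likelihood-based detection step can be sketched as follows, under assumptions that are ours rather than the paper's: each cluster (clean and adversarial representations) is modeled by a single full-covariance Gaussian, and an input is flagged when the log-likelihood ratio favors the adversarial cluster. The representations are synthetic, and `GaussianDensity`, `flag_adversarial`, and the cluster offset are hypothetical.

```python
import numpy as np

class GaussianDensity:
    """Full-covariance Gaussian fitted to a cluster of representations (n x d)."""

    def __init__(self, feats: np.ndarray, ridge: float = 1e-3):
        self.mean = feats.mean(axis=0)
        # A small ridge keeps the covariance invertible for small or degenerate clusters.
        cov = np.cov(feats, rowvar=False) + ridge * np.eye(feats.shape[1])
        self.precision = np.linalg.inv(cov)
        _, self.logdet = np.linalg.slogdet(cov)
        self.dim = feats.shape[1]

    def log_pdf(self, x: np.ndarray) -> np.ndarray:
        diff = x - self.mean
        mahalanobis = np.einsum("ij,jk,ik->i", diff, self.precision, diff)
        return -0.5 * (mahalanobis + self.logdet + self.dim * np.log(2.0 * np.pi))

def flag_adversarial(x, clean_density, adv_density, threshold=0.0):
    """Flag rows whose log-likelihood ratio (adversarial vs. clean) exceeds the threshold."""
    return adv_density.log_pdf(x) - clean_density.log_pdf(x) > threshold

rng = np.random.default_rng(3)
d = 16
shift = np.full(d, 1.5)  # hypothetical offset of the adversarial cluster
clean_train = rng.standard_normal((400, d))
adv_train = rng.standard_normal((400, d)) + shift
clean_test = rng.standard_normal((400, d))
adv_test = rng.standard_normal((400, d)) + shift

clean_model = GaussianDensity(clean_train)
adv_model = GaussianDensity(adv_train)
accuracy = 0.5 * (np.mean(~flag_adversarial(clean_test, clean_model, adv_model))
                  + np.mean(flag_adversarial(adv_test, clean_model, adv_model)))
print(f"detection accuracy: {accuracy:.3f}")
```

Raising `threshold` trades missed detections for fewer false alarms; with real representations the Gaussian assumption is the part most likely to need replacing.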
An Analysis of Robustness of Non-Lipschitz Networks [35.64511156980701]
Small input perturbations can often produce large movements in the network's final-layer feature space. In our model, the adversary may move data an arbitrary distance in feature space but only in random low-dimensional subspaces. We provide theoretical guarantees for setting algorithm parameters to optimize over accuracy-abstention trade-offs using data-driven methods.
arXiv Detail & Related papers (2020-10-13T03:56:39Z)
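The accuracy-abstention trade-off mentioned in the summary can be illustrated with a thresholded nearest-neighbor rule in feature space: predict the label of the nearest training representation, but abstain when that representation is farther than a threshold `tau`. This is a sketch under our own assumptions (Euclidean distance, abstention encoded as -1, synthetic 2-D features) and may differ from the algorithm analyzed in the paper; `nn_with_abstention` is a hypothetical name.

```python
import numpy as np

def nn_with_abstention(train_feats, train_labels, query, tau):
    """Label each query row with its nearest training point's label, or abstain (-1)
    when no training point lies within Euclidean distance tau in feature space."""
    dists = np.linalg.norm(query[:, None, :] - train_feats[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    labels = train_labels[nearest]  # fancy indexing returns a new array
    labels[dists[np.arange(len(query)), nearest] > tau] = -1
    return labels

rng = np.random.default_rng(4)
train_feats = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
train_labels = np.array([0] * 50 + [1] * 50)
# The last query mimics a point moved far from both clusters, e.g. by an adversary.
query = np.array([[0.1, -0.1], [5.2, 4.9], [2.5, 2.5]])
result = nn_with_abstention(train_feats, train_labels, query, tau=1.0)
print(result)
```

A larger `tau` abstains less often but lets moved points that land near the wrong cluster be misclassified; a smaller `tau` abstains more and misclassifies less.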

This list is automatically generated from the titles and abstracts of the papers on this site.