Adversarial Examples Exist in Two-Layer ReLU Networks for Low
Dimensional Linear Subspaces
- URL: http://arxiv.org/abs/2303.00783v2
- Date: Thu, 16 Nov 2023 11:08:52 GMT
- Title: Adversarial Examples Exist in Two-Layer ReLU Networks for Low
Dimensional Linear Subspaces
- Authors: Odelia Melamed, Gilad Yehudai, Gal Vardi
- Abstract summary: We show that standard gradient methods lead to non-robust neural networks.
We show that decreasing the initialization scale of the training algorithm, or adding $L_2$ regularization, can make the trained network more robust to adversarial perturbations.
- Score: 24.43191276129614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite a great deal of research, it is still not well-understood why trained
neural networks are highly vulnerable to adversarial examples. In this work we
focus on two-layer neural networks trained using data which lie on a low
dimensional linear subspace. We show that standard gradient methods lead to
non-robust neural networks, namely, networks which have large gradients in
directions orthogonal to the data subspace, and are susceptible to small
adversarial $L_2$-perturbations in these directions. Moreover, we show that
decreasing the initialization scale of the training algorithm, or adding $L_2$
regularization, can make the trained network more robust to adversarial
perturbations orthogonal to the data.
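The abstract's two claims can be illustrated with a small NumPy sketch: data confined to a low-dimensional subspace, a two-layer ReLU network trained by plain gradient descent, and the input gradient split into its subspace and orthogonal components. All concrete choices below (dimensions, width, logistic loss, fixed second layer) are illustrative assumptions, not the authors' setup. The key mechanism is that each gradient update to the first layer is a weighted sum of training inputs, so the weight components orthogonal to the data subspace never move from their initialization; shrinking the initialization scale therefore shrinks the orthogonal input gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, m, n = 10, 2, 64, 200  # ambient dim, subspace dim, hidden width, samples

# Data lies exactly on a k-dimensional linear subspace span(U) of R^d.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal basis, d x k
P_perp = np.eye(d) - U @ U.T                      # projector onto the complement

Z = rng.standard_normal((n, k))
X = Z @ U.T                                       # inputs on the subspace
y = np.sign(Z @ rng.standard_normal(k))           # linear labels inside the subspace

def train(init_scale, steps=500, lr=0.5):
    """Plain GD on f(x) = v . relu(W x) with logistic loss; second layer fixed."""
    W = init_scale * rng.standard_normal((m, d))
    v = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    for _ in range(steps):
        pre = X @ W.T                             # (n, m) pre-activations
        f = np.maximum(pre, 0.0) @ v              # network outputs
        g = -y / (1.0 + np.exp(np.clip(y * f, -30.0, 30.0)))  # dloss/df
        # Each row of dW is a weighted sum of inputs, hence lies in span(U):
        dW = ((g[:, None] * (pre > 0)) * v[None, :]).T @ X / n
        W -= lr * dW
    return W, v

def input_grad_split(W, v, x):
    """Norms of the subspace and orthogonal parts of the input gradient at x."""
    grad = W.T @ (v * (W @ x > 0))                # nabla_x f(x)
    return np.linalg.norm(U.T @ grad), np.linalg.norm(P_perp @ grad)

x0 = X[0]
_, orth_big = input_grad_split(*train(init_scale=0.5), x0)
_, orth_small = input_grad_split(*train(init_scale=0.005), x0)
print(orth_big, orth_small)  # orthogonal gradient shrinks with the init scale
```

A perturbation of x0 along the orthogonal gradient direction changes the output while leaving the data subspace, which is the non-robustness the abstract describes; adding $L_2$ regularization (a weight-decay term in the update) would likewise shrink the otherwise-frozen orthogonal weight components.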
Related papers
- ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with ReLU activations.
For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization has sufficiently small variance, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Reduced Order Modeling using Shallow ReLU Networks with Grassmann Layers [0.0]
This paper presents a nonlinear model reduction method for systems of equations using a structured neural network.
We show that our method can be applied to scientific problems in the data-scarce regime, which is typically not well-suited for neural network approximations.
arXiv Detail & Related papers (2020-12-17T21:35:06Z) - Align, then memorise: the dynamics of learning with feedback alignment [12.587037358391418]
Direct Feedback Alignment (DFA) is an efficient alternative to the ubiquitous backpropagation algorithm for training deep neural networks.
DFA successfully trains state-of-the-art models such as Transformers, but it notoriously fails to train convolutional networks.
Here, we propose a theory for the success of DFA.
arXiv Detail & Related papers (2020-11-24T22:21:27Z) - Towards Understanding Hierarchical Learning: Benefits of Neural
Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training process of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z) - Avoiding Spurious Local Minima in Deep Quadratic Networks [0.0]
We characterize the landscape of the mean squared error loss for networks with quadratic activation functions.
We prove that deep over-parameterized neural networks with quadratic activations benefit from similar landscape properties.
arXiv Detail & Related papers (2019-12-31T22:31:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.