Adversarial Examples in Random Neural Networks with General Activations
- URL: http://arxiv.org/abs/2203.17209v1
- Date: Thu, 31 Mar 2022 17:36:15 GMT
- Title: Adversarial Examples in Random Neural Networks with General Activations
- Authors: Andrea Montanari and Yuchen Wu
- Abstract summary: Adversarial examples are ubiquitous in two-layer networks with sub-exponential width and ReLU or smooth activations.
We show that an adversarial example ${\boldsymbol x}'$ can be found with high probability along the direction of the gradient.
- Score: 14.12513604585194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A substantial body of empirical work documents the lack of robustness in deep
learning models to adversarial examples. Recent theoretical work proved that
adversarial examples are ubiquitous in two-layer networks with sub-exponential
width and ReLU or smooth activations, and multi-layer ReLU networks with
sub-exponential width. We present a result of the same type, with no
restriction on width and for general locally Lipschitz continuous activations.
More precisely, given a neural network $f(\,\cdot\,;{\boldsymbol \theta})$
with random weights ${\boldsymbol \theta}$, and feature vector ${\boldsymbol
x}$, we show that an adversarial example ${\boldsymbol x}'$ can be found with
high probability along the direction of the gradient $\nabla_{{\boldsymbol
x}}f({\boldsymbol x};{\boldsymbol \theta})$. Our proof is based on a Gaussian
conditioning technique. Instead of proving that $f$ is approximately linear in
a neighborhood of ${\boldsymbol x}$, we characterize the joint distribution of
$f({\boldsymbol x};{\boldsymbol \theta})$ and $f({\boldsymbol x}';{\boldsymbol
\theta})$ for ${\boldsymbol x}' = {\boldsymbol x}-s({\boldsymbol
x})\nabla_{{\boldsymbol x}}f({\boldsymbol x};{\boldsymbol \theta})$.
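The construction above lends itself to a short numerical illustration. The sketch below is a minimal example, not the paper's method: the two-layer architecture, the tanh activation, the dimensions, and the heuristic step size $s({\boldsymbol x})$ are all illustrative assumptions. It builds a network with random weights and perturbs an input along the gradient direction ${\boldsymbol x}' = {\boldsymbol x} - s({\boldsymbol x})\nabla_{{\boldsymbol x}}f({\boldsymbol x};{\boldsymbol \theta})$.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's proof technique):
# construct a candidate adversarial example along the gradient direction
#   x' = x - s(x) * grad_x f(x; theta)
# for a two-layer network with random weights and a locally Lipschitz activation.
# The width, input dimension, activation, and step-size rule are assumptions.

rng = np.random.default_rng(0)

d, m = 100, 400                             # input dimension, hidden width (assumed)
W = rng.normal(size=(m, d)) / np.sqrt(d)    # first-layer random weights
a = rng.normal(size=m) / np.sqrt(m)         # second-layer random weights

sigma = np.tanh                             # an example of a locally Lipschitz activation

def dsigma(z):
    """Derivative of tanh."""
    return 1.0 - np.tanh(z) ** 2

def f(x):
    """Two-layer network f(x; theta) = a^T sigma(W x)."""
    return a @ sigma(W @ x)

def grad_f(x):
    """Gradient of f with respect to the input x."""
    return W.T @ (a * dsigma(W @ x))

x = rng.normal(size=d) / np.sqrt(d)         # feature vector
g = grad_f(x)

# Heuristic step size: if f were linear near x, this choice would flip the sign
# of f(x').  It is used here only for illustration; the paper's guarantee comes
# from a Gaussian conditioning argument, not from linearization.
s = 2.0 * f(x) / (np.linalg.norm(g) ** 2 + 1e-12)
x_adv = x - s * g

print(f"f(x)  = {f(x):+.4f}")
print(f"f(x') = {f(x_adv):+.4f}, ||x' - x|| = {np.linalg.norm(x_adv - x):.4f}")
```

Note that the step-size choice above is a first-order heuristic added purely for illustration; as the abstract states, the paper does not argue that $f$ is approximately linear near ${\boldsymbol x}$, but instead characterizes the joint distribution of $f({\boldsymbol x};{\boldsymbol \theta})$ and $f({\boldsymbol x}';{\boldsymbol \theta})$.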
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors [33.0212223058894]
The problem of recovering a signal from a quadratic system $y_i = {\boldsymbol x}^\top {\boldsymbol A}_i {\boldsymbol x},\ i=1,\ldots,m$ with full-rank matrices ${\boldsymbol A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging.
This paper addresses the high-dimensional case where $m \ll n$ by incorporating prior knowledge of ${\boldsymbol x}$.
arXiv Detail & Related papers (2023-09-16T16:00:07Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form ${\mathbf x} \mapsto \sigma({\mathbf w} \cdot {\mathbf x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector ${\mathbf w}$ such that $F({\mathbf w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - High-dimensional Asymptotics of Feature Learning: How One Gradient Step
Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters ${\boldsymbol W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point ${\boldsymbol \mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f({\boldsymbol \mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Universality of empirical risk minimization [12.764655736673749]
Consider supervised learning from i.i.d. samples where ${\boldsymbol x}_i \in \mathbb{R}^p$ are feature vectors and $y_i \in \mathbb{R}$ are labels.
We study universality of the empirical risk over a class of functions that are parameterized by $\mathsf{k}$.
arXiv Detail & Related papers (2022-02-17T18:53:45Z) - Geometric model for the electron spin correlation [0.0]
The formula for the spin correlation of the bipartite singlet spin state, $C_Q({\boldsymbol a},{\boldsymbol b})$, is derived on the basis of a probability distribution $\rho(\phi)$ that is generic.
A geometric model that reproduces the spin correlation serves to validate our approach.
arXiv Detail & Related papers (2021-08-17T20:36:12Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler ${\boldsymbol \beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler ${\boldsymbol \beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Learning a Lie Algebra from Unlabeled Data Pairs [7.329382191592538]
Deep convolutional networks (convnets) show a remarkable ability to learn disentangled representations.
This article proposes a machine learning method to discover a nonlinear transformation of the space $\mathbb{R}^n$.
The key idea is to approximate every target ${\boldsymbol y}_i$ by a matrix-vector product of the form $\widetilde{\boldsymbol y}_i = {\boldsymbol \phi}(t_i)\, {\boldsymbol x}_i$.
arXiv Detail & Related papers (2020-09-19T23:23:52Z) - Optimal Combination of Linear and Spectral Estimators for Generalized
Linear Models [59.015960528781115]
We show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$.
In order to establish the limiting distribution of $({\boldsymbol x}, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm.
arXiv Detail & Related papers (2020-08-07T18:20:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.