Adversarial Examples in Random Neural Networks with General Activations
- URL: http://arxiv.org/abs/2203.17209v1
- Date: Thu, 31 Mar 2022 17:36:15 GMT
- Title: Adversarial Examples in Random Neural Networks with General Activations
- Authors: Andrea Montanari and Yuchen Wu
- Abstract summary: Adversarial examples are ubiquitous in two-layer networks with sub-exponential width and ReLU or smooth activations.
We show that an adversarial example ${\boldsymbol x}'$ can be found with high probability along the direction of the gradient.
- Score: 14.12513604585194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A substantial body of empirical work documents the lack of robustness in deep
learning models to adversarial examples. Recent theoretical work proved that
adversarial examples are ubiquitous in two-layer networks with sub-exponential
width and ReLU or smooth activations, and multi-layer ReLU networks with
sub-exponential width. We present a result of the same type, with no
restriction on width and for general locally Lipschitz continuous activations.
More precisely, given a neural network $f(\,\cdot\,;{\boldsymbol \theta})$
with random weights ${\boldsymbol \theta}$, and feature vector ${\boldsymbol
x}$, we show that an adversarial example ${\boldsymbol x}'$ can be found with
high probability along the direction of the gradient $\nabla_{{\boldsymbol
x}}f({\boldsymbol x};{\boldsymbol \theta})$. Our proof is based on a Gaussian
conditioning technique. Instead of proving that $f$ is approximately linear in
a neighborhood of ${\boldsymbol x}$, we characterize the joint distribution of
$f({\boldsymbol x};{\boldsymbol \theta})$ and $f({\boldsymbol x}';{\boldsymbol
\theta})$ for ${\boldsymbol x}' = {\boldsymbol x}-s({\boldsymbol
x})\nabla_{{\boldsymbol x}}f({\boldsymbol x};{\boldsymbol \theta})$.
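The construction above lends itself to a short numerical illustration. The sketch below is a minimal example, not the paper's method: the two-layer architecture, the tanh activation, the dimensions, and the heuristic step size $s({\boldsymbol x})$ are all illustrative assumptions. It builds a network with random weights and perturbs an input along the gradient direction ${\boldsymbol x}' = {\boldsymbol x} - s({\boldsymbol x})\nabla_{{\boldsymbol x}}f({\boldsymbol x};{\boldsymbol \theta})$.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's proof technique):
# construct a candidate adversarial example along the gradient direction
#   x' = x - s(x) * grad_x f(x; theta)
# for a two-layer network with random weights and a locally Lipschitz activation.
# The width, input dimension, activation, and step-size rule are assumptions.

rng = np.random.default_rng(0)

d, m = 100, 400                             # input dimension, hidden width (assumed)
W = rng.normal(size=(m, d)) / np.sqrt(d)    # first-layer random weights
a = rng.normal(size=m) / np.sqrt(m)         # second-layer random weights

sigma = np.tanh                             # an example of a locally Lipschitz activation

def dsigma(z):
    """Derivative of tanh."""
    return 1.0 - np.tanh(z) ** 2

def f(x):
    """Two-layer network f(x; theta) = a^T sigma(W x)."""
    return a @ sigma(W @ x)

def grad_f(x):
    """Gradient of f with respect to the input x."""
    return W.T @ (a * dsigma(W @ x))

x = rng.normal(size=d) / np.sqrt(d)         # feature vector
g = grad_f(x)

# Heuristic step size: if f were linear near x, this choice would flip the sign
# of f(x').  It is used here only for illustration; the paper's guarantee comes
# from a Gaussian conditioning argument, not from linearization.
s = 2.0 * f(x) / (np.linalg.norm(g) ** 2 + 1e-12)
x_adv = x - s * g

print(f"f(x)  = {f(x):+.4f}")
print(f"f(x') = {f(x_adv):+.4f}, ||x' - x|| = {np.linalg.norm(x_adv - x):.4f}")
```

Note that the step-size choice above is a first-order heuristic added purely for illustration; as the abstract states, the paper does not argue that $f$ is approximately linear near ${\boldsymbol x}$, but instead characterizes the joint distribution of $f({\boldsymbol x};{\boldsymbol \theta})$ and $f({\boldsymbol x}';{\boldsymbol \theta})$.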
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on the number of heads $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Solving Quadratic Systems with Full-Rank Matrices Using Sparse or Generative Priors [33.0212223058894]
The problem of recovering a signal from a quadratic system $y_i = {\boldsymbol x}^\top {\boldsymbol A}_i {\boldsymbol x},\ i=1,\ldots,m$ with full-rank matrices ${\boldsymbol A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging.
This paper addresses the high-dimensional case where $m \ll n$ by incorporating prior knowledge of ${\boldsymbol x}$.
arXiv Detail & Related papers (2023-09-16T16:00:07Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form ${\mathbf x} \mapsto \sigma({\mathbf w} \cdot {\mathbf x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector ${\mathbf w}$ such that $F({\mathbf w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - High-dimensional Asymptotics of Feature Learning: How One Gradient Step
Improves the Representation [89.21686761957383]
We study the first gradient descent step on the first-layer parameters ${\boldsymbol W}$ in a two-layer network.
Our results demonstrate that even one step can lead to a considerable advantage over random features.
arXiv Detail & Related papers (2022-05-03T12:09:59Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point ${\boldsymbol \mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f({\boldsymbol \mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Universality of empirical risk minimization [12.764655736673749]
Consider supervised learning from i.i.d. samples where ${\boldsymbol x}_i \in \mathbb{R}^p$ are feature vectors and $y_i \in \mathbb{R}$ are labels.
We study universality of the empirical risk over a class of functions that are parameterized by $\mathsf{k}$.
arXiv Detail & Related papers (2022-02-17T18:53:45Z) - Geometric model for the electron spin correlation [0.0]
The formula for the spin correlation of the bipartite singlet spin state, $C_Q({\boldsymbol a},{\boldsymbol b})$, is derived on the basis of a probability distribution $\rho(\phi)$ that is generic.
A geometric model that reproduces the spin correlation serves to validate our approach.
arXiv Detail & Related papers (2021-08-17T20:36:12Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler ${\boldsymbol \beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler ${\boldsymbol \beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Learning a Lie Algebra from Unlabeled Data Pairs [7.329382191592538]
Deep convolutional networks (convnets) show a remarkable ability to learn disentangled representations.
This article proposes a machine learning method to discover a nonlinear transformation of the space $\mathbb{R}^n$.
The key idea is to approximate every target ${\boldsymbol y}_i$ by a matrix-vector product of the form $\widetilde{\boldsymbol y}_i = {\boldsymbol \phi}(t_i)\, {\boldsymbol x}_i$.
arXiv Detail & Related papers (2020-09-19T23:23:52Z) - Optimal Combination of Linear and Spectral Estimators for Generalized
Linear Models [59.015960528781115]
We show how to optimally combine $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$.
In order to establish the limiting distribution of $({\boldsymbol x}, \hat{\boldsymbol x}^{\rm L}, \hat{\boldsymbol x}^{\rm s})$, we design and analyze an Approximate Message Passing (AMP) algorithm.
arXiv Detail & Related papers (2020-08-07T18:20:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.