Depth and Feature Learning are Provably Beneficial for Neural Network
Discriminators
- URL: http://arxiv.org/abs/2112.13867v1
- Date: Mon, 27 Dec 2021 19:03:22 GMT
- Title: Depth and Feature Learning are Provably Beneficial for Neural Network
Discriminators
- Authors: Carles Domingo-Enrich
- Abstract summary: We show that deep GAN discriminators are able to distinguish distributions that shallow discriminators cannot distinguish.
This confirms that feature learning is beneficial for discriminators.
- Score: 3.04585143845864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We construct pairs of distributions $\mu_d, \nu_d$ on $\mathbb{R}^d$ such
that the quantity $|\mathbb{E}_{x \sim \mu_d} [F(x)] - \mathbb{E}_{x \sim
\nu_d} [F(x)]|$ decreases as $\Omega(1/d^2)$ for some three-layer ReLU network
$F$ with polynomial width and weights, while declining exponentially in $d$ if
$F$ is any two-layer network with polynomial weights. This shows that deep GAN
discriminators are able to distinguish distributions that shallow
discriminators cannot. Analogously, we build pairs of distributions $\mu_d,
\nu_d$ on $\mathbb{R}^d$ such that $|\mathbb{E}_{x \sim \mu_d} [F(x)] -
\mathbb{E}_{x \sim \nu_d} [F(x)]|$ decreases as $\Omega(1/(d\log d))$ for
two-layer ReLU networks with polynomial weights, while declining exponentially
for bounded-norm functions in the associated RKHS. This confirms that feature
learning is beneficial for discriminators. Our bounds are based on Fourier
transforms.
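For intuition, here is a minimal sketch that estimates the quantity above by Monte Carlo with a randomly initialized three-layer ReLU network. The two Gaussians and the network weights are placeholders for illustration only, not the paper's constructions of $\mu_d, \nu_d$ and $F$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n_samples = 32, 128, 100_000

# Placeholder distributions: the paper's mu_d, nu_d are specific
# constructions; two Gaussians with different scales stand in here.
x_mu = rng.standard_normal((n_samples, d))
x_nu = 1.1 * rng.standard_normal((n_samples, d))

# A randomly initialized three-layer ReLU network F: R^d -> R
# (two hidden layers); the paper's F has polynomial width and weights.
W1 = rng.standard_normal((d, width)) / np.sqrt(d)
W2 = rng.standard_normal((width, width)) / np.sqrt(width)
w3 = rng.standard_normal(width) / np.sqrt(width)

def F(x):
    h1 = np.maximum(x @ W1, 0.0)   # first ReLU layer
    h2 = np.maximum(h1 @ W2, 0.0)  # second ReLU layer
    return h2 @ w3                 # linear readout

# Monte Carlo estimate of |E_mu[F(x)] - E_nu[F(x)]|
gap = abs(F(x_mu).mean() - F(x_nu).mean())
print(f"estimated discrepancy: {gap:.4f}")
```

For the paper's constructed pair, a suitable three-layer $F$ keeps this gap of order $1/d^2$, while every two-layer $F$ with polynomial weights makes it exponentially small in $d$.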
Related papers
- The Differential and Boomerang Properties of a Class of Binomials [28.489574654566677]
We study the differential and boomerang properties of the function $F_{2,u}(x)=x^2\big(1+u\eta(x)\big)$ over $\mathbb{F}_q$.
We disprove a conjecture proposed in \cite{budaghyan2024arithmetization}, which states that there exist infinitely many $q$ and $u$ such that $F_{2,u}$ is an APN function.
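As a pointer to the objects involved, a minimal sketch, assuming $\eta$ denotes the quadratic character (computed via Euler's criterion over a prime field): it evaluates $F_{2,u}$ over a small $\mathbb{F}_p$ and computes its differential uniformity, where uniformity $2$ is exactly the APN property. The values of $p$ and $u$ are illustrative.

```python
# Differential uniformity of F_{2,u}(x) = x^2 (1 + u*eta(x)) over F_p,
# assuming eta is the quadratic character with eta(0) = 0.
p, u = 11, 3  # illustrative small prime and coefficient

def eta(x):
    if x % p == 0:
        return 0
    return 1 if pow(x, (p - 1) // 2, p) == 1 else -1  # Euler's criterion

def F(x):
    return (x * x * (1 + u * eta(x))) % p

# delta(a, b) = #{x in F_p : F(x + a) - F(x) = b}; the differential
# uniformity is the maximum of delta(a, b) over a != 0 and all b.
uniformity = max(
    sum((F((x + a) % p) - F(x)) % p == b for x in range(p))
    for a in range(1, p)
    for b in range(p)
)
print(f"differential uniformity over F_{p}: {uniformity}")
print("APN" if uniformity == 2 else "not APN")
```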
arXiv Detail & Related papers (2024-09-21T23:33:00Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function at a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\mathrm{polylog}\,d$.
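To make the setting concrete, a minimal sketch (not the paper's algorithm or rates): online SGD on the squared loss for a two-layer ReLU network fitting a single-index target under isotropic Gaussian inputs. The link $\sigma_*$, width, and step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, width, steps, lr = 20, 64, 50_000, 0.005

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)       # unit index direction
sigma_star = lambda z: z**2 - 1      # illustrative degree-2 link

W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

for _ in range(steps):
    x = rng.standard_normal(d)       # isotropic Gaussian input
    y = sigma_star(x @ theta)        # single-index label
    h = np.maximum(W @ x, 0.0)       # hidden ReLU features
    err = a @ h - y                  # residual on this sample
    grad_a = err * h                 # gradient of 0.5 * err**2 in a
    grad_W = err * np.outer(a * (h > 0), x)
    a -= lr * grad_a
    W -= lr * grad_W

# held-out test error after training
X = rng.standard_normal((5000, d))
pred = np.maximum(X @ W.T, 0.0) @ a
print("test MSE:", np.mean((pred - sigma_star(X @ theta)) ** 2))
```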
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on $m$ (the number of attention heads) is unavoidable.
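For reference, a minimal numpy forward pass through a standard multi-head attention layer, the object whose learnability is studied here; sizes and weights are illustrative, and the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
seq_len, d_model, m = 8, 16, 4           # m = number of heads
d_head = d_model // m

X = rng.standard_normal((seq_len, d_model))
# per-head projection matrices (illustrative random weights)
Wq = rng.standard_normal((m, d_model, d_head)) / np.sqrt(d_model)
Wk = rng.standard_normal((m, d_model, d_head)) / np.sqrt(d_model)
Wv = rng.standard_normal((m, d_model, d_head)) / np.sqrt(d_model)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

heads = []
for i in range(m):
    Q, K, V = X @ Wq[i], X @ Wk[i], X @ Wv[i]
    A = softmax(Q @ K.T / np.sqrt(d_head))   # attention weights
    heads.append(A @ V)                      # per-head output

out = np.concatenate(heads, axis=-1)         # shape (seq_len, d_model)
print(out.shape)
```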
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical polynomials of the form $h = g \circ p$ over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
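Concretely, the hierarchical targets have the composed form below, with $p$ an inner polynomial feature and $g$ an outer link (a sketch of the setting, not the full statement):

```latex
h(x) \;=\; g\bigl(p(x)\bigr), \qquad x \sim \mathcal{N}(0, I_d), \qquad
p \in \mathbb{R}[x_1, \dots, x_d], \ \deg p = k.
```

Layerwise training lets the lower layers learn the feature $p$, after which fitting $g$ reduces to a one-dimensional problem.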
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
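As an example of the nonlinear observation models covered, the sketch below generates 1-bit (sign) measurements of a signal lying in the range of a generative prior; the random ReLU generator is a placeholder, and uniform recovery would require a guarantee holding for all such $\mathbf{x}^*$ simultaneously.

```python
import numpy as np

rng = np.random.default_rng(3)
k, d, n = 5, 50, 200          # latent dim, ambient dim, measurements

# Placeholder generative prior G: a random one-hidden-layer ReLU net.
W1 = rng.standard_normal((64, k)) / np.sqrt(k)
W2 = rng.standard_normal((d, 64)) / np.sqrt(64)
G = lambda z: W2 @ np.maximum(W1 @ z, 0.0)

z_star = rng.standard_normal(k)
x_star = G(z_star)            # signal in the range of G

A = rng.standard_normal((n, d))
y = np.sign(A @ x_star)       # 1-bit quantized observations
print(y[:10])
```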
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Neural Networks Efficiently Learn Low-Dimensional Representations with
SGD [22.703825902761405]
We show that SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction.
We also provide compression guarantees for NNs using the approximate low-rank structure produced by SGD.
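A minimal sketch of the compression idea, assuming (for illustration) that training leaves the first-layer weights approximately rank-$r$: a truncated SVD then stores the matrix cheaply with small error.

```python
import numpy as np

rng = np.random.default_rng(4)
width, d, r = 256, 100, 1     # r = assumed effective rank

# Stand-in for a trained first-layer matrix: rank-r signal plus noise.
U = rng.standard_normal((width, r))
V = rng.standard_normal((r, d))
W = U @ V + 0.01 * rng.standard_normal((width, d))

# Compress by truncated SVD at rank r.
Us, s, Vt = np.linalg.svd(W, full_matrices=False)
W_r = (Us[:, :r] * s[:r]) @ Vt[:r]

rel_err = np.linalg.norm(W - W_r) / np.linalg.norm(W)
print(f"rank-{r} compression, relative error: {rel_err:.4f}")
```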
arXiv Detail & Related papers (2022-09-29T15:29:10Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\,\mathrm{OPT}+\epsilon$ with high probability.
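Written out, with $F$ taken here to be the population squared loss (an assumption in this sketch) and $\mathrm{OPT}$ its optimum over single neurons, the guarantee reads:

```latex
F(\mathbf{w}) \;=\; \mathbb{E}\bigl[(\sigma(\mathbf{w}\cdot\mathbf{x}) - y)^2\bigr],
\qquad
F(\widehat{\mathbf{w}}) \;=\; C\,\mathrm{OPT} + \epsilon,
\qquad
\mathrm{OPT} \;=\; \min_{\mathbf{w}} F(\mathbf{w}).
```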
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Deep Learning in High Dimension: Neural Network Approximation of
Analytic Functions in $L^2(\mathbb{R}^d,\gamma_d)$ [0.0]
We prove expression rates for analytic functions $f:\mathbb{R}^d\to\mathbb{R}$ in the norm of $L^2(\mathbb{R}^d,\gamma_d)$.
We consider in particular ReLU and ReLU$^k$ activations for integer $k\geq 2$.
As an application, we prove expression rate bounds of deep ReLU-NNs for response surfaces of elliptic PDEs with log-Gaussian random field inputs.
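Here ReLU$^k$ denotes the rectified power unit:

```latex
\mathrm{ReLU}(x) = \max\{0, x\}, \qquad
\mathrm{ReLU}^k(x) = \max\{0, x\}^k, \qquad k \in \mathbb{Z}, \ k \geq 2.
```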
arXiv Detail & Related papers (2021-11-13T09:54:32Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network can provably learn the target with gradient descent, achieving population loss smaller than that attainable by any Neural Tangent Kernel method.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - A Canonical Transform for Strengthening the Local $L^p$-Type Universal
Approximation Property [4.18804572788063]
$L^p$-type universal approximation theorems guarantee that a given machine learning model class $\mathscr{F}\subseteq C(\mathbb{R}^d,\mathbb{R}^D)$ is dense in $L^p_\mu(\mathbb{R}^d,\mathbb{R}^D)$.
This paper proposes a generic solution to this approximation-theoretic problem by introducing a canonical transformation that "upgrades $\mathscr{F}$'s approximation property".
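Density here is the usual $L^p$ approximation property:

```latex
\forall f \in L^p_\mu(\mathbb{R}^d, \mathbb{R}^D), \ \forall \epsilon > 0, \
\exists F \in \mathscr{F} : \ \| F - f \|_{L^p_\mu} < \epsilon .
```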
arXiv Detail & Related papers (2020-06-24T17:46:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.