Benefits of Additive Noise in Composing Classes with Bounded Capacity
- URL: http://arxiv.org/abs/2206.07199v1
- Date: Tue, 14 Jun 2022 22:57:02 GMT
- Title: Benefits of Additive Noise in Composing Classes with Bounded Capacity
- Authors: Alireza Fathollah Pour, Hassan Ashtiani
- Abstract summary: We show that adding a small amount of Gaussian noise to the output of $\mathcal{F}$ before composing it with $\mathcal{H}$ can effectively control the capacity of $\mathcal{H} \circ \mathcal{F}$.
Preliminary empirical results on the MNIST dataset indicate that the amount of noise required to improve over existing uniform bounds can be numerically negligible.
- Score: 5.127183254738711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We observe that given two (compatible) classes of functions $\mathcal{F}$ and
$\mathcal{H}$ with small capacity as measured by their uniform covering
numbers, the capacity of the composition class $\mathcal{H} \circ \mathcal{F}$
can become prohibitively large or even unbounded. We then show that adding a
small amount of Gaussian noise to the output of $\mathcal{F}$ before composing
it with $\mathcal{H}$ can effectively control the capacity of $\mathcal{H}
\circ \mathcal{F}$, offering a general recipe for modular design. To prove our
results, we define new notions of uniform covering number of random functions
with respect to the total variation and Wasserstein distances. We instantiate
our results for the case of multi-layer sigmoid neural networks. Preliminary
empirical results on the MNIST dataset indicate that the amount of noise required
to improve over existing uniform bounds can be numerically negligible (i.e.,
element-wise i.i.d. Gaussian noise with standard deviation $10^{-240}$). The
source code is available at
https://github.com/fathollahpour/composition_noise.
Related papers
- Sample and Computationally Efficient Robust Learning of Gaussian Single-Index Models [37.42736399673992]
A single-index model (SIM) is a function of the form $\sigma(\mathbf{w}^{\ast} \cdot \mathbf{x})$, where $\sigma: \mathbb{R} \to \mathbb{R}$ is a known link function and $\mathbf{w}^{\ast}$ is a hidden unit vector.
We show that a proper learner attains $L^2$-error of $O(\mathrm{OPT})+\epsilon$, where ... (a minimal code sketch of this model form appears after this list).
arXiv Detail & Related papers (2024-11-08T17:10:38Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^{*}$ rather than for all $\mathbf{x}^{*}$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Learning (Very) Simple Generative Models Is Hard [45.13248517769758]
We show that no polynomial-time algorithm can solve the problem even when the output coordinates of $f: \mathbb{R}^d \to \mathbb{R}^{d'}$ are one-hidden-layer ReLU networks with $\mathrm{poly}(d)$ neurons.
A key ingredient in our proof is an ODE-based construction of a compactly supported, piecewise-linear function $f$ with polynomially-bounded slopes such that the pushforward of $\mathcal{N}(0,1)$ under $f$ matches all low-degree moments of $\mathcal{N}(\ldots)$
arXiv Detail & Related papers (2022-05-31T17:59:09Z)
- A lower bound on the space overhead of fault-tolerant quantum computation [51.723084600243716]
The threshold theorem is a fundamental result in the theory of fault-tolerant quantum computation.
We prove an exponential upper bound on the maximal length of fault-tolerant quantum computation with amplitude damping noise.
arXiv Detail & Related papers (2022-01-31T22:19:49Z)
- Multimeasurement Generative Models [7.502947376736449]
We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_{\mathbf{Y}}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel.
arXiv Detail & Related papers (2021-12-18T02:11:36Z)
- Universal Regular Conditional Distributions via Probability Measure-Valued Deep Neural Models [3.8073142980733]
We find that any model built using the proposed framework is dense in the space $C(\mathcal{X}, \mathcal{P}_1(\mathcal{Y}))$.
The proposed models are also shown to be capable of generically expressing the aleatoric uncertainty present in most randomized machine learning models.
arXiv Detail & Related papers (2021-05-17T11:34:09Z)
- Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z)
- Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^{2})$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)