Memorization Capacity of Neural Networks with Conditional Computation
- URL: http://arxiv.org/abs/2303.11247v1
- Date: Mon, 20 Mar 2023 16:33:17 GMT
- Title: Memorization Capacity of Neural Networks with Conditional Computation
- Authors: Erdem Koyuncu
- Abstract summary: We show that memorizing a collection of $n$ input-output relationships can be accomplished via a neural network with $O(\sqrt{n})$ neurons.
Using a conditional ReLU network, we show that the same task can be accomplished using only $O(\log n)$ operations per input.
- Score: 14.048989759890475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many empirical studies have demonstrated the performance benefits of
conditional computation in neural networks, including reduced inference time
and power consumption. We study the fundamental limits of neural conditional
computation from the perspective of memorization capacity. For Rectified Linear
Unit (ReLU) networks without conditional computation, it is known that
memorizing a collection of $n$ input-output relationships can be accomplished
via a neural network with $O(\sqrt{n})$ neurons. Calculating the output of this
neural network can be accomplished using $O(\sqrt{n})$ elementary arithmetic
operations of additions, multiplications and comparisons for each input. Using
a conditional ReLU network, we show that the same task can be accomplished
using only $O(\log n)$ operations per input. This represents an almost
exponential improvement as compared to networks without conditional
computation. We also show that the $\Theta(\log n)$ rate is the best possible.
Our achievability result utilizes a general methodology to synthesize a
conditional network out of an unconditional network in a
computationally-efficient manner, bridging the gap between unconditional and
conditional architectures.
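For intuition only, here is a minimal sketch of why conditioning helps; it is an illustration under simplifying assumptions (scalar inputs, placeholder sizes and weights), not the construction used in the paper. A dense ReLU memorizer performs arithmetic at every hidden neuron for every input, whereas a conditional lookup routes each input through a balanced tree of comparisons, so only $O(\log n)$ operations are executed per query.

```python
import numpy as np

# Illustrative only: n memorized scalar input-output pairs (random placeholders).
rng = np.random.default_rng(0)
n = 1024
xs = rng.standard_normal(n)
ys = rng.standard_normal(n)

# Unconditional baseline: every hidden unit does arithmetic on every input, so
# the per-input cost grows with the width (O(sqrt(n)) neurons in the
# memorization constructions discussed in the abstract).
width = 32  # placeholder width
W1 = rng.standard_normal(width)
b1 = rng.standard_normal(width)
w2 = rng.standard_normal(width)

def dense_relu_forward(x):
    h = np.maximum(W1 * x + b1, 0.0)  # touches all `width` neurons
    return float(w2 @ h)

_ = dense_relu_forward(xs[17])  # O(width) work regardless of the input

# Conditional computation: sort the keys once, then answer each query with a
# binary search, i.e. O(log n) comparisons; branches off the search path are
# never evaluated.
order = np.argsort(xs)
sorted_xs, sorted_ys = xs[order], ys[order]

def conditional_lookup(x):
    lo, hi = 0, n - 1
    while lo < hi:                     # at most ceil(log2(n)) iterations
        mid = (lo + hi) // 2
        if x <= sorted_xs[mid]:
            hi = mid
        else:
            lo = mid + 1
    return float(sorted_ys[lo])

print(conditional_lookup(xs[17]) == ys[17])  # True: this pair is memorized
```

The paper's contribution is the analogous statement for conditional ReLU networks: a general procedure that turns an unconditional memorizer into a conditional one with $O(\log n)$ per-input cost, together with the matching $\Theta(\log n)$ lower bound. The sketch above only conveys the routing intuition.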
Related papers
- On the Complexity of Neural Computation in Superposition [3.9803704378699103]
Recent advances in the understanding of neural networks suggest that superposition is a key mechanism underlying the computational efficiency of large-scale networks.
We show a nearly tight upper bound: logical operations like pairwise AND can be computed using $O(\sqrt{m' \log m'})$ neurons and $O(m' \log^2 m')$ parameters.
Our results open a path for using complexity theoretic techniques in neural network interpretability research.
arXiv Detail & Related papers (2024-09-05T18:58:59Z)
- Moccasin: Efficient Tensor Rematerialization for Neural Networks [21.348126106786914]
We develop a new constraint programming formulation called Moccasin with only $O(n)$ integer variables.
We present numerical studies that show that our approach is up to an order of magnitude faster than recent work especially for large-scale graphs.
arXiv Detail & Related papers (2023-04-27T18:41:37Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Does Preprocessing Help Training Over-parameterized Neural Networks? [19.64638346701198]
We propose two novel preprocessing ideas to bypass the $\Omega(mnd)$ barrier.
Our results provide theoretical insights for a large number of previously established fast training methods.
arXiv Detail & Related papers (2021-10-09T18:16:23Z)
- A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z)
- ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation [5.35599092568615]
This paper studies the power of artificial neural networks with rectified linear units.
We show that two fundamental optimization problems can be solved with neural networks of size $\mathcal{O}(m^2 n^2)$.
arXiv Detail & Related papers (2021-02-12T17:23:34Z)
- The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
- Interval Universal Approximation for Neural Networks [47.767793120249095]
We introduce the interval universal approximation (IUA) theorem.
The IUA theorem shows that neural networks can approximate any continuous function $f$, as has been known for decades.
We study the computational complexity of constructing neural networks that are amenable to precise interval analysis; a minimal interval-propagation sketch appears after this list.
arXiv Detail & Related papers (2020-07-12T20:43:56Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representations can achieve improved sample complexity compared with learning directly from the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization; a bounded-ReLU sketch appears after this list.
Our integer networks achieve performance equivalent to the corresponding FPN networks, but have only 1/4 of the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
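As a companion to the Interval Universal Approximation and implicit-network entries above, here is a minimal sketch of interval (box) propagation through a single affine-plus-ReLU layer, the generic ingredient behind $\ell_\infty$-ball over-approximations. The weights, sizes, and radius below are placeholder assumptions; this is not the specific construction of either paper.

```python
import numpy as np

def interval_affine_relu(lo, hi, W, b):
    """Propagate the box [lo, hi] through y = relu(W @ x + b) with interval arithmetic."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lo = W_pos @ lo + W_neg @ hi + b   # lowest attainable pre-activation
    out_hi = W_pos @ hi + W_neg @ lo + b   # highest attainable pre-activation
    return np.maximum(out_lo, 0.0), np.maximum(out_hi, 0.0)  # ReLU is monotone

# Toy usage with placeholder weights: bound the outputs over a box of radius eps.
rng = np.random.default_rng(1)
W, b = rng.standard_normal((4, 3)), rng.standard_normal(4)
x, eps = rng.standard_normal(3), 0.1
lo, hi = interval_affine_relu(x - eps, x + eps, W, b)
y = np.maximum(W @ x + b, 0.0)
assert np.all(lo <= y) and np.all(y <= hi)  # the true output lies inside the box
```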
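For the integer-arithmetic entry above, a bounded ReLU clips activations to a fixed range so they can be mapped onto an integer grid with a known scale. The bound and bit width below are illustrative assumptions, not values taken from that paper.

```python
import numpy as np

def bounded_relu(x, bound=6.0):
    """ReLU clipped to [0, bound], so activations have a known, finite range."""
    return np.clip(x, 0.0, bound)

def quantize_activation(x, bound=6.0, bits=8):
    """Map bounded activations onto an unsigned-integer grid with a fixed scale."""
    scale = (2 ** bits - 1) / bound
    q = np.round(bounded_relu(x, bound) * scale).astype(np.uint8)
    return q, scale  # dequantize (approximately) with q / scale

x = np.array([-1.0, 0.5, 3.0, 9.0])
q, scale = quantize_activation(x)
print(q, q / scale)  # values clipped to [0, 6] and rounded onto 256 levels
```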
This list is automatically generated from the titles and abstracts of the papers on this site.