Related papers: On Algebraic Constructions of Neural Networks with Small Weights

On Algebraic Constructions of Neural Networks with Small Weights

URL: http://arxiv.org/abs/2205.08032v1
Date: Tue, 17 May 2022 00:09:23 GMT
Title: On Algebraic Constructions of Neural Networks with Small Weights
Authors: Kordag Mehmet Kilic, Jin Sima and Jehoshua Bruck
Abstract summary: We study the trade-offs among the weight sizes, circuit size and depth of neural gates. Specifically, given a single linear equation with arbitrary coefficients, we would like to express it using a system of linear equations with smaller (even constant) coefficients. We explicitly construct a constant weight, optimal size matrix to compute the EQUALITY function. We prove the existence of the best-known weight size (linear) matrices to compute the COMPARISON function.
Score: 21.915057426589748
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural gates compute functions based on weighted sums of the input variables. The expressive power of neural gates (number of distinct functions it can compute) depends on the weight sizes and, in general, large weights (exponential in the number of inputs) are required. Studying the trade-offs among the weight sizes, circuit size and depth is a well-studied topic both in circuit complexity theory and the practice of neural computation. We propose a new approach for studying these complexity trade-offs by considering a related algebraic framework. Specifically, given a single linear equation with arbitrary coefficients, we would like to express it using a system of linear equations with smaller (even constant) coefficients. The techniques we developed are based on Siegel's Lemma for the bounds, anti-concentration inequalities for the existential results and extensions of Sylvester-type Hadamard matrices for the constructions. We explicitly construct a constant weight, optimal size matrix to compute the EQUALITY function (checking if two integers expressed in binary are equal). Computing EQUALITY with a single linear equation requires exponentially large weights. In addition, we prove the existence of the best-known weight size (linear) matrices to compute the COMPARISON function (comparing between two integers expressed in binary). In the context of the circuit complexity theory, our results improve the upper bounds on the weight sizes for the best-known circuit sizes for EQUALITY and COMPARISON.

Related papers

Compressed Computation: Dense Circuits in a Toy Model of the Universal-AND Problem [0.0]
Neural networks are capable of superposition -- representing more features than there are dimensions.<n>Recent work considers the analogous concept for computation instead of storage, proposing theoretical constructions.<n>We investigate a toy model for the Universal-AND problem which computes the AND of all $mchoose 2$ pairs of $m$ sparse inputs.
arXiv Detail & Related papers (2025-07-13T22:18:15Z)
Determinant Estimation under Memory Constraints and Neural Scaling Laws [48.68885778257016]
We derive a novel hierarchical algorithm for large-scale log-determinant calculation in memory-constrained settings. We show that the ratio of pseudo-determinants satisfies a power-law relationship, allowing us to derive corresponding scaling laws. This enables accurate estimation of NTK log-determinants from a tiny fraction of the full dataset.
arXiv Detail & Related papers (2025-03-06T13:32:13Z)
Explicit Solution Equation for Every Combinatorial Problem via Tensor Networks: MeLoCoToN [55.2480439325792]
We show that every computation problem has an exact explicit equation that returns its solution. We present a method to obtain an equation that solves exactly any problem, both inversion, constraint satisfaction and optimization.
arXiv Detail & Related papers (2025-02-09T18:16:53Z)
Efficient Variational Quantum Linear Solver for Structured Sparse Matrices [0.6138671548064355]
We show that by using an alternate basis one can better exploit the sparsity and underlying structure of matrix. We employ the concept of unitary completion to design efficient quantum circuits for computing the global/local VQLS cost functions.
arXiv Detail & Related papers (2024-04-25T19:22:05Z)
Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure. We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra [62.37017125812101]
We propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA. By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms. We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
arXiv Detail & Related papers (2023-09-06T14:59:38Z)
Fast quantum algorithm for differential equations [0.5895819801677125]
We present a quantum algorithm with numerical complexity that is polylogarithmic in $N$ but is independent of $kappa$ for a large class of PDEs. Our algorithm generates a quantum state that enables extracting features of the solution.
arXiv Detail & Related papers (2023-06-20T18:01:07Z)
A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability. We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z)
PAC-learning gains of Turing machines over circuits and neural networks [1.4502611532302039]
We study the potential gains in sample efficiency that can bring in the principle of minimum description length. We use Turing machines to represent universal models and circuits. We highlight close relationships between classical open problems in Circuit Complexity and the tightness of these.
arXiv Detail & Related papers (2021-03-23T17:03:10Z)
ReLU Neural Networks of Polynomial Size for Exact Maximum Flow Computation [5.35599092568615]
This paper studies the power of artificial neural networks with rectified linear units. We show that two fundamental optimization problems can be solved with neural networks of size $mathcalO(m2n2)$.
arXiv Detail & Related papers (2021-02-12T17:23:34Z)
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning. In particular, we prove that the complexity of the function class $mathcalF$ characterizes the complexity of the function. Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing [34.69732430310801]
We propose Berrut Approximated Coded Computing (BACC) as an alternative approach to deal with stragglers effect. BACC is proven to be numerically stable with low computational complexity. In particular, BACC is used to train a deep neural network on a cluster of servers.
arXiv Detail & Related papers (2020-09-17T14:23:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.