Expressive Power of ReLU and Step Networks under Floating-Point Operations
- URL: http://arxiv.org/abs/2401.15121v2
- Date: Tue, 16 Jul 2024 01:00:31 GMT
- Title: Expressive Power of ReLU and Step Networks under Floating-Point Operations
- Authors: Yeachan Park, Geonho Hwang, Wonyeol Lee, Sejun Park
- Abstract summary: We show that neural networks using a binary threshold unit or ReLU can memorize any finite set of input/output pairs.
We also show similar results on memorization and universal approximation when floating-point operations use finite bits for both significand and exponent.
- Score: 11.29958155597398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of the expressive power of neural networks has investigated the fundamental limits of neural networks. Most existing results assume real-valued inputs and parameters as well as exact operations during the evaluation of neural networks. However, neural networks are typically executed on computers that can only represent a tiny subset of the reals and apply inexact operations, i.e., most existing results do not apply to neural networks used in practice. In this work, we analyze the expressive power of neural networks under a more realistic setup: when we use floating-point numbers and operations as in practice. Our first set of results assumes floating-point operations where the significand of a float is represented by finite bits but its exponent can take any integer value. Under this setup, we show that neural networks using a binary threshold unit or ReLU can memorize any finite set of input/output pairs and can approximate any continuous function within an arbitrary error. In particular, the number of parameters in our constructions for universal approximation and memorization coincides with that in classical results assuming exact mathematical operations. We also show similar results on memorization and universal approximation when floating-point operations use finite bits for both significand and exponent; these results are applicable to many popular floating-point formats such as those defined in the IEEE 754 standard (e.g., 32-bit single-precision format) and bfloat16.
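As a concrete illustration of this setting (a sketch only, not the paper's construction), the snippet below builds a one-hidden-layer binary-threshold network that memorizes a finite set of scalar input/output pairs while performing every operation in IEEE 754 single precision; the specific construction and all names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only (not the construction from the paper): a
# one-hidden-layer binary-threshold network that memorizes distinct
# scalar input/output pairs, with every operation done in float32.

def step(z):
    # Binary threshold unit: 1.0 if z >= 0, else 0.0.
    return (z >= 0).astype(np.float32)

def build_memorizer(xs, ys):
    xs = np.asarray(xs, dtype=np.float32)
    ys = np.asarray(ys, dtype=np.float32)
    order = np.argsort(xs)          # sort the pairs by input value
    xs, ys = xs[order], ys[order]
    biases = -xs                    # unit i fires iff x >= xs[i]
    # Telescoping output weights: the increments of all units that fire sum to ys[i].
    deltas = np.diff(ys, prepend=np.float32(0.0))

    def f(x):
        h = step(np.float32(x) + biases)   # float32 additions
        return np.float32(deltas @ h)      # float32 accumulation
    return f

f = build_memorizer([-2.0, 1.25, -0.5, 3.0], [1.0, 0.5, -1.0, 2.0])
print(float(f(1.25)))  # 0.5; in general, the rounding of this float32
                       # accumulation is exactly what a rigorous
                       # construction has to control.
```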
Related papers
- Predictions Based on Pixel Data: Insights from PDEs and Finite Differences [0.0]
This paper deals with approximation of time sequences where each observation is a matrix.
We show that with relatively small networks, we can represent exactly a class of numerical discretizations of PDEs based on the method of lines.
Our network architecture is inspired by those typically adopted in the approximation of time sequences.
arXiv Detail & Related papers (2023-05-01T08:54:45Z) - Memorization Capacity of Neural Networks with Conditional Computation [14.048989759890475]
We show that memorizing a collection of $n$ input-output relationships can be accomplished via a neural network with $O(\sqrt{n})$ neurons.
Using a conditional ReLU network, we show that the same task can be accomplished using only $O(\log n)$ operations per input.
arXiv Detail & Related papers (2023-03-20T16:33:17Z) - Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z) - Fixed-Point Code Synthesis For Neural Networks [0.0]
A new technique is introduced to tune the formats (precision) of already trained neural networks using fixed-point arithmetic.
The optimized network computes its output with fixed-point numbers while keeping the accuracy within a user-specified threshold (a generic fixed-point sketch appears after this list).
arXiv Detail & Related papers (2022-02-04T12:02:54Z) - Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z) - The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z) - Deep Neural Network Training without Multiplications [0.0]
We show that ResNet can be trained using this operation with competitive classification accuracy.
This method will enable eliminating the multiplications in deep neural-network training and inference.
arXiv Detail & Related papers (2020-12-07T05:40:50Z) - NITI: Training Integer Neural Networks Using Integer-only Arithmetic [4.361357921751159]
We present NITI, an efficient deep neural network training framework that computes exclusively with integer arithmetic.
A proof-of-concept open-source software implementation of NITI that utilizes native 8-bit integer operations is presented.
NITI achieves negligible accuracy degradation on the MNIST and CIFAR10 datasets using 8-bit integer storage and computation.
arXiv Detail & Related papers (2020-09-28T07:41:36Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We find that the accuracy decline is due to activation quantization and therefore replace the conventional ReLU with a Bounded ReLU (a minimal quantization sketch appears after this list).
Our integer networks match the performance of the corresponding floating-point networks while using only 1/4 of the memory and running 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - Neural Arithmetic Units [84.65228064780744]
Neural networks can approximate complex functions, but they struggle to perform exact arithmetic operations over real numbers.
We present two new neural network components: the Neural Addition Unit (NAU), which can learn exact addition and subtraction, and the Neural Multiplication Unit (NMU), which can multiply subsets of a vector.
Our proposed units NAU and NMU, compared with previous neural units, converge more consistently, have fewer parameters, learn faster, can converge for larger hidden sizes, obtain sparse and meaningful weights, and can extrapolate to negative and small values.
arXiv Detail & Related papers (2020-01-14T19:35:04Z)
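For intuition about the units described in the last entry, here is a simplified, unofficial sketch of NAU- and NMU-style forward passes; the weights are hand-picked for the example rather than learned, and the multiplicative gating form used for the NMU is an assumption, not taken from the summary above.

```python
import numpy as np

# Simplified, unofficial sketch of forward passes in the spirit of the
# NAU and NMU described above; weight values are hand-picked for the
# example rather than learned with a regularizer.

def nau_forward(W, x):
    # Neural Addition Unit: a plain linear map whose weights are meant
    # to end up close to {-1, 0, 1}, giving exact adds/subtracts.
    return W @ x

def nmu_forward(W, x):
    # Neural Multiplication Unit: each output multiplies a gated subset
    # of the inputs, z_j = prod_i (W[j, i] * x[i] + 1 - W[j, i]),
    # with W[j, i] in [0, 1] (1 selects x[i], 0 ignores it).
    return np.prod(W * x + (1.0 - W), axis=1)

x = np.array([2.0, 3.0, 5.0])
W_add = np.array([[1.0, 1.0, -1.0]])   # computes x0 + x1 - x2
W_mul = np.array([[1.0, 1.0, 0.0]])    # computes x0 * x1
print(nau_forward(W_add, x))           # [0.]
print(nmu_forward(W_mul, x))           # [6.]
```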
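The fixed-point entry above is easiest to see with a generic example (a sketch of the general idea, not that paper's synthesis procedure): store values as integers scaled by a power of two and evaluate a layer with integer multiply-accumulates only; the format and layer here are assumptions for illustration.

```python
import numpy as np

# Generic fixed-point illustration: values are stored as integers scaled
# by 2**FRAC_BITS, and a dense layer is evaluated with integer
# multiply-accumulate only.

FRAC_BITS = 8                       # assumed Q*.8 format for the example
SCALE = 1 << FRAC_BITS

def to_fixed(a):
    return np.round(np.asarray(a) * SCALE).astype(np.int64)

def from_fixed(q):
    return q.astype(np.float64) / SCALE

def dense_fixed(Wq, bq, xq):
    # The integer matmul produces values scaled by SCALE**2; shift back once.
    acc = Wq @ xq                   # int64 multiply-accumulate
    return (acc >> FRAC_BITS) + bq  # rescale the products, then add the bias

W = np.array([[0.5, -0.25], [1.0, 0.75]])
b = np.array([0.1, -0.2])
x = np.array([0.3, 0.6])

y_fixed = from_fixed(dense_fixed(to_fixed(W), to_fixed(b), to_fixed(x)))
print(y_fixed, W @ x + b)           # agree up to the chosen precision
```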
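Finally, a minimal sketch of the Bounded ReLU plus activation-quantization idea from the integer-arithmetic-only CNN entry (the bound and bit width are assumptions for illustration): clipping the activation range makes the uniform quantization step fixed and known.

```python
import numpy as np

# Minimal sketch: clipping activations to [0, BOUND] makes the uniform
# 8-bit quantization step a fixed, known size (bound and bit width are
# assumptions for this example, not taken from the paper).

BOUND = 6.0                      # assumed clipping bound (ReLU6-style)
LEVELS = 255                     # unsigned 8-bit activation levels
STEP = BOUND / LEVELS            # fixed quantization step

def bounded_relu(x):
    return np.clip(x, 0.0, BOUND)

def quantize_activation(x):
    # Map clipped activations to uint8 codes on a uniform grid.
    return np.round(bounded_relu(x) / STEP).astype(np.uint8)

def dequantize_activation(q):
    return q.astype(np.float64) * STEP

x = np.array([-1.0, 0.5, 3.2, 9.0])
q = quantize_activation(x)
print(q, dequantize_activation(q))   # 9.0 is clipped to 6.0; the rest round to the grid
```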
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.