Neural Arithmetic Units
- URL: http://arxiv.org/abs/2001.05016v1
- Date: Tue, 14 Jan 2020 19:35:04 GMT
- Title: Neural Arithmetic Units
- Authors: Andreas Madsen, Alexander Rosenberg Johansen
- Abstract summary: Neural networks can approximate complex functions, but they struggle to perform exact arithmetic operations over real numbers.
We present two new neural network components: the Neural Addition Unit (NAU), which can learn exact addition and subtraction, and the Neural Multiplication Unit (NMU), which can multiply subsets of a vector.
Our proposed units NAU and NMU, compared with previous neural units, converge more consistently, have fewer parameters, learn faster, can converge for larger hidden sizes, obtain sparse and meaningful weights, and can extrapolate to negative and small values.
- Score: 84.65228064780744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks can approximate complex functions, but they struggle to
perform exact arithmetic operations over real numbers. The lack of inductive
bias for arithmetic operations leaves neural networks without the underlying
logic necessary to extrapolate on tasks such as addition, subtraction, and
multiplication. We present two new neural network components: the Neural
Addition Unit (NAU), which can learn exact addition and subtraction; and the
Neural Multiplication Unit (NMU) that can multiply subsets of a vector. The NMU
is, to our knowledge, the first arithmetic neural network component that can
learn to multiply elements from a vector, when the hidden size is large. The
two new components draw inspiration from a theoretical analysis of recently
proposed arithmetic components. We find that careful initialization,
restricting parameter space, and regularizing for sparsity are important when
optimizing the NAU and NMU. Our proposed units NAU and NMU, compared with
previous neural units, converge more consistently, have fewer parameters, learn
faster, can converge for larger hidden sizes, obtain sparse and meaningful
weights, and can extrapolate to negative and small values.
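To make the two units concrete, here is a minimal NumPy sketch of how the NAU and NMU forward passes are formulated. The clipping ranges, example weights, and omission of the initialization scheme and sparsity regularizer are simplifications for illustration; the paper's exact parameterization and training procedure are not reproduced here.

```python
import numpy as np

def nau_forward(x, W):
    """Neural Addition Unit: a linear layer whose weights are kept in [-1, 1] and
    regularized toward {-1, 0, 1}, so each output becomes an exact signed sum of inputs."""
    return np.clip(W, -1.0, 1.0) @ x

def nmu_forward(x, W):
    """Neural Multiplication Unit: with weights in [0, 1], each factor interpolates
    between 'ignore x_i' (weight 0 contributes 1) and 'use x_i' (weight 1 contributes x_i),
    so each output is the product of a learned subset of the inputs."""
    W = np.clip(W, 0.0, 1.0)
    return np.prod(W * x[None, :] + 1.0 - W, axis=1)

x = np.array([2.0, 3.0, 4.0])
W_add = np.array([[1.0, 1.0, -1.0]])   # converged weights that compute x1 + x2 - x3
W_mul = np.array([[1.0, 0.0, 1.0]])    # converged weights that compute x1 * x3
print(nau_forward(x, W_add))           # [1.]
print(nmu_forward(x, W_mul))           # [8.]
```

Because the NMU multiplies shifted inputs directly rather than passing through a log-exp transformation, it can handle negative and small values, which is the extrapolation behaviour the abstract highlights.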
Related papers
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study of the NTK has so far been devoted to typical neural network architectures and is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of the NTK (a minimal example of a Hadamard-product layer is sketched after this list).
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- Reachability In Simple Neural Networks [2.7195102129095003]
We show that NP-hardness already holds for restricted classes of simple specifications and neural networks.
We give a thorough discussion and outlook of possible extensions for this direction of research on neural network verification.
arXiv Detail & Related papers (2022-03-15T14:25:44Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity (a random-features sketch of such a network appears after this list).
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Floating-Point Multiplication Using Neuromorphic Computing [3.5450828190071655]
We describe a neuromorphic system that performs IEEE 754-compliant floating-point multiplication.
We study the effect of the number of neurons per bit on accuracy and bit error rate, and estimate the optimal number of neurons needed for each component (a worked example of the underlying sign, exponent, and mantissa arithmetic appears after this list).
arXiv Detail & Related papers (2020-08-30T19:07:14Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting (a generic descent-ascent training sketch appears after this list).
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Neural Power Units [1.7188280334580197]
We introduce the Neural Power Unit (NPU) that operates on the full domain of real numbers and is capable of learning arbitrary power functions in a single layer.
We show that the NPUs outperform their competitors in terms of accuracy and sparsity on artificial arithmetic datasets (a simplified power-unit sketch appears after this list).
arXiv Detail & Related papers (2020-06-02T14:58:07Z)
- Self-Organized Operational Neural Networks with Generative Neurons [87.32169414230822]
Operational Neural Networks (ONNs) are heterogeneous networks with a generalized neuron model that can encapsulate any set of non-linear operators.
We propose Self-organized ONNs (Self-ONNs) with generative neurons that have the ability to adapt (optimize) the nodal operator of each connection (a sketch of such a neuron appears after this list).
arXiv Detail & Related papers (2020-04-24T14:37:56Z)
- iNALU: Improved Neural Arithmetic Logic Unit [2.331160520377439]
The recently proposed Neural Arithmetic Logic Unit (NALU) is a neural architecture that explicitly represents mathematical relationships within the units of the network in order to learn operations such as summation, subtraction, or multiplication.
We show that our model solves stability issues and outperforms the original NALU model in terms of arithmetic precision and convergence (a sketch of the original NALU cell appears after this list).
arXiv Detail & Related papers (2020-03-17T10:37:22Z)
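For the polynomial-net entry above (NNs-Hp), the following toy function illustrates what a layer with a Hadamard product looks like. The shapes, the absence of bias and skip terms, and the single-layer setting are illustrative simplifications, not the architectures analyzed in that paper.

```python
import numpy as np

def hadamard_product_layer(x, W1, W2):
    # Element-wise (Hadamard) product of two linear maps of the same input,
    # which makes every output a degree-2 polynomial of x.
    return (W1 @ x) * (W2 @ x)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W1 = rng.standard_normal((3, 4))
W2 = rng.standard_normal((3, 4))
print(hadamard_product_layer(x, W1, W2))
```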
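For the separation-capacity entry, this is a small self-contained experiment in the spirit of the result: a frozen random two-layer ReLU feature map (standard Gaussian weights, uniform biases) followed by a trained linear readout separates data that is not linearly separable in input space. The data set, width, bias range, and least-squares readout are illustrative choices, not the paper's construction or proof.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings in R^2: not linearly separable in the input space.
d, n = 2, 200
radius = rng.choice([1.0, 3.0], size=n)
angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
y = np.where(radius > 2.0, 1.0, -1.0)

# Frozen random first layer: standard Gaussian weights, uniformly distributed biases.
width = 512
W = rng.standard_normal((width, d))
b = rng.uniform(-3.0, 3.0, size=width)
H = np.maximum(W @ X.T + b[:, None], 0.0)        # ReLU features, shape (width, n)

# Only the linear readout is fit (plain least squares, for simplicity).
v, *_ = np.linalg.lstsq(H.T, y, rcond=None)
accuracy = np.mean(np.sign(H.T @ v) == y)
print(f"training accuracy with random ReLU features: {accuracy:.2f}")
```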
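For the neuromorphic floating-point entry, the sketch below is not the spiking implementation from that paper; it only spells out the sign/exponent/mantissa decomposition that any IEEE 754 multiplier, neuromorphic or otherwise, has to reproduce (special cases such as infinities, NaNs, and subnormals are ignored).

```python
import math

def float_multiply(a: float, b: float) -> float:
    """Multiply two floats by explicitly combining sign, exponent, and mantissa."""
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    m_a, e_a = math.frexp(abs(a))      # abs(a) == m_a * 2**e_a with m_a in [0.5, 1)
    m_b, e_b = math.frexp(abs(b))
    mantissa = m_a * m_b               # mantissas multiply
    exponent = e_a + e_b               # exponents add
    return sign * math.ldexp(mantissa, exponent)

print(float_multiply(-1.5, 2.25), -1.5 * 2.25)   # both print -3.375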
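For the structural-equation-model entry, the loop below shows the general shape of a min-max (gradient descent-ascent) estimation in which both players are neural networks. The network sizes, synthetic data, moment objective, and quadratic penalty on the adversary are all illustrative assumptions; the paper's actual operator equation, regularization, and convergence analysis are not reproduced here.

```python
import torch

torch.manual_seed(0)

# Both players are small neural networks (illustrative sizes).
f = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
u = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
opt_f = torch.optim.SGD(f.parameters(), lr=1e-2)
opt_u = torch.optim.SGD(u.parameters(), lr=1e-2)

x = torch.rand(256, 1)
y = 2.0 * x + 0.1 * torch.randn(256, 1)   # synthetic data from a toy structural model

for step in range(2000):
    # Adversary u ascends a moment-violation objective (quadratic penalty keeps it bounded).
    residual = (y - f(x)).detach()
    u_obj = (residual * u(x)).mean() - 0.5 * (u(x) ** 2).mean()
    opt_u.zero_grad()
    (-u_obj).backward()
    opt_u.step()

    # Estimator f descends the same moment objective against the updated adversary.
    f_obj = ((y - f(x)) * u(x).detach()).mean()
    opt_f.zero_grad()
    f_obj.backward()
    opt_f.step()
```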
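For the Neural Power Units entry, here is a heavily simplified sketch of a power unit: each output is a learned product of powers of the inputs, computed in log space. The actual NPU additionally handles negative inputs through a complex-logarithm formulation, which this sketch omits; the epsilon and the example weights are arbitrary.

```python
import numpy as np

def simple_power_unit(x, W, eps=1e-8):
    # z_j = prod_i |x_i| ** W[j, i], computed as exp(W @ log|x|).
    return np.exp(W @ np.log(np.abs(x) + eps))

x = np.array([2.0, 3.0])
W = np.array([[0.5, 2.0]])          # a unit that has learned z = sqrt(x1) * x2**2
print(simple_power_unit(x, W))      # ~ [12.73]
```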
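For the Self-ONN entry, one way to picture a generative neuron is a nodal operator given by a learned polynomial of each connection's input rather than a fixed multiplication. The polynomial order Q, the pooling by summation, the tanh activation, and the coefficients below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def generative_neuron(x, w, bias=0.0):
    # w[i, q] is the coefficient of x_i ** (q + 1) on connection i:
    # each connection applies its own learned polynomial instead of a fixed w * x.
    Q = w.shape[1]
    powers = np.stack([x ** (q + 1) for q in range(Q)], axis=1)   # shape (n_inputs, Q)
    return np.tanh(np.sum(w * powers) + bias)                     # pool, then activate

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
w = 0.1 * rng.standard_normal((5, 3))   # Q = 3
print(generative_neuron(x, w))
```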
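For the iNALU entry, the sketch below is the forward pass of the original NALU cell (Trask et al., 2018) that iNALU revises: a gated mix of an additive path and a log-space multiplicative path, with effective weights pushed toward {-1, 0, 1}. Initialization, the specific iNALU modifications (its stability fixes), and training are not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nalu_forward(x, W_hat, M_hat, G, eps=1e-8):
    W = np.tanh(W_hat) * sigmoid(M_hat)        # effective weights, biased toward {-1, 0, 1}
    a = W @ x                                  # additive path: addition / subtraction
    m = np.exp(W @ np.log(np.abs(x) + eps))    # multiplicative path via log space
    g = sigmoid(G @ x)                         # learned gate between the two paths
    return g * a + (1.0 - g) * m

rng = np.random.default_rng(0)
x = np.array([2.0, 3.0, 4.0])
W_hat, M_hat, G = (rng.standard_normal((2, 3)) for _ in range(3))
print(nalu_forward(x, W_hat, M_hat, G))
```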
This list is automatically generated from the titles and abstracts of the papers on this site.