Multi-stage Neural Networks: Function Approximator of Machine Precision
- URL: http://arxiv.org/abs/2307.08934v1
- Date: Tue, 18 Jul 2023 02:47:32 GMT
- Title: Multi-stage Neural Networks: Function Approximator of Machine Precision
- Authors: Yongji Wang, Ching-Yao Lai
- Abstract summary: We develop multi-stage neural networks that reduce prediction errors to nearly $O(10^{-16})$, well below the $O(10^{-5})$ floor that single networks typically reach even with large network size and extended training iterations.
We demonstrate that the prediction error from multi-stage training, for both regression problems and physics-informed neural networks, can nearly reach the machine precision $O(10^{-16})$ of double-precision floating point within a finite number of iterations.
- Score: 0.456877715768796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning techniques are increasingly applied to scientific problems,
where the precision of networks is crucial. Despite being deemed universal
function approximators, neural networks in practice struggle to reduce their
prediction errors below $O(10^{-5})$ even with large network size and extended
training iterations. To address this issue, we develop multi-stage neural
networks that divide the training process into stages, with each stage using a
new network optimized to fit the residue from the previous stage. Across
successive stages, the residue magnitude decreases substantially and follows an
inverse power-law relationship with the residue frequencies. The multi-stage
neural networks effectively mitigate the spectral bias associated with regular
neural networks, enabling them to capture the high-frequency features of target
functions. We demonstrate that the prediction error from multi-stage training,
for both regression problems and physics-informed neural networks, can nearly
reach the machine precision $O(10^{-16})$ of double-precision floating point
within a finite number of iterations. Such levels of accuracy are rarely
attainable using single neural networks alone.
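To make the staged-training recipe above concrete, the following is a minimal sketch, assuming small PyTorch MLPs, a synthetic 1-D regression target, and simple magnitude-based rescaling of each residue; the network sizes, optimizer, iteration counts, and scale factors are illustrative assumptions rather than the paper's exact settings. Each stage fits a fresh network to the normalized residue left by the previous stage, and the final prediction is the scale-weighted sum over stages.

```python
import torch
import torch.nn as nn

torch.set_default_dtype(torch.float64)  # double precision, as emphasized in the paper

def make_mlp(width=64, depth=3):
    """Small fully connected network with tanh activations (illustrative size)."""
    layers, in_dim = [], 1
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.Tanh()]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))
    return nn.Sequential(*layers)

def fit(net, x, y, iters=5000, lr=1e-3):
    """Plain Adam training on mean squared error."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return net

# Synthetic 1-D regression target (hypothetical example, not taken from the paper)
x = torch.linspace(-1.0, 1.0, 400).unsqueeze(1)
y = torch.sin(3.0 * torch.pi * x) + 0.3 * torch.cos(9.0 * torch.pi * x)

stages, scales, residue = [], [], y.clone()
for stage in range(3):                   # three stages for illustration
    scale = residue.abs().max().item()   # normalize the residue to O(1) before fitting
    net = fit(make_mlp(), x, residue / scale)
    stages.append(net)
    scales.append(scale)
    with torch.no_grad():                # residue handed to the next stage
        residue = residue - scale * net(x)

with torch.no_grad():                    # multi-stage prediction: sum of rescaled stages
    y_hat = sum(c * net(x) for c, net in zip(scales, stages))
    print(f"max abs error after 3 stages: {(y_hat - y).abs().max().item():.3e}")
```

The staging and rescaling loop is the point of the sketch; the paper further reports that later-stage residues are dominated by higher frequencies, which is why mitigating spectral bias matters for pushing the error toward machine precision.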
Related papers
- Spectrum-Informed Multistage Neural Networks: Multiscale Function Approximators of Machine Precision [1.2663244405597374]
We propose a novel multistage neural network approach in which each new stage learns the residue from the previous stage.
We successfully tackle the spectral bias of neural networks.
This approach allows the neural network to fit target functions to double-precision floating-point machine precision.
arXiv Detail & Related papers (2024-07-24T12:11:09Z)
- Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation [36.41451383422967]
In scientific applications, neural networks are generally of moderate size, mainly to ensure the speed of inference.
Existing work has found that the powerful capabilities of neural networks are primarily due to their non-linearity.
We propose a condensation reduction algorithm to verify the feasibility of this idea in practical problems.
arXiv Detail & Related papers (2024-05-02T06:53:40Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Global quantitative robustness of regression feed-forward neural networks [0.0]
We adapt the notion of the regression breakdown point to regression neural networks.
We compare the performance, measured by the out-of-sample loss, with a proxy of the breakdown rate.
The results indeed motivate the use of robust loss functions for neural network training.
arXiv Detail & Related papers (2022-11-18T09:57:53Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization [1.3999481573773072]
We introduce and evaluate the batch-entropy, which quantifies the flow of information through each layer of a neural network.
We show that we can train a "vanilla" fully connected network and convolutional neural network with 500 layers by simply adding the batch-entropy regularization term to the loss function.
arXiv Detail & Related papers (2022-08-01T20:31:58Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Measurement error models: from nonparametric methods to deep neural networks [3.1798318618973362]
We propose an efficient neural network design for estimating measurement error models.
We use a fully connected feed-forward neural network to approximate the regression function $f(x)$.
We conduct an extensive numerical study to compare the neural network approach with classical nonparametric methods.
arXiv Detail & Related papers (2020-07-15T06:05:37Z)