Training Neural Networks in Single vs Double Precision
- URL: http://arxiv.org/abs/2209.07219v1
- Date: Thu, 15 Sep 2022 11:20:53 GMT
- Title: Training Neural Networks in Single vs Double Precision
- Authors: Tomas Hrycej, Bernhard Bermeitinger, Siegfried Handschuh
- Abstract summary: Networks are trained for mean square error using the Conjugate Gradient (CG) and RMSprop algorithms.
Experiments show that single-precision can keep up with double-precision if line search finds an improvement.
For strongly nonlinear tasks, both algorithm classes find only solutions fairly poor in terms of mean square error.
- Score: 8.036150169408241
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The commitment to single-precision floating-point arithmetic is widespread in
the deep learning community. To evaluate whether this commitment is justified,
the influence of computing precision (single and double precision) on the
optimization performance of the Conjugate Gradient (CG) method (a second-order
optimization algorithm) and RMSprop (a first-order algorithm) has been
investigated. Neural networks with one to five fully connected hidden layers,
moderate or strong nonlinearity, and up to 4 million network parameters have
been trained to minimize the Mean Square Error (MSE). The training tasks have
been set up so that their MSE minimum was known to be zero. Computing
experiments have disclosed that single-precision can keep up (with superlinear
convergence) with double-precision as long as line search finds an improvement.
First-order methods such as RMSprop do not benefit from double precision.
However, for moderately nonlinear tasks, CG is clearly superior. For strongly
nonlinear tasks, both algorithm classes find only solutions fairly poor in
terms of mean square error relative to the output variance. CG with double
floating-point precision is superior whenever the solutions have the potential
to be useful for the application goal.
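To make the experimental setup concrete, the following is a minimal sketch (assuming PyTorch; not the authors' code): the same fully connected network is trained on a regression task whose MSE minimum is known to be zero, once in single and once in double precision, and the final losses are compared. RMSprop stands in for the first-order optimizer; the paper's Conjugate Gradient runs with line search would be compared analogously.

```python
# Minimal sketch (not the authors' code): compare single vs. double precision
# on a regression task whose MSE minimum is exactly zero, as in the paper's setup.
import torch

def train(dtype, steps=2000, seed=0):
    torch.manual_seed(seed)
    # A fixed "teacher" network generates the targets, so a student with the
    # same architecture can in principle drive the MSE to zero.
    teacher = torch.nn.Sequential(
        torch.nn.Linear(32, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
    ).to(dtype)
    x = torch.randn(4096, 32, dtype=dtype)
    with torch.no_grad():
        y = teacher(x)

    student = torch.nn.Sequential(
        torch.nn.Linear(32, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
    ).to(dtype)
    opt = torch.optim.RMSprop(student.parameters(), lr=1e-3)  # first-order optimizer
    loss_fn = torch.nn.MSELoss()
    loss = None
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(student(x), y)
        loss.backward()
        opt.step()
    return loss.item()

for dtype in (torch.float32, torch.float64):
    print(dtype, train(dtype))  # how far does each precision get toward MSE = 0?
```

In the spirit of the abstract's finding, the comparison of interest is whether the float32 run stalls at a noticeably higher residual loss than the float64 run once the updates approach single-precision rounding error.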
Related papers
- Efficient Second-Order Neural Network Optimization via Adaptive Trust Region Methods [0.0]
SecondOrderAdaptive (SOAA) is a novel optimization algorithm designed to overcome limitations of traditional second-order techniques.
We empirically demonstrate that SOAA achieves faster and more stable convergence compared to first-order approximations.
arXiv Detail & Related papers (2024-10-03T08:23:06Z)
- AdaFisher: Adaptive Second Order Optimization via Fisher Information [22.851200800265914]
We present AdaFisher, an adaptive second-order optimizer that leverages a block-diagonal approximation to the Fisher information matrix for adaptive gradient preconditioning.
We demonstrate that AdaFisher outperforms state-of-the-art optimizers in terms of both accuracy and convergence speed.
arXiv Detail & Related papers (2024-05-26T01:25:02Z) - Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z)
- Efficient first-order predictor-corrector multiple objective optimization for fair misinformation detection [5.139559672771439]
Multiple-objective optimization (MOO) aims to simultaneously optimize multiple conflicting objectives and has found important applications in machine learning.
We propose a Gauss-Newton approximation that scales only linearly and requires only first-order inner products per iteration.
These innovations make the predictor-corrector approach feasible for large networks.
arXiv Detail & Related papers (2022-09-15T12:32:15Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of some ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point.
Existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result.
We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
arXiv Detail & Related papers (2022-02-24T22:16:53Z)
- Boost Neural Networks by Checkpoints [9.411567653599358]
We propose a novel method to ensemble the checkpoints of deep neural networks (DNNs).
With the same training budget, our method achieves 4.16% lower error on CIFAR-100 and 6.96% on Tiny-ImageNet with the ResNet-110 architecture.
arXiv Detail & Related papers (2021-10-03T09:14:15Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds in practice while still enjoying a theoretical guarantee on the number of communication rounds.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.