PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit
- URL: http://arxiv.org/abs/2105.00053v2
- Date: Tue, 4 May 2021 09:26:38 GMT
- Title: PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit
- Authors: Gonçalo Raposo, Pedro Tomás and Nuno Roma
- Abstract summary: The presented research aims to evaluate the feasibility of training deep convolutional neural networks using posits.
A software framework was developed to use simulated posits and quires in end-to-end training and inference.
Results suggest that 8-bit posits can substitute 32-bit floats during training with no negative impact on the resulting loss and accuracy.
- Score: 5.534626267734822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-precision formats have proven to be an efficient way to reduce not only
the memory footprint but also the hardware resources and power consumption of
deep learning computations. Under this premise, the posit numerical format
appears to be a highly viable substitute for the IEEE floating-point, but its
application to neural network training still requires further research. Some
preliminary results have shown that 8-bit (and even smaller) posits may be used
for inference and 16-bit for training, while maintaining the model accuracy.
The presented research aims to evaluate the feasibility of training deep
convolutional neural networks using posits. For this purpose, a software
framework was developed to use simulated posits and quires in end-to-end
training and inference. This implementation allows using any bit size,
configuration, and even mixed precision, suitable for different precision
requirements in various stages. The obtained results suggest that 8-bit posits
can substitute 32-bit floats during training with no negative impact on the
resulting loss and accuracy.
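To make the idea of simulated-posit training concrete, the sketch below is a minimal NumPy illustration, not the PositNN framework itself: it enumerates every bit pattern of a posit(n, es) format, decodes it to its real value, and quantizes float tensors by rounding to the nearest representable posit. All function names are illustrative, and ties are broken by distance rather than by the posit standard's round-to-nearest-even rule.

```python
import numpy as np

def posit_value(pattern: int, n: int = 8, es: int = 1) -> float:
    """Decode one n-bit posit bit pattern (given as an unsigned int) to a float."""
    if pattern == 0:
        return 0.0
    if pattern == 1 << (n - 1):               # 100...0 encodes NaR (not a real)
        return float("nan")
    sign = -1.0 if pattern >> (n - 1) else 1.0
    if sign < 0:                               # negative posits use two's complement
        pattern = (-pattern) & ((1 << n) - 1)
    body = [(pattern >> i) & 1 for i in range(n - 2, -1, -1)]  # bits after the sign
    run = 1                                    # regime = run of identical leading bits
    while run < len(body) and body[run] == body[0]:
        run += 1
    k = run - 1 if body[0] == 1 else -run
    rest = body[run + 1:]                      # skip the regime terminator bit
    exp_bits = (rest[:es] + [0] * es)[:es]     # missing exponent bits are implicitly 0
    e = 0
    for b in exp_bits:
        e = (e << 1) | b
    frac = sum(b / 2.0 ** (i + 1) for i, b in enumerate(rest[es:]))
    return sign * 2.0 ** (k * 2 ** es + e) * (1.0 + frac)

def posit_table(n: int = 8, es: int = 1) -> np.ndarray:
    """All finite values representable by posit(n, es), sorted ascending."""
    vals = np.array([posit_value(p, n, es) for p in range(1 << n)])
    return np.sort(vals[~np.isnan(vals)])

def quantize_to_posit(x: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Round each element of x to the nearest representable posit value."""
    idx = np.clip(np.searchsorted(table, x), 1, len(table) - 1)
    lo, hi = table[idx - 1], table[idx]
    return np.where(np.abs(x - lo) <= np.abs(hi - x), lo, hi)

# Example: quantize weights to posit(8,1) while keeping a float32 master copy,
# the usual pattern in mixed-precision training loops.
table8 = posit_table(n=8, es=1)
w_master = np.random.randn(4, 4).astype(np.float32)
w_posit = quantize_to_posit(w_master, table8)
```

In an actual mixed-precision training loop, such a quantizer would be applied with different (n, es) configurations to weights, activations, and gradients, while dot-product accumulations would emulate the quire, a wide fixed-point accumulator that avoids intermediate rounding.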
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable through the training procedure, including the gradient-based optimizer and regularizers, which limits this flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z) - The Hidden Power of Pure 16-bit Floating-Point Neural Networks [1.9594704501292781]
Lowering the precision of neural networks below the prevalent 32-bit format has long been considered harmful to performance.
This paper investigates the unexpected performance gain of pure 16-bit neural networks over the 32-bit networks in classification tasks.
arXiv Detail & Related papers (2023-01-30T12:01:45Z) - FP8 Quantization: The Power of the Exponent [19.179749424362686]
This paper investigates the benefit of the floating point format for neural network inference.
We detail the choices that can be made for the FP8 format, including the important choice of the number of bits for the mantissa and exponent (a simulation sketch of this trade-off follows the related-papers list below).
We show how these findings translate to real networks, provide an efficient implementation for FP8 simulation, and a new algorithm.
arXiv Detail & Related papers (2022-08-19T09:03:00Z) - 8-bit Numerical Formats for Deep Neural Networks [1.304892050913381]
We present an in-depth study on the use of 8-bit floating-point number formats for activations, weights, and gradients for both training and inference.
Experiments demonstrate that a suitable choice of these low-precision formats enables faster training and reduced power consumption without any degradation in accuracy for a range of deep learning models for image classification and language processing.
arXiv Detail & Related papers (2022-06-06T21:31:32Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Deep Neural Network Training without Multiplications [0.0]
We show that ResNet can be trained using this operation with competitive classification accuracy.
This method makes it possible to eliminate multiplications from deep neural network training and inference.
arXiv Detail & Related papers (2020-12-07T05:40:50Z) - Searching for Low-Bit Weights in Quantized Neural Networks [129.8319019563356]
Quantized neural networks with low-bit weights and activations are attractive for developing AI accelerators.
We propose to regard the discrete weights in an arbitrary quantized neural network as searchable variables, and use a differentiable method to search for them accurately.
arXiv Detail & Related papers (2020-09-18T09:13:26Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding floating-point (FPN) networks, but have only 1/4 of the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks [13.929168096016957]
We introduce a novel methodology for training deep neural networks using 8-bit floating point (FP8) numbers.
Reduced bit precision allows for a larger effective memory and increased computational speed.
We show that, unlike previous 8-bit precision training methods, the proposed method works out-of-the-box for representative models.
arXiv Detail & Related papers (2020-01-16T06:38:27Z) - Towards Unified INT8 Training for Convolutional Neural Network [83.15673050981624]
We build a unified 8-bit (INT8) training framework for common convolutional neural networks.
First, we empirically find four distinctive characteristics of gradients, which provide insightful clues for gradient quantization.
We propose two universal techniques, including Direction Sensitive Gradient Clipping, which reduces the direction deviation of gradients (illustrated in a sketch after the related-papers list).
arXiv Detail & Related papers (2019-12-29T08:37:53Z)
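As referenced in the FP8 Quantization entry above, the following is a rough sketch of how the mantissa/exponent split of an FP8-like format can be simulated in software. It ignores subnormals and special values and rounds to nearest, so it only approximates real FP8 behaviour; the function name and the E4M3/E5M2 defaults used in the example are assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_float(x: np.ndarray, exp_bits: int = 4, man_bits: int = 3) -> np.ndarray:
    """Simulate rounding to a float with `exp_bits` exponent and `man_bits`
    mantissa bits (e.g. E4M3). Subnormals and special values are ignored."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    m, e = np.frexp(x)                         # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** (man_bits + 1)              # keep 1 + man_bits significant bits
    y = np.ldexp(np.round(m * scale) / scale, e)
    # Clamp to the normal range that exp_bits can represent.
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** bias
    min_normal = 2.0 ** (1 - bias)
    y = np.clip(y, -max_val, max_val)
    y = np.where(np.abs(y) < min_normal / 2, 0.0, y)   # flush tiny values to zero
    return y

# Example: compare E4M3 and E5M2 rounding error on the same tensor.
g = np.random.randn(1000) * 1e-2
print(np.abs(g - quantize_float(g, 4, 3)).max(),
      np.abs(g - quantize_float(g, 5, 2)).max())
```

The example illustrates the trade-off discussed in that entry: more mantissa bits give finer resolution near the scale of the data, while more exponent bits extend the dynamic range.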
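The Towards Unified INT8 Training entry mentions Direction Sensitive Gradient Clipping. The sketch below only illustrates the underlying idea as assumed here: choose a clipping threshold whose quantized gradient stays closest in direction (cosine similarity) to the full-precision gradient. The grid search, the symmetric INT8 quantizer, and all names are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def int8_quantize(g: np.ndarray, clip: float) -> np.ndarray:
    """Symmetric uniform INT8 quantization of g after clipping to [-clip, clip]."""
    step = clip / 127.0
    q = np.clip(np.round(np.clip(g, -clip, clip) / step), -127, 127)
    return q * step

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened tensors."""
    return float(a.ravel() @ b.ravel() /
                 (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def choose_clip(g: np.ndarray, num_candidates: int = 20) -> float:
    """Pick the clipping value whose quantized gradient deviates least in
    direction (highest cosine similarity) from the full-precision gradient."""
    g_max = np.abs(g).max()
    candidates = g_max * np.linspace(0.05, 1.0, num_candidates)
    sims = [cosine(g, int8_quantize(g, c)) for c in candidates]
    return float(candidates[int(np.argmax(sims))])

# Example: heavy-tailed gradients usually prefer a clip below the raw maximum.
g = np.random.standard_t(df=2, size=10_000).astype(np.float32)
print(choose_clip(g), np.abs(g).max())
```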