Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
- URL: http://arxiv.org/abs/2503.15568v1
- Date: Wed, 19 Mar 2025 09:19:11 GMT
- Title: Mixed precision accumulation for neural network inference guided by componentwise forward error analysis
- Authors: El-Mehdi El Arar, Silviu-Ioan Filip, Theo Mary, Elisa Riccietti
- Abstract summary: We propose a mathematically founded mixed precision accumulation strategy for inference of neural networks. Our strategy is based on a new componentwise forward error analysis that explains the propagation of errors in the forward pass of neural networks.
- Score: 2.4374097382908477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work proposes a mathematically founded mixed precision accumulation strategy for the inference of neural networks. Our strategy is based on a new componentwise forward error analysis that explains the propagation of errors in the forward pass of neural networks. Specifically, our analysis shows that the error in each component of the output of a layer is proportional to the condition number of the inner product between the weights and the input, multiplied by the condition number of the activation function. These condition numbers can vary widely from one component to the other, thus creating a significant opportunity to introduce mixed precision: each component should be accumulated in a precision inversely proportional to the product of these condition numbers. We propose a practical algorithm that exploits this observation: it first computes all components in low precision, uses this output to estimate the condition numbers, and recomputes in higher precision only the components associated with large condition numbers. We test our algorithm on various networks and datasets and confirm experimentally that it can significantly improve the cost--accuracy tradeoff compared with uniform precision accumulation baselines.
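The two-pass strategy described in the abstract can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: float16 stands in for the low-precision accumulator, float64 for the high-precision one, and the threshold `tau` on the estimated condition number is an arbitrary illustrative choice.

```python
import numpy as np

def mixed_precision_matvec(W, x, tau=10.0):
    """Two-pass accumulation: compute everything in low precision,
    then recompute only ill-conditioned components in high precision.

    tau is an illustrative threshold on the estimated condition number
    of each inner product (not a value from the paper).
    """
    # Pass 1: accumulate every output component in low precision (float16).
    W16, x16 = W.astype(np.float16), x.astype(np.float16)
    y = (W16 @ x16).astype(np.float64)

    # Estimate the componentwise condition number of each inner product:
    # cond_i = sum_j |W_ij * x_j| / |sum_j W_ij * x_j|.
    numer = np.abs(W16.astype(np.float64) * x16.astype(np.float64)).sum(axis=1)
    denom = np.abs(y)
    cond = numer / np.maximum(denom, np.finfo(np.float64).tiny)

    # Pass 2: recompute only the components flagged as ill conditioned,
    # here in float64.
    bad = cond > tau
    y[bad] = W[bad].astype(np.float64) @ x.astype(np.float64)
    return y, bad
```

A row whose inner product suffers heavy cancellation (e.g. weights `[1.0, -1.0 + 1e-4]` against input `[1.0, 1.0]`) gets a huge estimated condition number and is recomputed, while well-conditioned rows keep their cheap low-precision result.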
Related papers
- LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models [2.845351470902218]
This article addresses the floating-point computation of compositionally-rich functions, concentrating on transformer inference. We provide an adaptive strategy that selects a small subset of components of $g(\mathrm{x})$ to be computed more accurately, while all other computations are carried out at lower accuracy. We study the effectiveness of this algorithm numerically on GPT-2 models and demonstrate that even very low recomputation rates allow for accuracy improvements of up to two orders of magnitude.
arXiv Detail & Related papers (2026-01-29T12:26:00Z) - Algorithms and data structures for automatic precision estimation of neural networks [0.0]
We extend a neural network library with automatic precision estimation for floating point computations. We discuss conditions to make estimations exact and preserve high computation performance of neural network training and inference.
arXiv Detail & Related papers (2025-09-29T11:13:29Z) - Precision Neural Networks: Joint Graph And Relational Learning [36.05842226689587]
CoVariance Neural Networks (VNNs) perform convolutions on the graph determined by the covariance matrix of the data. We study Precision Neural Networks (PNNs), which operate on the precision matrix -- the inverse covariance. We formulate an optimization problem that jointly learns the network parameters and the precision matrix, and solve it via alternating optimization.
arXiv Detail & Related papers (2025-09-18T10:22:05Z) - Consistency of Learned Sparse Grid Quadrature Rules using NeuralODEs [1.3654846342364308]
This paper provides a proof of the consistency of sparse grid quadrature for numerical integration of high dimensional distributions. A decomposition of the total numerical error into quadrature error and statistical error is provided.
arXiv Detail & Related papers (2025-07-02T09:37:16Z) - Semiparametric conformal prediction [79.6147286161434]
We construct a conformal prediction set accounting for the joint correlation structure of the vector-valued non-conformity scores. We flexibly estimate the joint cumulative distribution function (CDF) of the scores. Our method yields the desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Binary Losses for Density Ratio Estimation [2.512309434783062]
Estimating the ratio of two probability densities from a finite number of observations is a central machine learning problem. In this work, we characterize all loss functions that result in density ratio estimators with small error. We obtain a simple recipe for constructing loss functions with certain properties, such as those that prioritize an accurate estimation of large density ratio values.
arXiv Detail & Related papers (2024-07-01T15:24:34Z) - A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models [21.436254507839738]
We propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models.
We show that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm.
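The EM baseline that this paper compares against can be sketched as follows. This is textbook EM for a one-dimensional Gaussian mixture, not the paper's Fourier-based estimator; the deterministic quantile initialization is an illustrative choice.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=200):
    """Textbook EM for a one-dimensional Gaussian mixture model."""
    # Illustrative deterministic initialization: spread the means
    # over the empirical quantiles of the data.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component per point.
        d = x[:, None] - mu[None, :]
        logp = -0.5 * (d**2 / var + np.log(2 * np.pi * var)) + np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of weights, means, variances.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, 1e-12)              # guard against collapse
    return pi, mu, var
```

On well-separated clusters this recovers the component means and weights accurately; the paper's point is that its Fourier-based estimator scores better on likelihood, AIC, and BIC than this kind of baseline.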
arXiv Detail & Related papers (2024-04-19T03:53:50Z) - Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z) - Combining Gradients and Probabilities for Heterogeneous Approximation of Neural Networks [2.5744053804694893]
We discuss the validity of additive Gaussian noise as a surrogate model for behavioral simulation of approximate multipliers.
The amount of noise injected into the accurate computations is learned during network training using backpropagation.
Our experiments show that the combination of heterogeneous approximation and neural network retraining reduces the energy consumption of the resulting network variants.
arXiv Detail & Related papers (2022-08-15T15:17:34Z) - E2N: Error Estimation Networks for Goal-Oriented Mesh Adaptation [6.132664589282657]
We develop a "data-driven" goal-oriented mesh adaptation approach with an appropriately configured and trained neural network.
An element-by-element construction is employed here, whereby local values of various parameters related to the mesh geometry are taken as inputs.
We demonstrate that this approach is able to obtain the same accuracy with a reduced computational cost.
arXiv Detail & Related papers (2022-07-22T17:41:37Z) - Meta Learning Low Rank Covariance Factors for Energy-Based Deterministic Uncertainty [58.144520501201995]
Bi-Lipschitz regularization of neural network layers preserves relative distances between data instances in the feature spaces of each layer.
With the use of an attentive set encoder, we propose to meta learn either diagonal or diagonal plus low-rank factors to efficiently construct task specific covariance matrices.
We also propose an inference procedure which utilizes scaled energy to achieve a final predictive distribution.
arXiv Detail & Related papers (2021-10-12T22:04:19Z) - Variational Physics Informed Neural Networks: the role of quadratures and test functions [0.0]
We analyze how Gaussian or Newton-Cotes quadrature rules of different precisions and piecewise test functions of different degrees affect the convergence rate of Variational Physics Informed Neural Networks (VPINN).
Using a Petrov-Galerkin framework relying on an inf-sup condition, we derive an a priori error estimate in the energy norm between the exact solution and a suitable high-order piecewise interpolant of a computed neural network.
arXiv Detail & Related papers (2021-09-05T10:06:35Z) - Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation [58.80806716024701]
We study the global structure of attention scores computed using dot-product based self-attention.
We find that most of the variation among attention scores lies in a low-dimensional eigenspace.
We propose to compute scores only for a partial subset of token pairs, and use them to estimate scores for the remaining pairs.
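One piece of intuition behind reconstructing scores from a partial subset can be sketched numerically. The example below is an illustration, not the paper's method: for dot-product attention, the pre-softmax score matrix $QK^\top/\sqrt{d}$ has rank at most the head dimension $d$, which is typically far smaller than the number of token pairs. The sequence length and head dimension below are arbitrary illustrative sizes.

```python
import numpy as np

def attention_scores(Q, K):
    """Row-wise softmax of scaled dot-product scores."""
    S = Q @ K.T / np.sqrt(Q.shape[1])
    S = S - S.max(axis=1, keepdims=True)   # numerical stability
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(7)
n, d = 64, 16                  # 64 tokens, head dimension 16
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

# The raw score matrix Q K^T is n x n but has rank at most d,
# so all but the first d singular values vanish.
logits = Q @ K.T
sv = np.linalg.svd(logits, compute_uv=False)

A = attention_scores(Q, K)     # rows are probability distributions
```

Because the raw scores live in a $d$-dimensional subspace, knowing a subset of entries constrains the rest, which is the structural fact the paper's partial-computation estimator exploits.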
arXiv Detail & Related papers (2021-06-16T14:38:42Z) - Learning Optical Flow from a Few Matches [67.83633948984954]
We show that the dense correlation volume representation is redundant and accurate flow estimation can be achieved with only a fraction of elements in it.
Experiments show that our method can reduce computational cost and memory use significantly, while maintaining high accuracy.
arXiv Detail & Related papers (2021-04-05T21:44:00Z) - Neural Control Variates [71.42768823631918]
We show that a set of neural networks can address the challenge of finding a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice.
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.