Related papers: Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications

Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications

URL: http://arxiv.org/abs/2408.05148v3
Date: Wed, 30 Oct 2024 16:52:42 GMT
Title: Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications
Authors: Sanjif Shanmugavelu, Mathieu Taillefumier, Christopher Culver, Oscar Hernandez, Mark Coletti, Ada Sedova,
Abstract summary: Run to run variability in parallel programs caused by floating-point non-associativity has been known to significantly affect algorithms. We investigate the statistical properties of floating-point non-associativity within modern parallel programming models. We examine the recently-added deterministic options in PyTorch within the context of GPU deployment for deep learning.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Run to run variability in parallel programs caused by floating-point non-associativity has been known to significantly affect reproducibility in iterative algorithms, due to accumulating errors. Non-reproducibility can critically affect the efficiency and effectiveness of correctness testing for stochastic programs. Recently, the sensitivity of deep learning training and inference pipelines to floating-point non-associativity has been found to sometimes be extreme. It can prevent certification for commercial applications, accurate assessment of robustness and sensitivity, and bug detection. New approaches in scientific computing applications have coupled deep learning models with high-performance computing, leading to an aggravation of debugging and testing challenges. Here we perform an investigation of the statistical properties of floating-point non-associativity within modern parallel programming models, and analyze performance and productivity impacts of replacing atomic operations with deterministic alternatives on GPUs. We examine the recently-added deterministic options in PyTorch within the context of GPU deployment for deep learning, uncovering and quantifying the impacts of input parameters triggering run to run variability and reporting on the reliability and completeness of the documentation. Finally, we evaluate the strategy of exploiting automatic determinism that could be provided by deterministic hardware, using the Groq accelerator for inference portions of the deep learning pipeline. We demonstrate the benefits that a hardware-based strategy can provide within reproducibility and correctness efforts.

Related papers

Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning [54.970571745690634]
This work presents the first systematic investigation into how numerical precision affects Large Language Models inference.<n>We develop a lightweight inference pipeline, dubbed LayerCast, that stores weights in 16-bit precision but performs all computations in FP32.<n>Inspired by this, we develop a lightweight inference pipeline, dubbed LayerCast, that stores weights in 16-bit precision but performs all computations in FP32.
arXiv Detail & Related papers (2025-06-11T08:23:53Z)
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge [55.75103034526652]
We propose QuartDepth which adopts post-training quantization to quantize MDE models with hardware accelerations for ASICs. Our approach involves quantizing both weights and activations to 4-bit precision, reducing the model size and computation cost. We design a flexible and programmable hardware accelerator by supporting kernel fusion and customized instruction programmability.
arXiv Detail & Related papers (2025-03-20T21:03:10Z)
FGP: Feature-Gradient-Prune for Efficient Convolutional Layer Pruning [16.91552023598741]
This paper introduces a novel pruning method called Feature-Gradient Pruning (FGP) It integrates both feature-based and gradient-based information to more effectively evaluate the importance of channels across various target classes. Experiments conducted across multiple tasks and datasets show that FGP significantly reduces computational costs and minimizes accuracy loss.
arXiv Detail & Related papers (2024-11-19T08:42:15Z)
Gradient Descent Efficiency Index [0.0]
This study introduces a new efficiency metric, Ek, designed to quantify the effectiveness of each iteration. The proposed metric accounts for both the relative change in error and the stability of the loss function across iterations. Ek has the potential to guide more informed decisions in the selection and tuning of optimization algorithms in machine learning applications.
arXiv Detail & Related papers (2024-10-25T10:22:22Z)
Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance. Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS) We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks. We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications [0.0]
Predictive uncertainty supplements model predictions and enables improved functionality of downstream tasks. We tackle this problem by building upon Monte Carlo Dropout (MCDO) models using the Axolotl framework. We conduct experiments on (1) a multi-class classification task using the CIFAR10 dataset, and (2) a more complex human body segmentation task.
arXiv Detail & Related papers (2021-11-11T22:24:15Z)
Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GANs) and GAN-based techniques have attracted attention as unsupervised machine learning methods. We name our proposed method as Con Conval Generative Adversarial Imputation Nets (Conv-GAIN)
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
Probabilistic robust linear quadratic regulators with Gaussian processes [73.0364959221845]
Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design. We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin.
arXiv Detail & Related papers (2021-05-17T08:36:18Z)
Bayesian Active Learning for Wearable Stress and Affect Detection [0.7106986689736827]
Stress detection using on-device deep learning algorithms has been on the rise owing to advancements in pervasive computing. In this paper, we propose a framework with capabilities to represent model uncertainties through approximations in Bayesian Neural Networks. Our proposed framework achieves a considerable efficiency boost during inference, with a substantially low number of acquired pool points.
arXiv Detail & Related papers (2020-12-04T16:19:37Z)
Uncertainty Quantification for Deep Context-Aware Mobile Activity Recognition and Unknown Context Discovery [85.36948722680822]
We develop a context-aware mixture of deep models termed the alpha-beta network. We improve accuracy and F score by 10% by identifying high-level contexts. In order to ensure training stability, we have used a clustering-based pre-training in both public and in-house datasets.
arXiv Detail & Related papers (2020-03-03T19:35:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.