Related papers: Leveraging Highly Approximated Multipliers in DNN Inference

URL: http://arxiv.org/abs/2412.16757v1
Date: Sat, 21 Dec 2024 20:09:29 GMT
Title: Leveraging Highly Approximated Multipliers in DNN Inference
Authors: Georgios Zervakis, Fabio Frustaci, Ourania Spantidi, Iraklis Anagnostopoulos, Hussam Amrouch, Jörg Henkel
Abstract summary: Our approach does not require retraining and significantly decreases the induced error due to approximate multiplications. Compared to the corresponding approximate designs without using our technique, our approach improves the accuracy by 1.9x on average.
Score: 13.973803328588687
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we present a control variate approximation technique that enables the exploitation of highly approximate multipliers in Deep Neural Network (DNN) accelerators. Our approach does not require retraining and significantly decreases the induced error due to approximate multiplications, improving the overall inference accuracy. As a result, our approach enables satisfying tight accuracy loss constraints while boosting the power savings. Our experimental evaluation, across six different DNNs and several approximate multipliers, demonstrates the versatility of our approach and shows that compared to the accurate design, our control variate approximation achieves the same performance, 45% power reduction, and less than 1% average accuracy loss. Compared to the corresponding approximate designs without using our technique, our approach improves the accuracy by 1.9x on average.
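The abstract does not spell out the correction itself, so the following is only a minimal sketch of the general control-variate idea under stated assumptions, not the authors' design. It assumes a hypothetical approximate multiplier (`approx_mul`) that ignores the `K` low bits of the activation, unsigned 8-bit weights and activations, and a correction term `V = c * sum(low bits)` where `c` is the rounded mean weight, which can be computed offline because the weights are fixed at inference time. All names and parameters here are illustrative.

```python
import random

K = 4                  # assumed: number of low activation bits the multiplier ignores
MASK = (1 << K) - 1    # selects those low bits


def approx_mul(w, a):
    """Hypothetical approximate multiplier: drops the K low bits of a."""
    return w * (a & ~MASK)


def approx_dot(weights, acts):
    """Dot product computed only with the approximate multiplier."""
    return sum(approx_mul(w, a) for w, a in zip(weights, acts))


def control_variate(weights, acts):
    """Correction V = c * sum(low bits), with c the rounded mean weight.

    c depends only on the weights, so it needs no retraining; the sum of
    low bits is cheap to accumulate at run time.
    """
    c = round(sum(weights) / len(weights))
    return c * sum(a & MASK for a in acts)


def mean_abs_errors(trials=200, n=64, seed=0):
    """Average absolute error without and with the correction term."""
    rng = random.Random(seed)
    plain = corrected = 0
    for _ in range(trials):
        w = [rng.randrange(256) for _ in range(n)]
        a = [rng.randrange(256) for _ in range(n)]
        exact = sum(x * y for x, y in zip(w, a))
        approx = approx_dot(w, a)
        plain += abs(exact - approx)
        corrected += abs(exact - (approx + control_variate(w, a)))
    return plain / trials, corrected / trials


if __name__ == "__main__":
    plain, corrected = mean_abs_errors()
    print(f"mean |error| without correction: {plain:.1f}")
    print(f"mean |error| with correction:    {corrected:.1f}")
```

In this toy setting the uncorrected error is always positive because truncation only discards value, so a correction proportional to the discarded bits removes most of it. With zero-mean signed weights `c` would be close to zero and this particular choice of correction would help far less.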

Related papers

MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators [5.5348061557491794]
This paper examines the interplay of fine-grained error resilience of DNN workloads to achieve higher levels of energy efficiency. We use the ResNet-8 model on the CIFAR-10 dataset to evaluate our approximations. The proposed solution delivers up to 54% energy gains in exchange for up to 4% accuracy loss, compared to the baseline quantized model.
arXiv Detail & Related papers (2025-06-26T15:21:12Z)
Accurate and Reliable Predictions with Mutual-Transport Ensemble [46.368395985214875]
We propose a co-trained auxiliary model and adaptively regularize the cross-entropy loss using Kullback-Leibler (KL) divergence. We show that MTE can simultaneously enhance both accuracy and uncertainty calibration. For example, on the CIFAR-100 dataset, our MTE method on ResNet34/50 achieved significant improvements compared to the previous state-of-the-art method.
arXiv Detail & Related papers (2024-05-30T03:15:59Z)
Towards Calibrated Robust Fine-Tuning of Vision-Language Models [97.19901765814431]
This work proposes a robust fine-tuning method that improves both OOD accuracy and confidence calibration simultaneously in vision language models. We show that both OOD classification and OOD calibration errors have a shared upper bound consisting of two terms of ID data. Based on this insight, we design a novel framework that conducts fine-tuning with a constrained multimodal contrastive loss enforcing a larger smallest singular value.
arXiv Detail & Related papers (2023-11-03T05:41:25Z)
Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error. We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z)
A Tale of Two Approximations: Tightening Over-Approximation for DNN Robustness Verification via Under-Approximation [17.924507519230424]
We propose a novel dual-approximation approach to tighten over-approximations, leveraging an activation function's underestimated domain to define tight approximation bounds. Our results show that DualApp significantly outperforms the state-of-the-art approaches with 100% - 1000% improvement on the verified robustness ratio and 10.64% on average (up to 66.53%) on the certified lower bound.
arXiv Detail & Related papers (2023-05-26T14:58:30Z)
ApproxABFT: Approximate Algorithm-Based Fault Tolerance for Neural Network Processing [7.578258600530223]
Algorithm-based fault tolerance (ABFT) mechanisms have become a promising solution for reliability enhancement. We propose an Approximate ABFT framework that introduces adaptive error tolerance thresholds to enable selective fault recovery. The proposed ApproxABFT achieves a 43.39% average reduction in redundant computing overhead compared to previous accurate ABFT.
arXiv Detail & Related papers (2023-02-21T06:21:28Z)
Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference. We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
Fast Exploration of the Impact of Precision Reduction on Spiking Neural Networks [63.614519238823206]
Spiking Neural Networks (SNNs) are a practical choice when the target hardware reaches the edge of computing. We employ an Interval Arithmetic (IA) model to develop an exploration methodology that takes advantage of the capability of such a model to propagate the approximation error.
arXiv Detail & Related papers (2022-11-22T15:08:05Z)
Neural Networks with Quantization Constraints [111.42313650830248]
We present a constrained learning approach to quantization training. We show that the resulting problem is strongly dual and does away with gradient estimations. We demonstrate that the proposed approach exhibits competitive performance in image classification tasks.
arXiv Detail & Related papers (2022-10-27T17:12:48Z)
Boost Neural Networks by Checkpoints [9.411567653599358]
We propose a novel method to ensemble the checkpoints of deep neural networks (DNNs). With the same training budget, our method achieves 4.16% lower error on CIFAR-100 and 6.96% on Tiny-ImageNet with the ResNet-110 architecture.
arXiv Detail & Related papers (2021-10-03T09:14:15Z)
Positive/Negative Approximate Multipliers for DNN Accelerators [3.1921317895626493]
We present a filter-oriented approximation method to map the weights to the appropriate modes of the approximate multiplier. Our approach achieves 18.33% energy gains on average across 7 NNs on 4 different datasets for a maximum accuracy drop of only 1%.
arXiv Detail & Related papers (2021-07-20T09:36:24Z)
FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose a design paradigm for cost-effective networks with LR representation for efficient pose estimation, named FasterPose. We study the training behavior of FasterPose and formulate a novel regressive cross-entropy (RCE) loss function to accelerate convergence. Compared with the previously dominant pose estimation network, our method reduces FLOPs by 58% and simultaneously improves accuracy by 1.3%.
arXiv Detail & Related papers (2021-07-07T13:39:08Z)
Control Variate Approximation for DNN Accelerators [3.1921317895626493]
We introduce a control variate approximation technique for low error approximate Deep Neural Network (DNN) accelerators. Our approach significantly decreases the induced error due to approximate multiplications in inference, without requiring time-exhaustive retraining.
arXiv Detail & Related papers (2021-02-18T22:11:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.