Related papers: Gradient Estimation Methods of Approximate Multipliers for High-Accuracy Retraining of Deep Learning Models

Gradient Estimation Methods of Approximate Multipliers for High-Accuracy Retraining of Deep Learning Models

URL: http://arxiv.org/abs/2509.10519v1
Date: Wed, 03 Sep 2025 16:57:29 GMT
Title: Gradient Estimation Methods of Approximate Multipliers for High-Accuracy Retraining of Deep Learning Models
Authors: Chang Meng, Wayne Burleson, Giovanni De Micheli,
Abstract summary: We propose two methods to obtain more precise gradients of AppMults.<n>The first, called LUT-2D, characterizes the AppMult gradient with 2-dimensional lookup tables (LUTs)<n>The second, called LUT-1D, is a compact and more efficient variant that stores gradient values in 1-dimensional LUTs.
Score: 2.639208701052173
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Approximate multipliers (AppMults) are widely used in deep learning accelerators to reduce their area, delay, and power consumption. However, AppMults introduce arithmetic errors into deep learning models, necessitating a retraining process to recover accuracy. A key step in retraining is computing the gradient of the AppMult, i.e., the partial derivative of the approximate product with respect to each input operand. Existing approaches typically estimate this gradient using that of the accurate multiplier (AccMult), which can lead to suboptimal retraining results. To address this, we propose two methods to obtain more precise gradients of AppMults. The first, called LUT-2D, characterizes the AppMult gradient with 2-dimensional lookup tables (LUTs), providing fine-grained estimation and achieving the highest retraining accuracy. The second, called LUT-1D, is a compact and more efficient variant that stores gradient values in 1-dimensional LUTs, achieving comparable retraining accuracy with shorter runtime. Experimental results show that on CIFAR-10 with convolutional neural networks, our LUT-2D and LUT-1D methods improve retraining accuracy by 3.83% and 3.72% on average, respectively. On ImageNet with vision transformer models, our LUT-1D method improves retraining accuracy by 23.69% on average, compared to a state-of-the-art retraining framework.

Related papers

Improving the Straight-Through Estimator with Zeroth-Order Information [7.09016563801433]
We study the problem of training neural networks with quantized parameters.<n>We propose First-Order-Guided Zeroth-Order Gradient Descent (FOGZO)<n>We show FOGZO improves the tradeoff between quality and training time in Quantization-Aware Pre-Training.
arXiv Detail & Related papers (2025-10-27T23:14:59Z)
Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithm enhancements that further reduce the memory footprint, and the accuracy gap compared to backpropagation. Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
arXiv Detail & Related papers (2024-11-06T16:33:21Z)
Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement [29.675650285351768]
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks. Approximate MU is a practical method for large-scale models. We propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction.
arXiv Detail & Related papers (2024-09-29T15:17:33Z)
NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval [0.7646713951724011]
Existing approaches either fine-tune the pre-trained model itself or, more efficiently, train adaptor models to transform the output of the pre-trained model. We present NUDGE, a family of novel non-parametric embedding fine-tuning approaches. NUDGE directly modifies the embeddings of data records to maximize the accuracy of $k$-NN retrieval.
arXiv Detail & Related papers (2024-09-04T00:10:36Z)
Inverse-Free Fast Natural Gradient Descent Method for Deep Learning [52.0693420699086]
We present a fast natural gradient descent (FNGD) method that only requires inversion during the first epoch. FNGD exhibits similarities to the average sum in first-order methods, leading to the computational complexity of FNGD being comparable to that of first-order methods.
arXiv Detail & Related papers (2024-03-06T05:13:28Z)
DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training [33.11416096294998]
Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems. No prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. We develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch.
arXiv Detail & Related papers (2023-10-03T13:05:36Z)
Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation. We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors. Our method efficiently conducts global gradient approximation while achieving better accuracy and ability generalization of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks. We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights. Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models [134.83964935755964]
In deep learning, different kinds of deep networks typically need different extrapolations, which have to be chosen after multiple trials.<n>To relieve this issue and consistently improve the model training speed deep networks, we propose the ADAtive Nesterov momentum Transformer.
arXiv Detail & Related papers (2022-08-13T16:04:39Z)
Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
textscAdaMomentum on vision, and achieves state-the-art results consistently on other tasks including language processing. textscAdaMomentum on vision, and achieves state-the-art results consistently on other tasks including language processing. textscAdaMomentum on vision, and achieves state-the-art results consistently on other tasks including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.