Related papers: Low Power Approximate Multiplier Architecture for Deep Neural Networks

Low Power Approximate Multiplier Architecture for Deep Neural Networks

URL: http://arxiv.org/abs/2509.00764v1
Date: Sun, 31 Aug 2025 09:25:42 GMT
Title: Low Power Approximate Multiplier Architecture for Deep Neural Networks
Authors: Pragun Jaswal, L. Hemanth Krishna, B. Srinivasu,
Abstract summary: A 4:2 compressor, introducing only a single combination error, is designed and integrated into an 8x8 unsigned multiplier.<n>The proposed multiplier is employed within a custom convolution layer and evaluated on neural network tasks, including image recognition and denoising.
Score: 0.0
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: This paper proposes an low power approximate multiplier architecture for deep neural network (DNN) applications. A 4:2 compressor, introducing only a single combination error, is designed and integrated into an 8x8 unsigned multiplier. This integration significantly reduces the usage of exact compressors while preserving low error rates. The proposed multiplier is employed within a custom convolution layer and evaluated on neural network tasks, including image recognition and denoising. Hardware evaluation demonstrates that the proposed design achieves up to 30.24% energy savings compared to the best among existing multipliers. In image denoising, the custom approximate convolution layer achieves improved Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) compared to other approximate designs. Additionally, when applied to handwritten digit recognition, the model maintains high classification accuracy. These results demonstrate that the proposed architecture offers a favorable balance between energy efficiency and computational precision, making it suitable for low-power AI hardware implementations.

Related papers

CAMP-HiVe: Cyclic Pair Merging based Efficient DNN Pruning with Hessian-Vector Approximation for Resource-Constrained Systems [3.343542849202802]
We introduce CAMP-HiVe, a cyclic pair merging-based pruning with Hessian Vector approximation.<n>Our experimental results demonstrate that our proposed method achieves significant reductions in computational requirements.<n>It outperforms the existing state-of-the-art neural pruning methods.
arXiv Detail & Related papers (2025-11-09T07:58:36Z)
Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly. A simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model in this work. It is shown to achieve a state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution [55.50793823060282]
We propose a novel Content-Aware Dynamic Quantization (CADyQ) method for image super-resolution (SR) networks. CADyQ allocates optimal bits to local regions and layers adaptively based on the local contents of an input image. The pipeline has been tested on various SR networks and evaluated on several standard benchmarks.
arXiv Detail & Related papers (2022-07-21T07:50:50Z)
Low-bit Shift Network for End-to-End Spoken Language Understanding [7.851607739211987]
We propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values. This reduces computational complexity by removing expensive multiplication operations and with the use of low-bit weights.
arXiv Detail & Related papers (2022-07-15T14:34:22Z)
SmoothNets: Optimizing CNN architecture design for differentially private deep learning [69.10072367807095]
DPSGD requires clipping and noising of per-sample gradients. This introduces a reduction in model utility compared to non-private training. We distilled a new model architecture termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training.
arXiv Detail & Related papers (2022-05-09T07:51:54Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Sharpness-aware Quantization for Deep Neural Networks [45.150346855368]
Sharpness-Aware Quantization (SAQ) is a novel method to explore the effect of Sharpness-Aware Minimization (SAM) on model compression. We show that SAQ improves the generalization performance of the quantized models, yielding the SOTA results in uniform quantization.
arXiv Detail & Related papers (2021-11-24T05:16:41Z)
NeighCNN: A CNN based SAR Speckle Reduction using Feature preserving Loss Function [1.7188280334580193]
NeighCNN is a deep learning-based speckle reduction algorithm that handles multiplicative noise. Various synthetic, as well as real SAR images, are used for testing the NeighCNN architecture.
arXiv Detail & Related papers (2021-08-26T04:20:07Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices. Previous unstructured or structured weight pruning methods can hardly truly accelerate inference. We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware. The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
A Novel Approximate Hamming Weight Computing for Spiking Neural Networks: an FPGA Friendly Architecture [0.47248250311484113]
We propose a method inspired by synaptic transmission failure for exploiting FPGA lookup tables to compress long input vectors. We classify the compressors into shallow ones with up to two levels of lookup tables and deep ones with more than two levels. Our simulation results show that calculating the Hamming weight of a 1024-bit vector of a spiking neural network by the use of only deep compressors preserves the chaotic behavior of the network.
arXiv Detail & Related papers (2021-04-29T18:27:51Z)
PLAM: a Posit Logarithm-Approximate Multiplier for Power Efficient Posit-based DNNs [8.623938357911467]
The Posit Number System was introduced in 2017 as a replacement for floating-point numbers. This paper proposes a Posit Logarithm-Approximate multiplication scheme to significantly reduce the complexity of posit multipliers. Experiments show that the proposed technique reduces the area, power, and delay of hardware multipliers up to 72.86%, 81.79%, and 17.01%, respectively, without accuracy degradation.
arXiv Detail & Related papers (2021-02-18T10:43:07Z)
Lightweight Residual Densely Connected Convolutional Neural Network [18.310331378001397]
The lightweight residual densely connected blocks are proposed to guaranty the deep supervision, efficient gradient flow, and feature reuse abilities of convolutional neural network. The proposed method decreases the cost of training and inference processes without using any special hardware-software equipment.
arXiv Detail & Related papers (2020-01-02T17:15:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.