Related papers: Energy Efficient Exact and Approximate Systolic Array Architecture for Matrix Multiplication

URL: http://arxiv.org/abs/2509.00778v1
Date: Sun, 31 Aug 2025 10:15:35 GMT
Title: Energy Efficient Exact and Approximate Systolic Array Architecture for Matrix Multiplication
Authors: Pragun Jaswal, L. Hemanth Krishna, B. Srinivasu
Abstract summary: Deep Neural Networks (DNNs) require highly efficient matrix multiplication engines for complex computations. This paper presents a systolic array architecture incorporating novel exact and approximate processing elements (PEs). The proposed 8-bit exact and approximate PE designs are employed in an 8x8 systolic array, which achieves energy savings of 22% and 32%, respectively.
Score: 0.0
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Deep Neural Networks (DNNs) require highly efficient matrix multiplication engines for complex computations. This paper presents a systolic array architecture incorporating novel exact and approximate processing elements (PEs), designed using energy-efficient positive partial product and negative partial product cells, termed PPC and NPPC, respectively. The proposed 8-bit exact and approximate PE designs are employed in an 8x8 systolic array, which achieves energy savings of 22% and 32%, respectively, compared to the existing design. To demonstrate their effectiveness, the proposed PEs are integrated into a systolic array (SA) for Discrete Cosine Transform (DCT) computation, achieving high output quality with a PSNR of 38.21 dB. Furthermore, in an edge detection application using convolution, the approximate PE achieves a PSNR of 30.45 dB. These results highlight the potential of the proposed design to deliver significant energy efficiency while maintaining competitive output quality, making it well-suited for error-resilient image and vision processing applications.
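
Illustrative sketch (not from the paper): the abstract describes PEs arranged in an 8x8 systolic array and reports output quality as PSNR. The Python below simulates a generic output-stationary systolic array and computes PSNR against the exact result. The function names (approx_mul, systolic_matmul, psnr) and the drop_bits parameter are hypothetical, and the approximate multiplier simply truncates low product bits as a stand-in. It does not model the paper's PPC/NPPC cells or their energy behaviour.

```python
import math

def approx_mul(a, b, drop_bits=0):
    # Hypothetical stand-in for an approximate PE multiplier:
    # clear the lowest `drop_bits` bits of the exact product.
    # This is NOT the paper's PPC/NPPC design.
    p = a * b
    return (p >> drop_bits) << drop_bits

def systolic_matmul(A, B, drop_bits=0):
    """Cycle-by-cycle simulation of an n x n output-stationary systolic array.

    Rows of A enter from the left (row i delayed by i cycles), columns of B
    enter from the top (column j delayed by j cycles). PE(i, j) accumulates
    C[i][j] and forwards its operands right and down each cycle.
    Assumes non-negative integer inputs.
    """
    n = len(A)
    acc = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]
    b_reg = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):  # cycles until the last PE has seen all n operand pairs
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    k = t - i  # skewed feed from the left edge
                    new_a[i][j] = A[i][k] if 0 <= k < n else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j  # skewed feed from the top edge
                    new_b[i][j] = B[k][j] if 0 <= k < n else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(n):
                acc[i][j] += approx_mul(a_reg[i][j], b_reg[i][j], drop_bits)
    return acc

def psnr(ref, test, peak=None):
    # PSNR in dB; `peak` defaults to the largest reference value
    # (assumes the reference is not all zeros).
    if peak is None:
        peak = max(max(row) for row in ref)
    count = len(ref) * len(ref[0])
    mse = sum((r - t) ** 2
              for ref_row, test_row in zip(ref, test)
              for r, t in zip(ref_row, test_row)) / count
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Calling systolic_matmul(A, B) with drop_bits=0 reproduces the exact product, and psnr(exact, systolic_matmul(A, B, drop_bits=4)) gives a rough sense of how truncation error shows up in the quality metric. The resulting numbers are not comparable to the PSNR values reported in the abstract.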

Related papers

Implementation of high-efficiency, lightweight residual spiking neural network processor based on field-programmable gate arrays [0.49806798459446283]
This work presents an efficient residual SNN accelerator that combines algorithm and hardware co-design to optimize inference energy efficiency. The proposed processor achieves a classification accuracy of 87.11% on the CIFAR-10 dataset, with an inference time of 3.98 ms per image and an energy efficiency of 183.5 FPS/W.
arXiv Detail & Related papers (2025-12-09T02:08:46Z)
Efficient-Husformer: Efficient Multimodal Transformer Hyperparameter Optimization for Stress and Cognitive Loads [0.0]
Transformer-based models have gained considerable attention in the field of physiological signal analysis. They leverage long-range dependencies and complex patterns in temporal signals, allowing them to achieve performance superior to traditional RNN and CNN models. We present Efficient-Husformer, a novel Transformer-based architecture for multi-class stress detection.
arXiv Detail & Related papers (2025-11-27T12:02:25Z)
Low Power Approximate Multiplier Architecture for Deep Neural Networks [0.0]
A 4:2 compressor, introducing only a single combination error, is designed and integrated into an 8x8 unsigned multiplier. The proposed multiplier is employed within a custom convolution layer and evaluated on neural network tasks, including image recognition and denoising.
arXiv Detail & Related papers (2025-08-31T09:25:42Z)
Efficient Memristive Spiking Neural Networks Architecture with Supervised In-Situ STDP Method [0.0]
Memristor-based Spiking Neural Networks (SNNs) with temporal spike encoding enable ultra-low-energy computation. This paper presents a circuit-level memristive spiking neural network (SNN) architecture trained using a proposed novel supervised in-situ learning algorithm.
arXiv Detail & Related papers (2025-07-28T17:09:48Z)
Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing [53.77822620185878]
We propose a synergistic methodology to concurrently optimize perovskite memristor fabrication and develop robust analog DNNs. We develop "BayesMulti", a training strategy utilizing BO-guided noise injection to improve the resistance of analog DNNs to memristor imperfections. Our integrated approach enables use of analog computing in much deeper and wider networks, achieving up to 100-fold improvements.
arXiv Detail & Related papers (2024-12-03T19:20:08Z)
The Potential of Combined Learning Strategies to Enhance Energy Efficiency of Spiking Neuromorphic Systems [0.0]
This manuscript focuses on enhancing brain-inspired perceptual computing machines through a novel combined learning approach for Convolutional Spiking Neural Networks (CSNNs). CSNNs present a promising alternative to traditional power-intensive and complex machine learning methods like backpropagation, offering energy-efficient spiking neuron processing inspired by the human brain.
arXiv Detail & Related papers (2024-08-13T18:40:50Z)
TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator [44.74560543672329]
We present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization. We achieve a 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density. This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators.
arXiv Detail & Related papers (2024-02-12T03:40:32Z)
LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization [48.41286573672824]
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient. We propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process.
arXiv Detail & Related papers (2024-01-26T05:23:11Z)
Dynamic Decision Tree Ensembles for Energy-Efficient Inference on IoT Edge Nodes [12.99136544903102]
Decision tree ensembles, such as Random Forests (RFs) and Gradient Boosting (GBTs), are particularly suited for this task, given their relatively low complexity. This paper proposes the use of dynamic ensembles that adjust the number of executed trees based both on a latency/energy target and on the complexity of the processed input. We focus on deploying these algorithms on multi-core low-power IoT devices, designing a tool that automatically converts a Python ensemble into optimized C code.
arXiv Detail & Related papers (2023-06-16T11:59:18Z)
PCBDet: An Efficient Deep Neural Network Object Detection Architecture for Automatic PCB Component Detection on the Edge [48.7576911714538]
PCBDet is an attention condenser network design that provides state-of-the-art inference throughput. It achieves superior PCB component detection performance compared to other state-of-the-art efficient architecture designs.
arXiv Detail & Related papers (2023-01-23T04:34:25Z)
UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features. Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
Decomposition of Matrix Product States into Shallow Quantum Circuits [62.5210028594015]
Tensor network (TN) algorithms can be mapped to parametrized quantum circuits (PQCs). We propose a new protocol for approximating TN states using realistic quantum circuits. Our results reveal one particular protocol, involving sequential growth and optimization of the quantum circuit, to outperform all other methods.
arXiv Detail & Related papers (2022-09-01T17:08:41Z)
EEG-Inception: An Accurate and Robust End-to-End Neural Network for EEG-based Motor Imagery Classification [123.93460670568554]
This paper proposes a novel convolutional neural network (CNN) architecture for accurate and robust EEG-based motor imagery (MI) classification. The proposed CNN model, namely EEG-Inception, is built on the backbone of the Inception-Time network. The proposed network is an end-to-end classification, as it takes the raw EEG signals as the input and does not require complex EEG signal-preprocessing.
arXiv Detail & Related papers (2021-01-24T19:03:10Z)
High-Fidelity Machine Learning Approximations of Large-Scale Optimal Power Flow [49.2540510330407]
AC-OPF is a key building block in many power system applications. Motivated by increased penetration of renewable sources, this paper explores deep learning to deliver efficient approximations to the AC-OPF.
arXiv Detail & Related papers (2020-06-29T20:22:16Z)
ESSOP: Efficient and Scalable Stochastic Outer Product Architecture for Deep Learning [1.2019888796331233]
Matrix-vector multiplications (MVM) and vector-vector outer product (VVOP) are the two most expensive operations associated with the training of deep neural networks (DNNs). We introduce efficient techniques to SC for weight update in DNNs with the activation functions required by many state-of-the-art networks. Our architecture reduces the computational cost by re-using random numbers and replacing certain FP multiplication operations by bit shift scaling. Hardware design of ESSOP at 14nm technology node shows that, compared to a highly pipelined FP16 multiplier, ESSOP is 82.2% and 93.7% better in energy
arXiv Detail & Related papers (2020-03-25T07:54:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.