Related papers: ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems

ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems

URL: http://arxiv.org/abs/2010.12869v2
Date: Tue, 27 Oct 2020 05:28:28 GMT
Title: ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems
Authors: Suresh Nambi, Salim Ullah, Aditya Lohana, Siva Satyendra Sahoo, Farhad Merchant, Akash Kumar
Abstract summary: This paper analyzes and ingathers the efficacy of the Posit number representation scheme and the efficiency of fixed-point arithmetic implementations for ANNs. We propose a novel Posit to fixed-point converter for enabling high-performance and energy-efficient hardware implementations for ANNs.
Score: 4.2612881037640085
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent advances in machine learning, in general, and Artificial Neural Networks (ANN), in particular, has made smart embedded systems an attractive option for a larger number of application areas. However, the high computational complexity, memory footprints, and energy requirements of machine learning models hinder their deployment on resource-constrained embedded systems. Most state-of-the-art works have considered this problem by proposing various low bit-width data representation schemes, optimized arithmetic operators' implementations, and different complexity reduction techniques such as network pruning. To further elevate the implementation gains offered by these individual techniques, there is a need to cross-examine and combine these techniques' unique features. This paper presents ExPAN(N)D, a framework to analyze and ingather the efficacy of the Posit number representation scheme and the efficiency of fixed-point arithmetic implementations for ANNs. The Posit scheme offers a better dynamic range and higher precision for various applications than IEEE $754$ single-precision floating-point format. However, due to the dynamic nature of the various fields of the Posit scheme, the corresponding arithmetic circuits have higher critical path delay and resource requirements than the single-precision-based arithmetic units. Towards this end, we propose a novel Posit to fixed-point converter for enabling high-performance and energy-efficient hardware implementations for ANNs with minimal drop in the output accuracy. We also propose a modified Posit-based representation to store the trained parameters of a network. Compared to an $8$-bit fixed-point-based inference accelerator, our proposed implementation offers $\approx46\%$ and $\approx18\%$ reductions in the storage requirements of the parameters and energy consumption of the MAC units, respectively.

Related papers

Lighter-X: An Efficient and Plug-and-play Strategy for Graph-based Recommendation through Decoupled Propagation [49.865020394064096]
We propose textbfLighter-X, an efficient and modular framework that can be seamlessly integrated with existing GNN-based recommender architectures.<n>Our approach substantially reduces both parameter size and computational complexity while preserving the theoretical guarantees and empirical performance of the base models.<n>Experiments demonstrate that Lighter-X achieves comparable performance to baseline models with significantly fewer parameters.
arXiv Detail & Related papers (2025-10-11T08:33:08Z)
Energy-Aware FPGA Implementation of Spiking Neural Network with LIF Neurons [0.5243460995467893]
Spiking Neural Networks (SNNs) stand out as a cutting-edge solution for TinyML. This paper presents a novel SNN architecture based on the 1st Order Leaky Integrate-and-Fire (LIF) neuron model. A hardware-friendly LIF design is also proposed, and implemented on a Xilinx Artix-7 FPGA.
arXiv Detail & Related papers (2024-11-03T16:42:10Z)
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
Stochastic Configuration Machines: FPGA Implementation [4.57421617811378]
configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling. This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to improve learning performance.
arXiv Detail & Related papers (2023-10-30T02:04:20Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Accurate, Low-latency, Efficient SAR Automatic Target Recognition on FPGA [3.251765107970636]
Synthetic aperture radar (SAR) automatic target recognition (ATR) is the key technique for remote-sensing image recognition. The state-of-the-art convolutional neural networks (CNNs) for SAR ATR suffer from emphhigh computation cost and emphlarge memory footprint. We propose a comprehensive GNN-based model-architecture co-design on FPGA to address the above issues.
arXiv Detail & Related papers (2023-01-04T05:35:30Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic [4.119948826527649]
Field-programmable gate array (FPGA)-based accelerators are gaining traction as a serious contender to replace graphics processing unit/central processing unit-based platforms. This paper presents NullaNet Tiny, a framework for constructing resource and energy-efficient, ultra-low-latency FPGA-based neural network accelerators.
arXiv Detail & Related papers (2021-04-07T00:16:39Z)
Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a deep neural network (DNN) to solve the solutions of the optimal power flow (ACOPF) The proposed SIDNN is compatible with a broad range of OPF schemes. It can be seamlessly integrated in other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z)
ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique technique (ALF) ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.