Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices
- URL: http://arxiv.org/abs/2106.01958v1
- Date: Thu, 3 Jun 2021 16:06:08 GMT
- Title: Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices
- Authors: Abhishek Ramdas Nair, Pallab Kumar Nath, Shantanu Chakrabartty, Chetan Singh Thakur
- Abstract summary: We present a novel framework for designing multiplierless kernel machines.
The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique.
We propose a hardware-friendly MP-based inference and online training algorithm that has been optimized for a Field Programmable Gate Array (FPGA) platform.
- Score: 6.335302509003343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel framework for designing multiplierless kernel machines
that can be used on resource-constrained platforms like intelligent edge
devices. The framework uses a piecewise linear (PWL) approximation based on a
margin propagation (MP) technique and uses only addition/subtraction, shift,
comparison, and register underflow/overflow operations. We propose a
hardware-friendly MP-based inference and online training algorithm that has
been optimized for a Field Programmable Gate Array (FPGA) platform. Our FPGA
implementation eliminates the need for DSP units and reduces the number of
LUTs. By reusing the same hardware for inference and training, we show that the
platform can overcome classification errors and local minima artifacts that
result from the MP approximation. Using the FPGA platform, we also show that
the proposed multiplierless MP-kernel machine demonstrates superior performance
in terms of power, performance, and area compared to other comparable
implementations.
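As a rough illustration of the margin propagation idea named in the abstract: MP is commonly defined as finding the value z that satisfies sum_i max(0, x_i - z) = gamma for inputs x_i and a hyperparameter gamma > 0. The sketch below solves an integer version of that constraint by bisection, using only addition, subtraction, comparison, and a right shift. The function name `mp_solve`, the integer (fixed-point) inputs, and the bisection search are assumptions made for illustration; this is not the authors' FPGA algorithm.

```python
def mp_solve(xs, gamma):
    """Largest integer z such that sum(max(0, x - z) for x in xs) >= gamma.

    Illustrative sketch only: xs are integers (e.g. fixed-point values)
    and gamma is a positive integer. Uses add, subtract, compare, shift.
    """
    if not xs or gamma <= 0:
        raise ValueError("need at least one input and gamma > 0")

    def excess(z):
        # Sum of the positive parts (x - z), computed without multiplication.
        total = 0
        for x in xs:
            d = x - z
            if d > 0:
                total += d
        return total

    hi = max(xs)       # excess(hi) == 0 < gamma
    lo = hi - gamma    # excess(lo) >= gamma, from the largest input alone
    while hi - lo > 1:
        mid = (lo + hi) >> 1   # halving by arithmetic shift
        if excess(mid) >= gamma:
            lo = mid
        else:
            hi = mid
    return lo


print(mp_solve([10, 8, 3], 4))  # 7, because (10 - 7) + (8 - 7) = 4
```

Because the left-hand side of the constraint is piecewise linear and non-increasing in z, a simple bracketing search like this always converges. A hardware design would likely bound the number of iterations to match the register width.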
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Multiplierless In-filter Computing for tinyML Platforms [6.878219199575747]
We present a novel multiplierless framework for in-filter acoustic classification.
We use MP-based approximation for training, including backpropagation, to mitigate approximation errors.
The framework is more efficient than traditional classification frameworks, using fewer than 1K slices.
arXiv Detail & Related papers (2023-04-24T04:33:44Z)
- ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels [1.304892050913381]
We introduce a new graph-based program representation for parallel applications that extends the Abstract Syntax Tree.
We evaluate our proposed representation by training a Graph Neural Network (GNN) to predict the runtime of an OpenMP code region.
Results show that our approach is effective, with a normalized RMSE between 0.004 and 0.01 in its runtime predictions.
arXiv Detail & Related papers (2023-04-07T05:52:59Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Resource-constrained FPGA Design for Satellite Component Feature Extraction [0.0]
This work proposes the use of a neural network-based object detection algorithm that can be deployed on a resource-constrained FPGA.
Hardware-in-the-loop experiments were performed on the ORION Maneuver Kinematics Simulator at Florida Tech.
Results show the FPGA implementation increases the throughput and decreases latency while maintaining comparable accuracy.
arXiv Detail & Related papers (2023-01-22T04:49:04Z)
- Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design [66.39546326221176]
Attention-based neural networks have become pervasive in many AI tasks.
The use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources.
This paper proposes a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs.
arXiv Detail & Related papers (2022-09-20T09:28:26Z)
- Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark [11.575901540758574]
We present our development experience for the Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms.
We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs.
The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms.
arXiv Detail & Related papers (2022-06-23T15:57:17Z)
- A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization [9.454905560571085]
We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology.
Results show that our method can achieve better performance with lower power consumption.
arXiv Detail & Related papers (2021-10-10T17:31:27Z)
- Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameter transmissions.
In this paper, we propose effective model poisoning (MP) algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed covert MP (CMP) algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
- MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle.
Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++, which outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.