FlexNN: A Dataflow-aware Flexible Deep Learning Accelerator for Energy-Efficient Edge Devices
- URL: http://arxiv.org/abs/2403.09026v2
- Date: Thu, 11 Apr 2024 23:26:33 GMT
- Title: FlexNN: A Dataflow-aware Flexible Deep Learning Accelerator for Energy-Efficient Edge Devices
- Authors: Arnab Raha, Deepak A. Mathaikutty, Soumendu K. Ghosh, Shamik Kundu, et al.
- Abstract summary: This paper introduces FlexNN, a Flexible Neural Network accelerator, which adopts agile design principles.
Our design enables adaptable dataflows of any type through software-configurable descriptors.
To further enhance throughput and reduce energy consumption, we propose a novel sparsity-based acceleration logic.
- Score: 0.6892601897291335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces FlexNN, a Flexible Neural Network accelerator, which adopts agile design principles to enable versatile dataflows, enhancing energy efficiency. Unlike conventional convolutional neural network accelerator architectures that adhere to fixed dataflows (such as input, weight, output, or row stationary) for transferring activations and weights between storage and compute units, our design enables adaptable dataflows of any type through software-configurable descriptors. Considering that data movement costs considerably outweigh compute costs from an energy perspective, dataflow flexibility allows us to optimize the movement per layer for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNN architecture, we propose a novel sparsity-based acceleration logic that utilizes fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, thereby optimizing the convolution engine within the hardware accelerator. Extensive experimental results underscore a significant enhancement in the performance and energy efficiency of FlexNN relative to existing DNN accelerators.
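To make the abstract's two mechanisms concrete, here is a minimal Python sketch pairing a per-layer dataflow descriptor with a zero-skipping multiply-accumulate loop. FlexNN's actual descriptor fields and sparsity hardware are not specified in this listing, so the names (DataflowDescriptor, sparse_mac) and every field below are illustrative assumptions, not the paper's interface.

```python
from dataclasses import dataclass

@dataclass
class DataflowDescriptor:
    """Hypothetical per-layer descriptor; FlexNN's real format is not shown here."""
    stationary: str     # operand held in place: "input" | "weight" | "output" | "row"
    loop_order: tuple   # traversal order of the loop nest, e.g. over (n, y, x, k, c)
    tile: dict          # tile sizes chosen so each tile fits the local buffers

def sparse_mac(acts, wts):
    """Fine-grained zero-skipping: multiply only when both operands are nonzero."""
    acc = 0.0
    for a, w in zip(acts, wts):
        if a != 0.0 and w != 0.0:   # bypass the redundant computation
            acc += a * w
    return acc

# An output-stationary configuration for one convolution layer.
layer0 = DataflowDescriptor(
    stationary="output",
    loop_order=("n", "y", "x", "k", "c"),
    tile={"k": 16, "c": 16, "y": 4, "x": 4},
)
print(sparse_mac([0.0, 1.5, 0.0, 2.0], [3.0, 0.0, 0.0, 0.5]))  # -> 1.0
```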
Related papers
- Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture [0.0]
This work develops a reconfigurable-dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow of each layer at run-time.
The results show that the Flex-TPU design achieves a significant performance increase of up to 2.75x compared to a conventional TPU, with only minor area and power overheads.
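As a loose illustration of per-layer dataflow selection at run-time, the sketch below picks whichever dataflow keeps the largest tensor stationary. The cost model and names (pick_dataflow, the byte counts) are invented for this example and are not the Flex-TPU's actual policy.

```python
def pick_dataflow(layer, candidates=("input_stationary",
                                     "weight_stationary",
                                     "output_stationary")):
    """Return the candidate dataflow with the lowest estimated data movement."""
    sizes = {"input_stationary": layer["act_bytes"],
             "weight_stationary": layer["wt_bytes"],
             "output_stationary": layer["out_bytes"]}
    # Toy cost model: the stationary tensor is not re-fetched from memory.
    return min(candidates, key=lambda df: sum(sizes.values()) - sizes[df])

layer = {"act_bytes": 2_000_000, "wt_bytes": 300_000, "out_bytes": 1_800_000}
print(pick_dataflow(layer))  # -> input_stationary (largest tensor held in place)
```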
arXiv Detail & Related papers (2024-07-11T17:33:38Z)
- HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator [47.66463010685586]
We propose a novel approach to exploit unstructured weight and activation sparsity in dataflow accelerators, using software and hardware co-optimization.
We achieve an efficiency improvement ranging from 1.3× to 4.2× compared to existing sparse designs.
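The blurb names a software/hardware sparsity search but gives no algorithm; a greedy per-layer search under an accuracy budget might look like the hypothetical sketch below, where accuracy_of and latency_of stand in for whatever evaluator and hardware cost model the method actually uses.

```python
def search_sparsity(layers, accuracy_of, latency_of, acc_budget=0.01):
    """Greedily raise per-layer sparsity while accuracy loss stays within budget."""
    config = {layer: 0.0 for layer in layers}   # start fully dense
    baseline = accuracy_of(config)
    for layer in layers:
        for s in (0.25, 0.5, 0.75, 0.9):        # candidate sparsity levels
            trial = dict(config, **{layer: s})
            if baseline - accuracy_of(trial) <= acc_budget:
                config[layer] = s               # keep the sparser setting
    return config, latency_of(config)

# Toy evaluators: accuracy dips with total sparsity, latency improves with it.
acc = lambda c: 0.90 - 0.02 * sum(c.values())
lat = lambda c: 10.0 / (1.0 + sum(c.values()))
print(search_sparsity(["conv1", "conv2"], acc, lat))
```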
arXiv Detail & Related papers (2024-06-05T09:25:18Z)
- Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning [8.690298376643959]
We introduce Efflex, a comprehensive pipeline for graph modeling and representation learning of large spatio-temporal trajectories.
Efflex pioneers the incorporation of a multi-scale k-nearest neighbors (KNN) algorithm with feature fusion for graph construction.
The groundbreaking graph construction mechanism and the high-performance lightweight GCN speed up embedding extraction by up to 36 times.
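As a rough stand-in for the graph-construction step, this sketch builds a plain k-nearest-neighbor graph with NumPy; Efflex's multi-scale KNN and feature-fusion scheme go well beyond this minimal version.

```python
import numpy as np

def knn_graph(points, k):
    """Edge list (i, j) linking each point to its k nearest neighbors."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)              # exclude self-loops
    nbrs = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(points)) for j in nbrs[i]]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
print(knn_graph(pts, k=2))
```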
arXiv Detail & Related papers (2024-04-15T05:36:27Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both inference accuracy and mean squared error without requiring additional training data.
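To picture the architecture, here is a toy NumPy version of a shared backbone feeding several prediction heads whose outputs are ensembled by averaging; the real MEMTL networks, training procedure, and ensembling rule are richer than this, and the weights here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W_backbone = rng.standard_normal((8, 16))                  # shared feature extractor
heads = [rng.standard_normal((16, 4)) for _ in range(3)]   # three prediction heads

def memtl_forward(x):
    """Shared features feed every head; ensemble by averaging the head outputs."""
    h = np.tanh(x @ W_backbone)
    return np.mean([h @ W_head for W_head in heads], axis=0)

print(memtl_forward(rng.standard_normal(8)).shape)  # -> (4,)
```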
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
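A trivial way to "allocate greater resources to the most computationally intensive layers" is proportional assignment, as in the sketch below; the paper's allocation is manual and FPGA-specific, so treat this only as an illustration with made-up numbers.

```python
def allocate(layer_costs, n_fpgas):
    """Assign FPGAs to pipeline stages roughly proportionally to layer cost."""
    total = sum(layer_costs)
    alloc = [max(1, round(n_fpgas * c / total)) for c in layer_costs]
    while sum(alloc) > n_fpgas:          # trim rounding overshoot, heaviest first
        alloc[alloc.index(max(alloc))] -= 1
    return alloc

print(allocate([10, 40, 25, 25], n_fpgas=8))  # -> [1, 3, 2, 2]
```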
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
The adapter-ALBERT model is an efficient optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
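The heterogeneous-memory idea can be caricatured as a greedy placement: tensors with the highest reuse per byte go to the fast tier first. The tiers, sizes, and function name below are hypothetical, not the paper's actual mapping.

```python
def map_to_memory(tensors, fast_capacity):
    """tensors: (name, size_bytes, reuse_count); greedy by reuse density."""
    placement, used = {}, 0
    for name, size, reuse in sorted(tensors, key=lambda t: -t[2] / t[1]):
        if used + size <= fast_capacity:
            placement[name], used = "sram", used + size
        else:
            placement[name] = "dram"     # or a slower NVM tier in such designs
    return placement

print(map_to_memory([("embeddings", 900, 2),
                     ("adapter", 100, 50),
                     ("attention", 400, 10)], fast_capacity=512))
```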
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- FedDUAP: Federated Learning with Dynamic Update and Adaptive Pruning Using Shared Data on the Server [64.94942635929284]
Federated Learning (FL) suffers from two critical challenges, i.e., limited computational resources and low training efficiency.
We propose a novel FL framework, FedDUAP, to exploit the insensitive data on the server and the decentralized data in edge devices.
By integrating the two original techniques, our proposed FL model, FedDUAP, significantly outperforms baseline approaches in terms of accuracy (up to 4.8% higher), efficiency (up to 2.8 times faster), and computational cost (up to 61.9% smaller).
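One hedged reading of the "dynamic update" half is a FedAvg round followed by an extra, damped gradient step on the shared server data, as sketched below; the actual FedDUAP schedule and its adaptive-pruning half are not reproduced here, and all parameters are assumptions.

```python
import numpy as np

def fed_round(global_w, client_grads, server_grad, lr=0.1, server_weight=0.3):
    """FedAvg step on client gradients, then a damped step on server data."""
    w = global_w - lr * np.mean(client_grads, axis=0)   # standard aggregation
    return w - lr * server_weight * server_grad         # server-side update

w = np.zeros(4)
print(fed_round(w, [np.ones(4), 3 * np.ones(4)], server_grad=np.ones(4)))
```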
arXiv Detail & Related papers (2022-04-25T10:00:00Z)
- EF-Train: Enable Efficient On-device CNN Training on FPGA Through Data Reshaping for Online Adaptation or Personalization [11.44696439060875]
EF-Train is an efficient DNN training accelerator with a unified channel-level parallelism-based convolution kernel.
It can achieve end-to-end training on resource-limited low-power edge-level FPGAs.
Our design achieves a throughput of 46.99 GFLOPS and an energy efficiency of 6.09 GFLOPS/W.
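To show what channel-level parallelism means in miniature, the sketch below computes each output channel of a convolution in its own worker thread; EF-Train's unified FPGA kernel is, of course, far more specialized than this NumPy stand-in.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def conv_one_channel(x, w_oc):
    """Valid 2D convolution of input x (C,H,W) with one filter w_oc (C,kh,kw)."""
    _, kh, kw = w_oc.shape
    H, W = x.shape[1], x.shape[2]
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w_oc)
    return out

x = np.random.rand(3, 8, 8)            # 3 input channels
w = np.random.rand(4, 3, 3, 3)         # 4 output-channel filters
with ThreadPoolExecutor() as pool:     # one worker per output channel
    y = np.stack(list(pool.map(lambda f: conv_one_channel(x, f), w)))
print(y.shape)                         # -> (4, 6, 6)
```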
arXiv Detail & Related papers (2022-02-18T18:27:42Z)
- Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a sensitivity-informed deep neural network (SI-DNN) to learn the solutions of the AC optimal power flow (AC-OPF) problem.
The proposed SI-DNN is compatible with a broad range of OPF schemes.
It can be seamlessly integrated into other learning-to-OPF schemes.
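A plausible, and here entirely assumed, form of a sensitivity-informed objective penalizes mismatch in both the predictions and their input sensitivities (Jacobians); the paper's exact AC-OPF formulation differs, so this is only a shape-of-the-idea sketch.

```python
import numpy as np

def si_loss(pred, target, pred_jac, target_jac, lam=0.5):
    """Prediction MSE plus a penalty on mismatched input sensitivities."""
    value_term = np.mean((pred - target) ** 2)
    sens_term = np.mean((pred_jac - target_jac) ** 2)
    return value_term + lam * sens_term

print(si_loss(np.array([1.0]), np.array([1.2]),
              np.array([[0.5]]), np.array([[0.4]])))  # ≈ 0.045
```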
arXiv Detail & Related papers (2021-03-27T00:45:23Z)
- DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on Systolic Accelerator [5.65116500037191]
We propose data-reuse computation-aware co-optimization (DRACO).
DRACO improves the PE utilization of memory-bound DNNs without any additional need for dataflow/micro-architecture modifications.
Unlike the previous co-optimization methods, DRACO not only maximizes performance and energy efficiency but also improves the predictive performance of DNNs.
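PE utilization, the quantity DRACO improves, can be illustrated with a simple tile-fit model on an R x C systolic array; the formula below is a generic textbook-style estimate, not DRACO's own model.

```python
import math

def pe_utilization(M, N, R=16, C=16):
    """Fraction of PEs doing useful work when an M x N workload maps to R x C PEs."""
    tiles = math.ceil(M / R) * math.ceil(N / C)
    return (M * N) / (tiles * R * C)

print(f"{pe_utilization(20, 20):.2f}")  # -> 0.39 (poor fit wastes PEs)
print(f"{pe_utilization(32, 32):.2f}")  # -> 1.00 (perfect fit)
```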
arXiv Detail & Related papers (2020-06-26T17:06:41Z)