FlexNN: A Dataflow-aware Flexible Deep Learning Accelerator for Energy-Efficient Edge Devices
- URL: http://arxiv.org/abs/2403.09026v2
- Date: Thu, 11 Apr 2024 23:26:33 GMT
- Title: FlexNN: A Dataflow-aware Flexible Deep Learning Accelerator for Energy-Efficient Edge Devices
- Authors: Arnab Raha, Deepak A. Mathaikutty, Soumendu K. Ghosh, Shamik Kundu, et al.
- Abstract summary: This paper introduces FlexNN, a Flexible Neural Network accelerator, which adopts agile design principles.
Our design enables adaptable dataflows of any type through software-configurable descriptors.
To further enhance throughput and reduce energy consumption, we propose a novel sparsity-based acceleration logic.
- Score: 0.6892601897291335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces FlexNN, a Flexible Neural Network accelerator, which adopts agile design principles to enable versatile dataflows, enhancing energy efficiency. Unlike conventional convolutional neural network accelerator architectures that adhere to fixed dataflows (such as input, weight, output, or row stationary) for transferring activations and weights between storage and compute units, our design enables adaptable dataflows of any type through software-configurable descriptors. Considering that data movement costs considerably outweigh compute costs from an energy perspective, dataflow flexibility allows us to optimize the movement per layer for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNN architecture, we propose a novel sparsity-based acceleration logic that utilizes fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, thereby optimizing the convolution engine within the hardware accelerator. Extensive experimental results underscore a significant enhancement in the performance and energy efficiency of FlexNN relative to existing DNN accelerators.
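To make the abstract's two mechanisms concrete, here is a minimal Python sketch pairing a per-layer dataflow descriptor with a zero-skipping multiply-accumulate loop. FlexNN's actual descriptor fields and sparsity hardware are not specified in this listing, so the names (DataflowDescriptor, sparse_mac) and every field below are illustrative assumptions, not the paper's interface.

```python
from dataclasses import dataclass

@dataclass
class DataflowDescriptor:
    """Hypothetical per-layer descriptor; FlexNN's real format is not shown here."""
    stationary: str     # operand held in place: "input" | "weight" | "output" | "row"
    loop_order: tuple   # traversal order of the loop nest, e.g. over (n, y, x, k, c)
    tile: dict          # tile sizes chosen so each tile fits the local buffers

def sparse_mac(acts, wts):
    """Fine-grained zero-skipping: multiply only when both operands are nonzero."""
    acc = 0.0
    for a, w in zip(acts, wts):
        if a != 0.0 and w != 0.0:   # bypass the redundant computation
            acc += a * w
    return acc

# An output-stationary configuration for one convolution layer.
layer0 = DataflowDescriptor(
    stationary="output",
    loop_order=("n", "y", "x", "k", "c"),
    tile={"k": 16, "c": 16, "y": 4, "x": 4},
)
print(sparse_mac([0.0, 1.5, 0.0, 2.0], [3.0, 0.0, 0.0, 0.5]))  # -> 1.0
```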
Related papers
- Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture [0.0]
This work develops a reconfigurable-dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow of each layer at run-time.
The results show that the Flex-TPU design achieves a significant performance increase of up to 2.75x compared to a conventional TPU, with only minor area and power overheads.
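As a loose illustration of per-layer dataflow selection at run-time, the sketch below picks whichever dataflow keeps the largest tensor stationary. The cost model and names (pick_dataflow, the byte counts) are invented for this example and are not the Flex-TPU's actual policy.

```python
def pick_dataflow(layer, candidates=("input_stationary",
                                     "weight_stationary",
                                     "output_stationary")):
    """Return the candidate dataflow with the lowest estimated data movement."""
    sizes = {"input_stationary": layer["act_bytes"],
             "weight_stationary": layer["wt_bytes"],
             "output_stationary": layer["out_bytes"]}
    # Toy cost model: the stationary tensor is not re-fetched from memory.
    return min(candidates, key=lambda df: sum(sizes.values()) - sizes[df])

layer = {"act_bytes": 2_000_000, "wt_bytes": 300_000, "out_bytes": 1_800_000}
print(pick_dataflow(layer))  # -> input_stationary (largest tensor held in place)
```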
arXiv Detail & Related papers (2024-07-11T17:33:38Z)
- HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator [47.66463010685586]
We propose a novel approach to exploit unstructured weight and activation sparsity in dataflow accelerators, using software and hardware co-optimization.
We achieve an efficiency improvement ranging from 1.3× to 4.2× compared to existing sparse designs.
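The blurb names a software/hardware sparsity search but gives no algorithm; a greedy per-layer search under an accuracy budget might look like the hypothetical sketch below, where accuracy_of and latency_of stand in for whatever evaluator and hardware cost model the method actually uses.

```python
def search_sparsity(layers, accuracy_of, latency_of, acc_budget=0.01):
    """Greedily raise per-layer sparsity while accuracy loss stays within budget."""
    config = {layer: 0.0 for layer in layers}   # start fully dense
    baseline = accuracy_of(config)
    for layer in layers:
        for s in (0.25, 0.5, 0.75, 0.9):        # candidate sparsity levels
            trial = dict(config, **{layer: s})
            if baseline - accuracy_of(trial) <= acc_budget:
                config[layer] = s               # keep the sparser setting
    return config, latency_of(config)

# Toy evaluators: accuracy dips with total sparsity, latency improves with it.
acc = lambda c: 0.90 - 0.02 * sum(c.values())
lat = lambda c: 10.0 / (1.0 + sum(c.values()))
print(search_sparsity(["conv1", "conv2"], acc, lat))
```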
arXiv Detail & Related papers (2024-06-05T09:25:18Z)
- Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning [8.690298376643959]
We introduce Efflex, a comprehensive pipeline for graph modeling and representation learning of large spatio-temporal trajectories.
Efflex pioneers the incorporation of a multi-scale k-nearest neighbors (KNN) algorithm with feature fusion for graph construction.
The groundbreaking graph construction mechanism and the high-performance lightweight GCN speed up embedding extraction by up to 36 times.
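As a rough stand-in for the graph-construction step, this sketch builds a plain k-nearest-neighbor graph with NumPy; Efflex's multi-scale KNN and feature-fusion scheme go well beyond this minimal version.

```python
import numpy as np

def knn_graph(points, k):
    """Edge list (i, j) linking each point to its k nearest neighbors."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)              # exclude self-loops
    nbrs = np.argsort(dist, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(points)) for j in nbrs[i]]

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
print(knn_graph(pts, k=2))
```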
arXiv Detail & Related papers (2024-04-15T05:36:27Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both inference accuracy and mean squared error without requiring additional training data.
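To picture the architecture, here is a toy NumPy version of a shared backbone feeding several prediction heads whose outputs are ensembled by averaging; the real MEMTL networks, training procedure, and ensembling rule are richer than this, and the weights here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W_backbone = rng.standard_normal((8, 16))                  # shared feature extractor
heads = [rng.standard_normal((16, 4)) for _ in range(3)]   # three prediction heads

def memtl_forward(x):
    """Shared features feed every head; ensemble by averaging the head outputs."""
    h = np.tanh(x @ W_backbone)
    return np.mean([h @ W_head for W_head in heads], axis=0)

print(memtl_forward(rng.standard_normal(8)).shape)  # -> (4,)
```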
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
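A trivial way to "allocate greater resources to the most computationally intensive layers" is proportional assignment, as in the sketch below; the paper's allocation is manual and FPGA-specific, so treat this only as an illustration with made-up numbers.

```python
def allocate(layer_costs, n_fpgas):
    """Assign FPGAs to pipeline stages roughly proportionally to layer cost."""
    total = sum(layer_costs)
    alloc = [max(1, round(n_fpgas * c / total)) for c in layer_costs]
    while sum(alloc) > n_fpgas:          # trim rounding overshoot, heaviest first
        alloc[alloc.index(max(alloc))] -= 1
    return alloc

print(allocate([10, 40, 25, 25], n_fpgas=8))  # -> [1, 3, 2, 2]
```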
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
The adapter-ALBERT model is an efficient optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
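The heterogeneous-memory idea can be caricatured as a greedy placement: tensors with the highest reuse per byte go to the fast tier first. The tiers, sizes, and function name below are hypothetical, not the paper's actual mapping.

```python
def map_to_memory(tensors, fast_capacity):
    """tensors: (name, size_bytes, reuse_count); greedy by reuse density."""
    placement, used = {}, 0
    for name, size, reuse in sorted(tensors, key=lambda t: -t[2] / t[1]):
        if used + size <= fast_capacity:
            placement[name], used = "sram", used + size
        else:
            placement[name] = "dram"     # or a slower NVM tier in such designs
    return placement

print(map_to_memory([("embeddings", 900, 2),
                     ("adapter", 100, 50),
                     ("attention", 400, 10)], fast_capacity=512))
```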
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- FedDUAP: Federated Learning with Dynamic Update and Adaptive Pruning Using Shared Data on the Server [64.94942635929284]
Federated Learning (FL) suffers from two critical challenges, i.e., limited computational resources and low training efficiency.
We propose a novel FL framework, FedDUAP, to exploit the insensitive data on the server and the decentralized data in edge devices.
By integrating the two original techniques, our proposed FL model, FedDUAP, significantly outperforms baseline approaches in terms of accuracy (up to 4.8% higher), efficiency (up to 2.8 times faster), and computational cost (up to 61.9% smaller).
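One hedged reading of the "dynamic update" half is a FedAvg round followed by an extra, damped gradient step on the shared server data, as sketched below; the actual FedDUAP schedule and its adaptive-pruning half are not reproduced here, and all parameters are assumptions.

```python
import numpy as np

def fed_round(global_w, client_grads, server_grad, lr=0.1, server_weight=0.3):
    """FedAvg step on client gradients, then a damped step on server data."""
    w = global_w - lr * np.mean(client_grads, axis=0)   # standard aggregation
    return w - lr * server_weight * server_grad         # server-side update

w = np.zeros(4)
print(fed_round(w, [np.ones(4), 3 * np.ones(4)], server_grad=np.ones(4)))
```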
arXiv Detail & Related papers (2022-04-25T10:00:00Z)
- EF-Train: Enable Efficient On-device CNN Training on FPGA Through Data Reshaping for Online Adaptation or Personalization [11.44696439060875]
EF-Train is an efficient DNN training accelerator with a unified channel-level parallelism-based convolution kernel.
It can achieve end-to-end training on resource-limited low-power edge-level FPGAs.
Our design achieves a throughput of 46.99 GFLOPS and an energy efficiency of 6.09 GFLOPS/W.
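To show what channel-level parallelism means in miniature, the sketch below computes each output channel of a convolution in its own worker thread; EF-Train's unified FPGA kernel is, of course, far more specialized than this NumPy stand-in.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def conv_one_channel(x, w_oc):
    """Valid 2D convolution of input x (C,H,W) with one filter w_oc (C,kh,kw)."""
    _, kh, kw = w_oc.shape
    H, W = x.shape[1], x.shape[2]
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w_oc)
    return out

x = np.random.rand(3, 8, 8)            # 3 input channels
w = np.random.rand(4, 3, 3, 3)         # 4 output-channel filters
with ThreadPoolExecutor() as pool:     # one worker per output channel
    y = np.stack(list(pool.map(lambda f: conv_one_channel(x, f), w)))
print(y.shape)                         # -> (4, 6, 6)
```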
arXiv Detail & Related papers (2022-02-18T18:27:42Z)
- Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a sensitivity-informed deep neural network (SI-DNN) to learn the solutions of the AC optimal power flow (AC-OPF) problem.
The proposed SI-DNN is compatible with a broad range of OPF schemes.
It can be seamlessly integrated into other learning-to-OPF schemes.
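A plausible, and here entirely assumed, form of a sensitivity-informed objective penalizes mismatch in both the predictions and their input sensitivities (Jacobians); the paper's exact AC-OPF formulation differs, so this is only a shape-of-the-idea sketch.

```python
import numpy as np

def si_loss(pred, target, pred_jac, target_jac, lam=0.5):
    """Prediction MSE plus a penalty on mismatched input sensitivities."""
    value_term = np.mean((pred - target) ** 2)
    sens_term = np.mean((pred_jac - target_jac) ** 2)
    return value_term + lam * sens_term

print(si_loss(np.array([1.0]), np.array([1.2]),
              np.array([[0.5]]), np.array([[0.4]])))  # ≈ 0.045
```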
arXiv Detail & Related papers (2021-03-27T00:45:23Z)
- DRACO: Co-Optimizing Hardware Utilization, and Performance of DNNs on Systolic Accelerator [5.65116500037191]
We propose data-reuse computation-aware co-optimization (DRACO).
DRACO improves the PE utilization of memory-bound DNNs without any additional need for dataflow/micro-architecture modifications.
Unlike the previous co-optimization methods, DRACO not only maximizes performance and energy efficiency but also improves the predictive performance of DNNs.
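PE utilization, the quantity DRACO improves, can be illustrated with a simple tile-fit model on an R x C systolic array; the formula below is a generic textbook-style estimate, not DRACO's own model.

```python
import math

def pe_utilization(M, N, R=16, C=16):
    """Fraction of PEs doing useful work when an M x N workload maps to R x C PEs."""
    tiles = math.ceil(M / R) * math.ceil(N / C)
    return (M * N) / (tiles * R * C)

print(f"{pe_utilization(20, 20):.2f}")  # -> 0.39 (poor fit wastes PEs)
print(f"{pe_utilization(32, 32):.2f}")  # -> 1.00 (perfect fit)
```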
arXiv Detail & Related papers (2020-06-26T17:06:41Z)