Multiply-and-Fire (MNF): An Event-driven Sparse Neural Network
Accelerator
- URL: http://arxiv.org/abs/2204.09797v1
- Date: Wed, 20 Apr 2022 21:56:50 GMT
- Title: Multiply-and-Fire (MNF): An Event-driven Sparse Neural Network
Accelerator
- Authors: Miao Yu, Tingting Xiang, Venkata Pavan Kumar Miriyala, Trevor E.
Carlson
- Abstract summary: This work takes a unique look at sparsity with an event (or activation-driven) approach to ANN acceleration.
Our analytical and experimental results show that this event-driven solution presents a new direction to enable highly efficient AI inference for both CNN and MLP workloads.
- Score: 3.224364382976958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning, particularly deep neural network inference, has become a
vital workload for many computing systems, from data centers and HPC systems to
edge-based computing. As advances in sparsity have helped improve the
efficiency of AI acceleration, there is a continued need for improved system
efficiency for both high-performance and system-level acceleration.
This work takes a unique look at sparsity with an event (or
activation-driven) approach to ANN acceleration that aims to minimize useless
work, improve utilization, and increase performance and energy efficiency. Our
analytical and experimental results show that this event-driven solution
presents a new direction to enable highly efficient AI inference for both CNN
and MLP workloads.
This work demonstrates state-of-the-art energy efficiency and performance,
centered on activation-based sparsity and a highly parallel dataflow method
that improves overall functional unit utilization (at 30 fps). It improves
energy efficiency over a state-of-the-art solution by 1.46$\times$. Taken
together, this methodology presents a novel direction for achieving
high-efficiency, high-performance designs for next-generation AI acceleration
platforms.
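The event-driven (activation-driven) idea above, doing work only when an activation is nonzero, can be illustrated with a minimal sketch for one fully connected layer. This is purely illustrative; the function name and data layout are assumptions, not the paper's actual MNF architecture or dataflow:

```python
# Minimal sketch of event-driven ("multiply-and-fire") sparse inference for a
# fully connected layer. Each nonzero input activation fires one multiply
# event that scatters its contribution into the output accumulators, so zero
# activations cost no work at all.

def event_driven_fc(activations, weights):
    """activations: list of floats; weights[i][j] maps input i to output j."""
    n_out = len(weights[0])
    accum = [0.0] * n_out
    for i, a in enumerate(activations):
        if a == 0.0:          # sparsity: skip useless work entirely
            continue
        row = weights[i]      # one "multiply event" per nonzero activation
        for j in range(n_out):
            accum[j] += a * row[j]
    # ReLU keeps the outputs sparse, so downstream layers benefit too
    return [x if x > 0.0 else 0.0 for x in accum]

# Example: 4 inputs (75% sparse), 2 outputs; only one weight row is touched
acts = [0.0, 2.0, 0.0, 0.0]
w = [[1.0, -1.0], [0.5, 3.0], [2.0, 2.0], [-1.0, 1.0]]
print(event_driven_fc(acts, w))  # → [1.0, 6.0]
```

With 75% input sparsity, the loop visits only one weight row instead of four, which is the source of the utilization and energy gains the abstract describes.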
Related papers
- Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning [88.78080749909665]
Current on-device training methods focus on efficient training without considering catastrophic forgetting.
This paper proposes a simple but effective edge-friendly incremental learning framework.
Our method achieves an average accuracy boost of 38.08% with even less memory and approximate computation.
arXiv Detail & Related papers (2024-06-13T05:49:29Z)
- Augmenting the FedProx Algorithm by Minimizing Convergence [0.0]
We present a novel approach called G Federated Proximity.
Our results indicate a significant increase in throughput, with approximately 90% better convergence compared to existing model performance.
arXiv Detail & Related papers (2024-06-02T14:01:55Z)
- Accelerating Neural Network Training: A Brief Review [0.5825410941577593]
This study examines innovative approaches to expedite the training process of deep neural networks (DNNs).
The research utilizes sophisticated methodologies, including Gradient Accumulation (GA), Automatic Mixed Precision (AMP), and Pin Memory (PM).
arXiv Detail & Related papers (2023-12-15T18:43:45Z)
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception [1.6683976936678229]
Multi-tasking learning approaches have achieved promising results in solving panoptic driving perception problems.
This paper proposes an effective and efficient multi-task learning network that simultaneously performs traffic object detection, drivable road area segmentation, and lane detection.
Our model achieved the new state-of-the-art (SOTA) performance in terms of accuracy and speed on the challenging BDD100K dataset.
arXiv Detail & Related papers (2022-08-24T11:00:27Z)
- FedDUAP: Federated Learning with Dynamic Update and Adaptive Pruning Using Shared Data on the Server [64.94942635929284]
Federated Learning (FL) suffers from two critical challenges, i.e., limited computational resources and low training efficiency.
We propose a novel FL framework, FedDUAP, to exploit the insensitive data on the server and the decentralized data in edge devices.
By integrating the two original techniques together, our proposed FL model, FedDUAP, significantly outperforms baseline approaches in terms of accuracy (up to 4.8% higher), efficiency (up to 2.8 times faster), and computational cost (up to 61.9% smaller).
arXiv Detail & Related papers (2022-04-25T10:00:00Z)
- FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs).
Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training [8.411385346896413]
We focus on improving the practical efficiency of the state-of-the-art EfficientNet models on a new class of accelerator, the Graphcore IPU.
We do this by extending this family of models in the following ways: (i) generalising depthwise convolutions to group convolutions; (ii) adding proxy-normalized activations to match batch normalization performance with batch-independent statistics; and (iii) reducing compute by lowering the training resolution and inexpensively fine-tuning at higher resolution.
arXiv Detail & Related papers (2021-06-07T14:10:52Z)
- Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance optimization in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
- AutoScale: Optimizing Energy Efficiency of End-to-End Edge Inference under Stochastic Variance [11.093360539563657]
This paper proposes AutoScale to enable accurate, energy-efficient deep learning inference at the edge.
AutoScale is an adaptive and lightweight execution scaling engine built upon a custom-designed reinforcement learning algorithm.
arXiv Detail & Related papers (2020-05-06T00:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.