3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization,
and Ultra-Low Latency Acceleration
- URL: http://arxiv.org/abs/2105.06250v1
- Date: Tue, 11 May 2021 03:22:30 GMT
- Title: 3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization,
and Ultra-Low Latency Acceleration
- Authors: Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, Cong Hao
- Abstract summary: Deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services.
This paper emphasizes the importance of training, quantization and accelerator design, and calls for more research breakthroughs in the area for AI on the edge.
- Score: 8.419854797930668
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The deep neural network (DNN) based AI applications on the edge require both
low-cost computing platforms and high-quality services. However, the limited
memory, computing resources, and power budget of the edge devices constrain the
effectiveness of the DNN algorithms. Developing edge-oriented AI algorithms and
implementations (e.g., accelerators) is challenging. In this paper, we
summarize our recent efforts for efficient on-device AI development from three
aspects, including both training and inference. First, we present on-device
training with ultra-low memory usage. We propose a novel rank-adaptive
tensor-based tensorized neural network model, which offers orders-of-magnitude
memory reduction during training. Second, we introduce an ultra-low bitwidth
quantization method for DNN model compression, achieving the state-of-the-art
accuracy under the same compression ratio. Third, we introduce an ultra-low
latency DNN accelerator design, practicing the software/hardware co-design
methodology. This paper emphasizes the importance and efficacy of training,
quantization and accelerator design, and calls for more research breakthroughs
in the area for AI on the edge.
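The memory claim for tensorized training can be made concrete with a small parameter count. The sketch below is illustrative only: it assumes a tensor-train (TT) matrix factorization, which the abstract does not name, and the layer shape and the fixed ranks are hypothetical values chosen for the example. A rank-adaptive method would determine the ranks during training rather than fix them in advance.

```python
from math import prod


def dense_params(in_dims, out_dims):
    """Parameters in a dense weight matrix of shape prod(in_dims) x prod(out_dims)."""
    return prod(in_dims) * prod(out_dims)


def tt_matrix_params(in_dims, out_dims, ranks):
    """Parameters in a tensor-train (TT) matrix factorization of the same layer.

    Core k has shape (ranks[k], in_dims[k], out_dims[k], ranks[k + 1]),
    and the boundary ranks are fixed to 1.
    """
    assert len(in_dims) == len(out_dims) == len(ranks) - 1
    assert ranks[0] == 1 and ranks[-1] == 1
    return sum(
        ranks[k] * in_dims[k] * out_dims[k] * ranks[k + 1]
        for k in range(len(in_dims))
    )


# Hypothetical 1024 x 1024 layer, with each side factored as 4 * 8 * 8 * 4.
in_dims = (4, 8, 8, 4)
out_dims = (4, 8, 8, 4)
ranks = (1, 8, 8, 8, 1)  # assumed TT ranks, not taken from the paper

dense = dense_params(in_dims, out_dims)      # 1,048,576 parameters
tt = tt_matrix_params(in_dims, out_dims, ranks)  # 8,448 parameters

print(f"dense: {dense}, TT: {tt}, reduction: {dense / tt:.1f}x")
```

With these assumed ranks the factorized layer stores about 124 times fewer parameters than the dense one; smaller ranks or larger layers widen the gap.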
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Energy-Efficient Deployment of Machine Learning Workloads on Neuromorphic Hardware [0.11744028458220425]
Several edge deep learning hardware accelerators have been released that specifically focus on reducing the power and area consumed by deep neural networks (DNNs).
Spiking neural networks (SNNs), which operate on discrete time-series data, have been shown to achieve substantial power reductions when deployed on specialized neuromorphic event-based/asynchronous hardware.
In this work, we provide a general guide to converting pre-trained DNNs into SNNs while also presenting techniques to improve the deployment of converted SNNs on neuromorphic hardware.
arXiv Detail & Related papers (2022-10-10T20:27:19Z)
- Designing and Training of Lightweight Neural Networks on Edge Devices using Early Halting in Knowledge Distillation [16.74710649245842]
This paper presents a novel approach for designing and training lightweight Deep Neural Networks (DNN) on edge devices.
The approach considers the available storage, processing speed, and allowable maximum processing time.
We introduce a novel early halting technique, which preserves network resources.
arXiv Detail & Related papers (2022-09-30T16:18:24Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA and uses around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- L2ight: Enabling On-Chip Learning for Optical Neural Networks via Efficient in-situ Subspace Optimization [10.005026783940682]
Silicon-photonics-based optical neural network (ONN) is a promising hardware platform that could represent a paradigm shift in efficient AI.
In this work, we propose a closed-loop ONN on-chip learning framework L2ight to enable scalable ONN mapping and efficient in-situ learning.
arXiv Detail & Related papers (2021-10-27T22:53:47Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Dynamic Hard Pruning of Neural Networks at the Edge of the Internet [11.605253906375424]
The Dynamic Hard Pruning (DynHP) technique incrementally prunes the network during training.
DynHP enables a tunable size reduction of the final neural network and reduces the NN memory occupancy during training.
Freed memory is reused by a dynamic batch sizing approach to counterbalance the accuracy degradation caused by the hard pruning strategy.
arXiv Detail & Related papers (2020-11-17T10:23:28Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)