CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework
- URL: http://arxiv.org/abs/2206.10620v1
- Date: Tue, 21 Jun 2022 14:10:22 GMT
- Title: CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework
- Authors: Xiaofeng Li, Bin Ren, Xipeng Shen, Yanzhi Wang
- Abstract summary: There is a growing demand for shifting the delivery of AI capability from data centers on the cloud to edge or end devices.
The shift has however been hampered by the large and growing gap between DNN computing demands and the computing power on edge or end devices.
This article presents the design of XGen, an optimizing framework for DNNs designed to bridge the gap.
- Score: 40.53707613126131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing demand for shifting the delivery of AI capability from
data centers on the cloud to edge or end devices, exemplified by the fast
emerging real-time AI-based apps running on smartphones, AR/VR devices,
autonomous vehicles, and various IoT devices. The shift has, however, been
seriously hampered by the large and growing gap between DNN computing demands and
the computing power available on edge or end devices. This article presents the design of
XGen, an optimizing framework for DNNs designed to bridge the gap. XGen takes
cross-cutting co-design as its first-order consideration. Its full-stack
AI-oriented optimizations consist of a number of innovative optimizations at
every layer of the DNN software stack, all designed in a cooperative manner.
The unique technology enables XGen to optimize various DNNs, including those
with extreme depth (e.g., BERT, GPT, and other transformers), and to generate code
that runs several times faster than that produced by existing DNN frameworks, while
delivering the same level of accuracy.
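To make the compression-compilation co-design idea concrete, here is a minimal sketch built from standard PyTorch utilities; it is an illustrative assumption of how such a pipeline can be wired together, not XGen's actual API. Structured pruning and dynamic quantization stand in for the compression layers of the stack, and TorchScript tracing stands in for the compilation layer.

    # Illustrative compression-compilation pipeline; NOT XGen's actual API.
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

    # Compression step 1: structured pruning -- zero out 50% of each Linear
    # layer's output rows by L2 norm, a regular, hardware-friendly pattern.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.ln_structured(module, name="weight", amount=0.5, n=2, dim=0)
            prune.remove(module, "weight")  # make the pruning permanent

    # Compression step 2: dynamic quantization to int8 weights.
    model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    # Compilation step: trace to TorchScript so the graph can be optimized
    # and deployed without the Python interpreter.
    example = torch.randn(1, 128)
    compiled = torch.jit.trace(model, example)
    print(compiled(example).shape)  # torch.Size([1, 10])

In a real co-design, the pruning pattern would be chosen jointly with the code generator so the compiler can exploit the resulting regularity; the three steps here are deliberately independent to keep the sketch short.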
Related papers
- Latency optimized Deep Neural Networks (DNNs): An Artificial Intelligence approach at the Edge using Multiprocessor System on Chip (MPSoC) [1.949471382288103]
Edge computing (AI at the Edge) on mobile devices is one approach optimized for addressing this requirement.
In this work, the possibilities and challenges of implementing a low-latency and power-optimized smart mobile system are examined.
Various performance aspects and implementation feasibilities of Neural Networks (NNs) on embedded FPGA edge devices are discussed.
arXiv Detail & Related papers (2024-07-16T11:51:41Z)
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
- SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks [0.7614628596146599]
AI frameworks such as Theano, Caffe, Chainer, CNTK, MXNet, PyTorch, and DL4J provide a high-level scripting API.
Less mainstream CPU, GPU, or accelerator vendors must invest considerable effort to get their hardware supported by these frameworks.
NEC Laboratories Europe began developing the SOL AI Optimization project several years ago.
arXiv Detail & Related papers (2022-05-19T08:40:46Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared with its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
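To make the latency- and accuracy-aware reward design above concrete, here is a toy sketch; the accuracy model, all constants, and the exhaustive search (standing in for SAC-d's learned soft actor-critic policy) are my own illustrative assumptions, not the paper's formulation.

    # Toy reward for device-edge co-inference actions; all numbers invented.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Action:
        exit_point: int       # which early-exit branch stops the inference
        partition_point: int  # layer index where execution moves to the edge
        compress_bits: int    # bit-width used to compress the sent features

    def reward(a: Action, device_ms_per_layer: float = 3.0,
               edge_ms_per_layer: float = 0.5, uplink_mbps: float = 20.0,
               feature_mb: float = 4.0, lam: float = 0.05) -> float:
        """Higher is better: toy accuracy minus a weighted latency penalty."""
        # Accuracy grows with later exits and more compression bits (toy model).
        accuracy = min(1.0, 0.6 + 0.05 * a.exit_point + 0.02 * a.compress_bits)
        # Latency: on-device layers + feature transmission + edge-side layers.
        tx_ms = 1000.0 * (feature_mb * 8.0) * (a.compress_bits / 32.0) / uplink_mbps
        device_ms = a.partition_point * device_ms_per_layer
        edge_ms = (a.exit_point - a.partition_point) * edge_ms_per_layer
        return accuracy - lam * (device_ms + tx_ms + edge_ms) / 100.0

    # Exhaustive search stands in for the learned policy in this toy setting.
    best = max((Action(e, p, b)
                for e in range(1, 7)
                for p in range(0, e + 1)
                for b in (2, 4, 8, 16)),
               key=reward)
    print(best, round(reward(best), 3))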
- E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs [6.047137174639418]
End-to-end framework E3NE automates the generation of efficient SNN inference logic for FPGAs.
E3NE uses less than 50% of hardware resources and 20% less power, while reducing the latency by an order of magnitude.
arXiv Detail & Related papers (2021-11-19T04:01:19Z)
- Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization [56.3111706960878]
High-end mobile platforms serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications.
However, constrained computation and storage resources on these devices pose significant challenges for real-time inference execution.
We propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices.
arXiv Detail & Related papers (2020-04-22T03:18:23Z)
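As a concrete illustration of the hardware-friendly structured pruning described in the entry above, the toy function below removes whole convolution filters by L1 norm so that the remaining computation stays dense and compiler-friendly; the criterion and the ratio are illustrative assumptions, not the paper's exact method.

    # Toy structured (filter) pruning; not the paper's implementation.
    import numpy as np

    def prune_filters(weights: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
        """weights: (out_ch, in_ch, kh, kw). Keep the filters with the largest
        L1 norms and drop whole output channels, preserving dense layout."""
        norms = np.abs(weights).sum(axis=(1, 2, 3))    # one L1 norm per filter
        keep = max(1, int(keep_ratio * weights.shape[0]))
        kept_idx = np.sort(np.argsort(norms)[-keep:])  # largest norms survive
        return weights[kept_idx]

    w = np.random.randn(16, 8, 3, 3).astype(np.float32)
    print(w.shape, "->", prune_filters(w).shape)  # (16, 8, 3, 3) -> (8, 8, 3, 3)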
- CoCoPIE: Making Mobile AI Sweet As PIE -- Compression-Compilation Co-Design Goes a Long Way [39.63763140268978]
It is possible to enable real-time artificial intelligence on mainstream end devices without special hardware.
CoCoPIE is a software framework that holds numerous records on mobile AI.
arXiv Detail & Related papers (2020-03-14T20:53:05Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
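To illustrate the fine-grained pattern idea in the PatDNN entry above, here is a toy sketch in which every 3x3 kernel keeps only the 4 weights selected by whichever pre-defined binary pattern preserves the most magnitude; the particular pattern set and the selection criterion are illustrative assumptions, not the paper's exact design.

    # Toy pattern-based kernel pruning; pattern set invented for illustration.
    import numpy as np

    # Each pattern is a 3x3 binary mask with exactly 4 ones.
    PATTERNS = np.array([
        [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
        [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
        [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
        [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
    ], dtype=np.float32)

    def pattern_prune(weights: np.ndarray) -> np.ndarray:
        """weights: (out_ch, in_ch, 3, 3). For each kernel, apply the pattern
        that preserves the most L1 magnitude and zero the other 5 weights."""
        out = np.zeros_like(weights)
        for o in range(weights.shape[0]):
            for i in range(weights.shape[1]):
                kernel = weights[o, i]
                scores = [(np.abs(kernel) * p).sum() for p in PATTERNS]
                out[o, i] = kernel * PATTERNS[int(np.argmax(scores))]
        return out

    w = np.random.randn(8, 4, 3, 3).astype(np.float32)
    pruned = pattern_prune(w)
    assert int((pruned != 0).sum(axis=(2, 3)).max()) <= 4  # <= 4 weights/kernel

Because every surviving kernel shares one of a few shapes, a compiler can reorder and group kernels by pattern to recover regular, SIMD-friendly execution, which is the "regain hardware efficiency" insight the entry mentions.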
This list is automatically generated from the titles and abstracts of the papers on this site.