SOL: Reducing the Maintenance Overhead for Integrating Hardware Support
into AI Frameworks
- URL: http://arxiv.org/abs/2205.10357v1
- Date: Thu, 19 May 2022 08:40:46 GMT
- Title: SOL: Reducing the Maintenance Overhead for Integrating Hardware Support
into AI Frameworks
- Authors: Nicolas Weber
- Abstract summary: AI frameworks such as Theano, Caffe, Chainer, CNTK, MxNet, PyTorch and DL4J provide a high-level scripting API.
Less mainstream CPU, GPU or accelerator vendors need to put in a high effort to get their hardware supported by these frameworks.
NEC Laboratories Europe began developing the SOL AI Optimization project several years ago.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increased interest in Artificial Intelligence (AI) raised the need for
highly optimized and sophisticated AI frameworks. Starting with the Lua-based
Torch many frameworks have emerged over time, such as Theano, Caffe, Chainer,
CNTK, MxNet, PyTorch, DL4J, or TensorFlow. All of these provide a high level
scripting API that allows users to easily design neural networks and run these
on various kinds of hardware. What the user usually does not see is the high
effort put into these frameworks to provide peak execution performance. While
mainstream CPUs and GPUs have the "luxury" of a widespread user base in
the open source community, less mainstream CPU, GPU or accelerator vendors need
to put in a high effort to get their hardware supported by these frameworks.
This includes not only the development of highly efficient compute libraries
such as cuDNN, oneDNN or VEDNN but also supporting an ever-growing number of
simpler compute operations such as summation and multiplication. Each of these
frameworks nowadays supports several hundred unique operations, with
tensors of various sizes, shapes and data types, which end up in thousands of
compute kernels required for each device type. And the number of operations
keeps increasing.
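The scale of this combinatorial explosion can be illustrated with a back-of-the-envelope calculation; the counts below are illustrative assumptions, not figures from the paper:

```python
# Illustrative (assumed) counts for one AI framework on one device type.
num_operations = 500     # distinct operations (conv, sum, mul, ...)
num_dtypes = 4           # e.g. float32, float16, int8, int32
num_layout_variants = 2  # e.g. contiguous vs. strided specializations

kernels_per_device = num_operations * num_dtypes * num_layout_variants
print(kernels_per_device)  # 4000 kernels to write and maintain per device
```

Multiplied across several device types and frameworks, this is the "thousands of compute kernels" maintenance burden the abstract describes.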
That is why NEC Laboratories Europe began developing the SOL AI
Optimization project several years ago: to deliver optimal performance to users
while keeping the maintenance burden minimal.
Related papers
- Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural
Networks [0.08965418284317034]
Spiking Neural Networks (SNNs) promise enhanced energy efficiency through a reduced, low-power hardware footprint.
This paper introduces Spyx, a new and lightweight SNN simulation and optimization library designed in JAX.
arXiv Detail & Related papers (2024-02-29T09:46:44Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around the TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
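The two-step decomposition can be sketched conceptually: a small, hand-optimizable primitive for the computational core, with the logical loops expressed separately around it. This is a minimal NumPy sketch of the idea, not the paper's actual TPP library:

```python
import numpy as np

def gemm_primitive(A_blk, B_blk, C_blk):
    # The "computational core": in the paper this would be a tuned
    # Tensor Processing Primitive; here it is plain NumPy for illustration.
    C_blk += A_blk @ B_blk

def blocked_matmul(A, B, bs=32):
    # The logical loops around the primitive, kept separate from the core
    # so they can be reordered or parallelized without touching it.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % bs == 0 and N % bs == 0 and K % bs == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, bs):
        for j in range(0, N, bs):
            for k in range(0, K, bs):
                gemm_primitive(A[i:i+bs, k:k+bs], B[k:k+bs, j:j+bs],
                               C[i:i+bs, j:j+bs])
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))
print(np.allclose(blocked_matmul(A, B), A @ B))  # True
```

Separating the loop nest from the primitive is what allows one tuned core to serve many loop orders, blockings and parallelization strategies.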
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
- InceptionNeXt: When Inception Meets ConvNeXt [167.61042926444105]
We build a series of networks, namely InceptionNeXt, which not only enjoy high throughputs but also maintain competitive performance.
InceptionNeXt achieves 1.6x higher training throughput than ConvNeXt-T, as well as a 0.2% top-1 accuracy improvement on ImageNet-1K.
arXiv Detail & Related papers (2023-03-29T17:59:58Z)
- CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework [40.53707613126131]
There is a growing demand for shifting the delivery of AI capability from data centers on the cloud to edge or end devices.
The shift has, however, been hampered by the large and growing gap between DNN computing demands and the computing power of edge or end devices.
This article presents the design of XGen, an optimizing framework for DNN designed to bridge the gap.
arXiv Detail & Related papers (2022-06-21T14:10:22Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% accuracy loss compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
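The core idea can be shown numerically: for a uniform 2-bit quantizer with levels {-3, -1, +1, +3}, every weight decomposes as w = 2*b1 + b0 with b1, b0 in {-1, +1}, so one quantized matmul distributes into a weighted sum of binary-branch matmuls. This is a minimal sketch of that algebra under these assumed levels, not the paper's exact scheme:

```python
import numpy as np

def decompose_2bit(W):
    # Assumed 2-bit levels {-3,-1,+1,+3}: w = 2*b1 + b0, b1, b0 in {-1,+1}.
    b1 = np.where(W > 0, 1, -1)  # sign gives the high-order binary branch
    b0 = W - 2 * b1              # remainder is the low-order binary branch
    assert set(np.unique(b0)) <= {-1, 1}
    return b1, b0

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W = rng.choice([-3, -1, 1, 3], size=(8, 5))
b1, b0 = decompose_2bit(W)
# The quantized matmul distributes over the binary branches:
full = X @ W
branched = 2 * (X @ b1) + (X @ b0)
print(np.allclose(full, branched))  # True
```

Each binary branch can then be accelerated with bitwise operations, which is what makes the multi-branch form attractive in hardware.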
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Bring Your Own Codegen to Deep Learning Compiler [8.87545486816377]
This paper proposes an open-source framework that enables users to concentrate solely on the development of their proprietary code generation tools.
Our framework provides users with flexible and easy-to-use interfaces to partition their models into segments that can be executed on "the best" processors.
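Such partitioning can be sketched as grouping contiguous runs of graph operations by whether the custom codegen supports them; unsupported runs fall back to the framework's default backend. The op names and capability set below are illustrative, not the paper's API:

```python
from itertools import groupby

# Hypothetical capability set of a proprietary accelerator backend.
ACCEL_SUPPORTED = {"conv2d", "relu", "add"}

def partition(ops):
    # Group consecutive ops by backend support; each group becomes a
    # segment executed on the accelerator or on the fallback backend.
    return [(("accelerator" if supported else "fallback"), list(run))
            for supported, run in groupby(ops, key=lambda op: op in ACCEL_SUPPORTED)]

graph = ["conv2d", "relu", "softmax", "conv2d", "add"]
print(partition(graph))
# [('accelerator', ['conv2d', 'relu']), ('fallback', ['softmax']),
#  ('accelerator', ['conv2d', 'add'])]
```

Real compilers partition a dataflow graph rather than a linear op list, but the capability-driven grouping is the same idea.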
arXiv Detail & Related papers (2021-05-03T17:22:25Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid approach of a compiler plus minimal library use results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
- SOL: Effortless Device Support for AI Frameworks without Source Code Changes [1.030051577369649]
We introduce SOL, an AI acceleration middleware that provides a hardware abstraction layer, allowing us to transparently support heterogeneous hardware.
As a proof of concept, we implemented SOL for PyTorch with three backends: CPU, GPU and vector processors.
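SOL's actual interface is not shown in the abstract; the sketch below only illustrates the general shape of a hardware abstraction layer with pluggable per-device backends, using invented names throughout:

```python
# Minimal sketch of a hardware abstraction layer with registered backends.
# Device names and the registry API are illustrative, not SOL's.
BACKENDS = {}

def register_backend(device):
    def wrap(cls):
        BACKENDS[device] = cls()
        return cls
    return wrap

class Backend:
    def matmul(self, a, b):
        raise NotImplementedError

@register_backend("cpu")
class CPUBackend(Backend):
    def matmul(self, a, b):
        # Reference implementation; a real backend would call an
        # optimized library such as oneDNN (or VEDNN on a vector engine).
        return [[sum(x * y for x, y in zip(row, col))
                 for col in zip(*b)] for row in a]

def dispatch(device, op, *args):
    # The framework calls into the abstraction layer; the layer routes
    # the operation to whichever backend is registered for the device.
    return getattr(BACKENDS[device], op)(*args)

print(dispatch("cpu", "matmul", [[1, 2]], [[3], [4]]))  # [[11]]
```

Adding a new device then means registering one backend against the abstraction layer instead of patching every framework, which is the maintenance-overhead argument of the paper.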
arXiv Detail & Related papers (2020-03-24T07:03:09Z)
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid solution to the development of deep learning kernels.
We use the advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)
- Towards High Performance Java-based Deep Learning Frameworks [0.22940141855172028]
Modern cloud services have set the demand for fast and efficient data processing.
This demand is common among numerous application domains, such as deep learning, data mining, and computer vision.
In this paper, we employ TornadoVM, a state-of-the-art programming framework, to transparently accelerate Deep Netts, a Java-based deep learning framework.
arXiv Detail & Related papers (2020-01-13T13:03:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.