SOL: Effortless Device Support for AI Frameworks without Source Code
- URL: http://arxiv.org/abs/2003.10688v1
- Date: Tue, 24 Mar 2020 07:03:09 GMT
- Title: SOL: Effortless Device Support for AI Frameworks without Source Code
- Authors: Nicolas Weber and Felipe Huici
- Abstract summary: We introduce SOL, an AI acceleration middleware that provides a hardware abstraction layer that allows us to transparently support heterogeneous hardware.
As a proof of concept, we implemented SOL for PyTorch with three backends: CPUs, GPUs, and vector processors.
- Score: 1.030051577369649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern high performance computing clusters heavily rely on accelerators to
overcome the limited compute power of CPUs. These supercomputers run various
applications from different domains such as simulations, numerical applications
or artificial intelligence (AI). As a result, vendors need to be able to
efficiently run a wide variety of workloads on their hardware. In the AI domain
this is in particular exacerbated by the existence of a number of popular
frameworks (e.g., PyTorch, TensorFlow, etc.) that have no common code base, and
can vary in functionality. The code of these frameworks evolves quickly, making
it expensive to keep up with all changes and potentially forcing developers to
go through constant rounds of upstreaming. In this paper we explore how to
provide hardware support in AI frameworks without changing the framework's
source code in order to minimize maintenance overhead. We introduce SOL, an AI
acceleration middleware that provides a hardware abstraction layer that allows
us to transparently support heterogeneous hardware. As a proof of concept, we
implemented SOL for PyTorch with three backends: CPUs, GPUs and vector
processors.
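The abstract describes a middleware pattern: the framework talks to a hardware abstraction layer, and device backends plug in behind it, so new hardware needs no framework changes. The sketch below illustrates that dispatch idea in plain Python; all names (`HAL`, `Backend`, `CPUBackend`) are illustrative assumptions, not SOL's actual API.

```python
# Illustrative sketch of middleware-style backend dispatch, assuming a
# simple registry-based hardware abstraction layer (HAL). Names are
# hypothetical and do not reflect SOL's real interfaces.

class Backend:
    """Minimal backend interface: execute a named op on this device."""
    name = "base"

    def run(self, op, *args):
        raise NotImplementedError

class CPUBackend(Backend):
    """A toy CPU backend supporting one elementwise op."""
    name = "cpu"

    def run(self, op, *args):
        if op == "add":
            a, b = args
            return [x + y for x, y in zip(a, b)]
        raise ValueError(f"unsupported op: {op}")

class HAL:
    """Hardware abstraction layer: the framework calls the HAL,
    never a backend directly, so new devices register without
    touching framework code."""

    def __init__(self):
        self._backends = {}

    def register(self, backend):
        self._backends[backend.name] = backend

    def dispatch(self, device, op, *args):
        return self._backends[device].run(op, *args)

hal = HAL()
hal.register(CPUBackend())
result = hal.dispatch("cpu", "add", [1, 2], [3, 4])
print(result)  # [4, 6]
```

Adding a GPU or vector-processor backend would mean registering another `Backend` subclass; the framework-facing `dispatch` call stays unchanged, which is the maintenance benefit the paper targets.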
Related papers
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z) - OpenHands: An Open Platform for AI Software Developers as Generalist Agents [109.8507367518992]
We introduce OpenHands, a platform for the development of AI agents that interact with the world in similar ways to a human developer.
We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks.
arXiv Detail & Related papers (2024-07-23T17:50:43Z) - Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices [2.9694650164958802]
We propose a systematic methodology that effectively bridges the gap between domain-specific frameworks and multiple hardware devices.
The framework supports deep learning, classical machine learning, and data analysis across X86, ARM, RISC-V, IoT devices, and GPU.
It outperforms existing solutions like scikit-learn, hummingbird, Spark, and pandas, achieving impressive speedups.
arXiv Detail & Related papers (2024-05-21T04:24:47Z) - Using the Abstract Computer Architecture Description Language to Model
AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
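The two-step decomposition above separates a small, highly optimized compute core from the loop structure that applies it. A minimal sketch of the idea, with hypothetical names (`tpp_dot`, `matmul_via_tpp`) that are not the paper's actual API:

```python
# Illustrative sketch of the two-step kernel decomposition:
# step 1: a small "processing primitive" implements the compute core;
# step 2: logical loops around the primitive express the full kernel.
# A real TPP implementation would map the primitive to vectorized code.

def tpp_dot(a, b):
    """Step 1 (primitive): dot product of two small vectors."""
    return sum(x * y for x, y in zip(a, b))

def matmul_via_tpp(A, B):
    """Step 2 (logical loops): C[i][j] = dot(row_i(A), col_j(B))."""
    cols = list(zip(*B))
    return [[tpp_dot(row, col) for col in cols] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_via_tpp(A, B))  # [[19, 22], [43, 50]]
```

The portability claim follows from this split: only the primitive needs retuning per CPU platform, while the declarative loop layer is reused as-is.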
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - SOL: Reducing the Maintenance Overhead for Integrating Hardware Support
into AI Frameworks [0.7614628596146599]
AI frameworks such as Theano, Caffe, Chainer, CNTK, MxNet, PyTorch, and DL4J provide high-level scripting APIs.
Less mainstream CPU, GPU or accelerator vendors need to put in a high effort to get their hardware supported by these frameworks.
NEC Laboratories Europe began developing the SOL AI Optimization project several years ago.
arXiv Detail & Related papers (2022-05-19T08:40:46Z) - Extending Python for Quantum-Classical Computing via Quantum
Just-in-Time Compilation [78.8942067357231]
Python is a popular programming language known for its flexibility, usability, readability, and focus on developer productivity.
We present a language extension to Python that enables heterogeneous quantum-classical computing via a robust C++ infrastructure for quantum just-in-time compilation.
arXiv Detail & Related papers (2021-05-10T21:11:21Z) - Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization,
Quantizations, Memory Optimizations, and More [26.748770505062378]
SLIDE is a C++ implementation of a sparse hash table based back-propagation.
We show how SLIDE's computations allow for a unique possibility of vectorization via AVX-512 (Advanced Vector Extensions).
Our experiments are focused on large (hundreds of millions of parameters) recommendation and NLP models.
arXiv Detail & Related papers (2021-03-06T02:13:43Z) - How deep the machine learning can be [0.0]
Machine learning mostly runs on conventional computing hardware (processors).
This paper attempts to review some of the caveats, especially concerning scaling the computing performance of the AI solutions.
arXiv Detail & Related papers (2020-05-02T16:06:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.