Related papers: Compiler Toolchains for Deep Learning Workloads on Embedded Platforms

URL: http://arxiv.org/abs/2104.04576v1
Date: Mon, 8 Mar 2021 13:54:25 GMT
Title: Compiler Toolchains for Deep Learning Workloads on Embedded Platforms
Authors: Max Sponner, Bernd Waschneck and Akash Kumar
Abstract summary: It is necessary to convert the framework-specific network representations into executable code for embedded platforms. This paper consists of two parts: the first part is a survey and benchmark of the available open source deep learning compiler toolchains. The second part explores the implementation and evaluation of a compilation flow for a heterogeneous device that combines a microcontroller with a dedicated accelerator.
Score: 2.5744053804694893
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As the usage of deep learning becomes increasingly popular in mobile and embedded solutions, it is necessary to convert the framework-specific network representations into executable code for these embedded platforms. This paper consists of two parts: The first section is made up of a survey and benchmark of the available open source deep learning compiler toolchains, which focus on the capabilities and performance of the individual solutions in regard to targeting embedded devices and microcontrollers that are combined with a dedicated accelerator in a heterogeneous fashion. The second part explores the implementation and evaluation of a compilation flow for such a heterogeneous device and reuses one of the existing toolchains to demonstrate the necessary steps for hardware developers that plan to build a software flow for their own hardware.

Related papers

Benchmarking Predictive Coding Networks -- Made Simple [48.652114040426625]
We first propose a library called PCX, whose focus lies on performance and simplicity. We use PCX to implement a large set of benchmarks for the community to use for their experiments.
arXiv Detail & Related papers (2024-07-01T10:33:44Z)
ResyDuo: Combining data models and CF-based recommender systems to develop Arduino projects [4.844354192596123]
This paper proposes an initial prototype, called ResyDuo, to assist Arduino developers by providing two different artifacts. ResyDuo retrieves hardware components by using tags or existing Arduino projects stored on the ProjectHub repository. The system can eventually retrieve corresponding software libraries based on the identified hardware devices.
arXiv Detail & Related papers (2023-08-26T08:21:31Z)
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence. Our library supports a collection of pretrained Code LLM models and popular code benchmarks. We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose the kernel development into two steps: 1) Expressing the computational core using Tensor Processing Primitives (TPPs) and 2) Expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
Towards Diverse Binary Segmentation via A Simple yet General Gated Network [71.19503376629083]
We propose a simple yet general gated network (GateNet) to tackle binary segmentation tasks. With the help of multi-level gate units, the valuable context information from the encoder can be selectively transmitted to the decoder. We introduce a "Fold" operation to improve the atrous convolution and form a novel folded atrous convolution.
arXiv Detail & Related papers (2023-03-18T11:26:36Z)
Flashlight: Enabling Innovation in Tools for Machine Learning [50.63188263773778]
We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems. We see Flashlight as a tool enabling research that can benefit widely used libraries downstream and bring machine learning and systems researchers closer together.
arXiv Detail & Related papers (2022-01-29T01:03:29Z)
On Joint Learning for Solving Placement and Routing in Chip Design [70.30640973026415]
We propose a joint learning method called DeepPlace for the placement of macros and standard cells. We also develop a joint learning approach via reinforcement learning to fulfill both macro placement and routing, which is called DeepPR. Our method can effectively learn from experience and also provides intermediate placement for the subsequent standard cell placement, within a few hours of training.
arXiv Detail & Related papers (2021-10-30T11:41:49Z)
Bring Your Own Codegen to Deep Learning Compiler [8.87545486816377]
This paper proposes an open source framework that enables users to only concentrate on the development of their proprietary code generation tools. Our framework provides users flexible and easy-to-use interfaces to partition their models into segments that can be executed on "the best" processors.
arXiv Detail & Related papers (2021-05-03T17:22:25Z)
MLIR: A Compiler Infrastructure for the End of Moore's Law [14.795080852112083]
MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, and significantly reduce the cost of building domain specific compilers. MLIR facilitates the design and implementation of code generators, translators and optimizers at different levels of abstraction.
arXiv Detail & Related papers (2020-02-25T17:24:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.