Compiling ONNX Neural Network Models Using MLIR
- URL: http://arxiv.org/abs/2008.08272v2
- Date: Thu, 1 Oct 2020 01:15:28 GMT
- Title: Compiling ONNX Neural Network Models Using MLIR
- Authors: Tian Jin, Gheorghe-Teodor Bercea, Tung D. Le, Tong Chen, Gong Su,
Haruki Imai, Yasushi Negishi, Anh Leu, Kevin O'Brien, Kiyokuni Kawachiya, and
Alexandre E. Eichenberger
- Abstract summary: We present a preliminary report on our onnx-mlir compiler, which generates code for the inference of deep neural network models.
Onnx-mlir relies on the Multi-Level Intermediate Representation (MLIR) infrastructure recently integrated in the LLVM project.
- Score: 51.903932262028235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural network models are becoming increasingly popular and have been
used in various tasks such as computer vision, speech recognition, and natural
language processing. Machine learning models are commonly trained in a
resource-rich environment and then deployed in a distinct environment such as
high availability machines or edge devices. To assist the portability of
models, the open-source community has proposed the Open Neural Network Exchange
(ONNX) standard. In this paper, we present a high-level, preliminary report on
our onnx-mlir compiler, which generates code for the inference of deep neural
network models described in the ONNX format. Onnx-mlir is an open-source
compiler implemented using the Multi-Level Intermediate Representation (MLIR)
infrastructure recently integrated in the LLVM project. Onnx-mlir relies on the
MLIR concept of dialects to implement its functionality. We propose here two
new dialects: (1) an ONNX specific dialect that encodes the ONNX standard
semantics, and (2) a loop-based dialect to provide for a common lowering point
for all ONNX dialect operations. Each intermediate representation facilitates
its own characteristic set of graph-level and loop-based optimizations
respectively. We illustrate our approach by following several models through
the proposed representations and we include some early optimization work and
performance results.
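To make the compiler's input concrete, below is a minimal sketch (not from the paper) that builds a one-operator ONNX model with the official onnx Python helpers; onnx-mlir would ingest such a file and lower it through the proposed ONNX dialect and loop-based dialect down to LLVM IR. The CLI invocation in the trailing comment reflects flag names from the onnx-mlir repository and should be treated as illustrative, not as the paper's text.

```python
# Minimal ONNX model construction with the official onnx package
# (pip install onnx). Illustrative input for onnx-mlir, not code
# from the paper itself.
import onnx
from onnx import helper, TensorProto

# A graph with a single Add operator: Z = X + Y, both 2x3 float tensors.
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [2, 3])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [2, 3])
Z = helper.make_tensor_value_info("Z", TensorProto.FLOAT, [2, 3])
add = helper.make_node("Add", inputs=["X", "Y"], outputs=["Z"])
graph = helper.make_graph([add], "tiny_add", inputs=[X, Y], outputs=[Z])
model = helper.make_model(graph)

onnx.checker.check_model(model)
onnx.save(model, "tiny_add.onnx")

# onnx-mlir would then compile this file into a callable library, e.g.
# (flag name per the onnx-mlir repository; treat as an assumption):
#   onnx-mlir --EmitLib tiny_add.onnx
```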
Related papers
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend (CMULAB), an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z)
- Towards a World-English Language Model for On-Device Virtual Assistants [5.743958545444472]
We combine regional variants of English to build a "World English" NNLM for on-device VAs.
We find that adapter modules are more effective in modeling dialects than specializing entire sub-networks.
arXiv Detail & Related papers (2024-03-27T17:31:39Z)
- Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning [3.2258463207097017]
Generative Adversarial Networks (GANs) offer a promising approach for Neural Machine Translation (NMT).
In GAN-based training, multilingual NMT, like bilingual models, considers only one reference translation for each sentence.
This article proposes a Denoising Adversarial Auto-encoder-based Sentence Interpolation (DAASI) approach to perform sentence interpolation.
arXiv Detail & Related papers (2023-03-31T12:34:14Z)
- QONNX: Representing Arbitrary-Precision Quantized Neural Networks [49.10245225120615]
We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks.
We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping.
We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc.
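As a rough illustration of the integer-clipping idea, here is a numpy-only sketch under our own naming, not the QONNX API: representing b-bit values inside a wider integer format amounts to clipping to the signed b-bit range.

```python
# Numpy-only sketch of integer clipping for low-precision quantization.
# Function and variable names are illustrative, not part of QONNX.
import numpy as np

def clip_to_bitwidth(q: np.ndarray, bits: int) -> np.ndarray:
    """Clip integer values to the signed range of the given bit width."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(q, lo, hi)

q = np.array([-130, -5, 0, 3, 90], dtype=np.int32)
print(clip_to_bitwidth(q, 4))  # -> [-8 -5  0  3  7], clipped to [-8, 7]
```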
arXiv Detail & Related papers (2022-06-15T13:18:00Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
- NeuralLog: Natural Language Inference with Joint Neural and Logical Reasoning [6.795509403707242]
We propose an inference framework called NeuralLog, which utilizes both a monotonicity-based logical inference engine and a neural network language model for phrase alignment.
Our framework models the NLI task as a classic search problem and uses the beam search algorithm to search for optimal inference paths.
Experiments show that our joint logic and neural inference system improves accuracy on the NLI task and can achieve state-of-the-art accuracy on the SICK and MED datasets.
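For readers unfamiliar with the search framing, below is a generic beam-search sketch (our own simplification, not NeuralLog's code); `expand` and `score` are hypothetical stand-ins for the paper's inference-step generation and path scoring.

```python
# Generic beam search over inference paths. `expand` and `score` are
# hypothetical stand-ins; this is not the NeuralLog implementation.
from typing import Callable, Iterable, List, Tuple

def beam_search(start, expand: Callable[[object], Iterable],
                score: Callable[[List], float],
                beam_width: int = 5, max_depth: int = 10) -> Tuple[float, List]:
    """Keep the `beam_width` best-scoring partial paths at each depth."""
    beam: List[Tuple[float, List]] = [(score([start]), [start])]
    for _ in range(max_depth):
        candidates: List[Tuple[float, List]] = []
        for _, path in beam:
            for nxt in expand(path[-1]):
                new_path = path + [nxt]
                candidates.append((score(new_path), new_path))
        if not candidates:
            break  # no successor states; stop expanding
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]  # best (score, path) found
```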
arXiv Detail & Related papers (2021-05-29T01:02:40Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Speaker Recognition using SincNet and X-Vector Fusion [8.637110868126546]
We propose an innovative approach to perform speaker recognition by fusing two recently introduced deep neural networks (DNNs), namely SincNet and X-Vector.
arXiv Detail & Related papers (2020-04-05T14:44:14Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.