ML-driven Hardware Cost Model for MLIR
- URL: http://arxiv.org/abs/2302.11405v1
- Date: Tue, 14 Feb 2023 11:32:47 GMT
- Title: ML-driven Hardware Cost Model for MLIR
- Authors: Dibyendu Das and Sandya Mannarswamy
- Abstract summary: We develop a machine learning-based cost model for high-level MLIR.
By treating the incoming MLIR as text input, à la NLP models, we can apply well-known techniques from modern NLP research.
We show that these models can provide reasonably good estimates with low error bounds for various hardware characteristics of interest.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: During early optimization passes, compilers must make predictions for
machine-dependent characteristics such as execution unit utilization, number of
register spills, latency, and throughput to generate better code. Often a
hand-written static/analytical hardware cost model is built into the compiler.
However, the need for more sophisticated and varied predictions has become more
pronounced with the development of deep learning compilers which need to
optimize dataflow graphs. Such compilers usually employ a much higher-level
MLIR form as their IR before lowering to traditional LLVM IR. A
static/analytical cost model in such a scenario is cumbersome and error-prone
because the opcodes represent very high-level algebraic/arithmetic operations.
Hence, we develop a machine learning-based cost model for high-level MLIR that
can predict different target variables of interest, such as CPU/GPU/xPU
utilization, instructions executed, and register usage. By treating the
incoming MLIR as text input, à la NLP models, we can apply well-known
techniques from modern NLP research to help predict hardware characteristics
more accurately. We expect such precise ML-driven hardware cost models to guide
our deep learning compiler in graph-level optimizations around operator fusion,
local memory allocation, kernel scheduling, etc., as well as in many kernel-level
optimizations such as loop interchange, loop-invariant code motion (LICM), and
unrolling. We report early work-in-progress results of developing such models on
high-level MLIR representing dataflow graphs emitted by PyTorch/TensorFlow-like
frameworks, as well as on lower-level dialects such as affine. We show that these
models can provide
reasonably good estimates with low error bounds for various hardware
characteristics of interest and can be a go-to mechanism for hardware cost
modelling in the future.
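The abstract does not specify the tokenizer or model the authors use. The Python below is a minimal sketch of the general idea of treating MLIR as text input. It rests on four assumptions: dialect-qualified op names serve as the "words", op unigram and bigram counts serve as features, ridge regression is the predictor, and mean absolute percentage error (MAPE) is the error measure.

The regex `OP_RE`, the helpers `op_ngrams`, `fit_ridge`, `predict` and `mape`, and the `SAMPLES` targets are all hypothetical. The targets are made-up numbers standing in for a measured hardware metric such as instructions executed; they are not data from the paper. The simple bag-of-n-grams regression is swapped in for the more capable NLP models the abstract alludes to.

```python
import re
from collections import Counter

# Hypothetical "word" definition: dialect-qualified op names such as
# "arith.addf" or "linalg.matmul"; SSA values, types and attributes are ignored.
OP_RE = re.compile(r"\b[a-z_][a-z0-9_]*\.[a-z_][a-z0-9_.]*\b")


def op_ngrams(mlir_text: str) -> Counter:
    """Bag of op unigrams plus bigrams over the ops in textual order."""
    ops = OP_RE.findall(mlir_text)
    feats = Counter(ops)
    feats.update(f"{a}->{b}" for a, b in zip(ops, ops[1:]))
    return feats


def vectorize(feats: Counter, vocab: list[str]) -> list[float]:
    # Leading 1.0 is the bias term; n-grams outside the vocabulary are dropped.
    return [1.0] + [float(feats.get(tok, 0)) for tok in vocab]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(texts: list[str], targets: list[float], lam: float = 1e-6):
    """Assumed model: ridge regression on op n-gram counts (not the paper's model)."""
    vocab = sorted(set().union(*(op_ngrams(t) for t in texts)))
    X = [vectorize(op_ngrams(t), vocab) for t in texts]
    d = len(vocab) + 1
    # Normal equations (X^T X + lam * I') w = X^T y; the bias (index 0) is not penalized.
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j and i > 0 else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(d)]
    return vocab, solve(xtx, xty)


def predict(model, mlir_text: str) -> float:
    vocab, w = model
    x = vectorize(op_ngrams(mlir_text), vocab)
    return sum(wi * xi for wi, xi in zip(w, x))


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic (made-up) targets standing in for a measured hardware metric.
SAMPLES = [
    ("%0 = arith.addf %a, %b : f32", 4.0),
    ("%0 = arith.mulf %a, %b : f32\n%1 = arith.addf %0, %c : f32", 9.0),
    ("%0 = memref.load %m[%i] : memref<8xf32>\n"
     "%1 = arith.addf %0, %0 : f32\n"
     "memref.store %1, %m[%i] : memref<8xf32>", 15.0),
    ("%0 = memref.load %m[%i] : memref<8xf32>\n"
     "memref.store %0, %m[%j] : memref<8xf32>", 11.0),
]

if __name__ == "__main__":
    texts, ys = zip(*SAMPLES)
    model = fit_ridge(list(texts), list(ys))
    print("train MAPE %:", mape(ys, [predict(model, t) for t in texts]))
    print("unseen snippet:", predict(model, "%0 = arith.mulf %a, %b : f32"))
```

This featurization ignores SSA dataflow and tensor shapes. It is therefore only a baseline for how MLIR text can be mapped onto a regression target and scored with an error measure.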