Deep Data Flow Analysis
- URL: http://arxiv.org/abs/2012.01470v1
- Date: Sat, 21 Nov 2020 03:29:14 GMT
- Title: Deep Data Flow Analysis
- Authors: Chris Cummins, Hugh Leather, Zacharias Fisches, Tal Ben-Nun, Torsten
Hoefler, Michael O'Boyle
- Abstract summary: ProGraML is a portable representation of whole-program semantics for deep learning.
We benchmark current and future learning techniques for compiler analyses.
We show that, using ProGraML, standard analyses can be learned, yielding improved performance on downstream compiler optimization tasks.
- Score: 14.583644439728895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compiler architects increasingly look to machine learning when building
heuristics for compiler optimization. The promise of automatic heuristic
design, freeing the compiler engineer from the complex interactions of program,
architecture, and other optimizations, is alluring. However, most machine
learning methods cannot replicate even the simplest of the abstract
interpretations of data flow analysis that are critical to making good
optimization decisions. This must change for machine learning to become the
dominant technology in compiler heuristics.
To this end, we propose ProGraML - Program Graphs for Machine Learning - a
language-independent, portable representation of whole-program semantics for
deep learning. To benchmark current and future learning techniques for compiler
analyses we introduce an open dataset of 461k Intermediate Representation (IR)
files for LLVM, covering five source programming languages, and 15.4M
corresponding data flow results. We formulate data flow analysis as a message
passing neural network (MPNN) and
show that, using ProGraML, standard analyses can be learned, yielding improved
performance on downstream compiler optimization tasks.
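To make the MPNN formulation concrete, here is a minimal sketch of message passing over a toy program graph in plain NumPy. The graph, feature sizes, and update rule are illustrative assumptions, not the paper's exact architecture, which uses typed edges and a learned readout.

```python
import numpy as np

# Toy program graph: nodes are instructions, directed edges are flow.
# A real ProGraML graph has typed control-, data-, and call-edges;
# we model a single edge type for brevity (an assumption of this sketch).
edges = [(0, 1), (1, 2), (1, 3), (3, 1)]  # node 3 loops back to node 1
num_nodes, dim = 4, 8

rng = np.random.default_rng(0)
W_msg = rng.normal(scale=0.1, size=(dim, dim))      # message transform
W_upd = rng.normal(scale=0.1, size=(2 * dim, dim))  # simplified update (no GRU)

# Initial states: a one-hot flag marks the analysis root, mimicking how a
# data flow problem (e.g. reachability) seeds its starting point.
h = np.zeros((num_nodes, dim))
h[0, 0] = 1.0

for _ in range(4):  # T propagation steps; iterative analyses may need more
    msgs = np.zeros_like(h)
    for src, dst in edges:
        msgs[dst] += h[src] @ W_msg                 # aggregate along edges
    h = np.tanh(np.concatenate([h, msgs], axis=1) @ W_upd)  # node update

# A per-node readout would then classify each node's analysis value,
# e.g. "reachable from the root" for a reachability analysis.
print(h.shape)  # (4, 8)
```

Each propagation step lets information flow one edge further through the graph, mirroring how an iterative data flow analysis propagates facts toward a fixed point.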
Related papers
- LLM-based Optimization of Compound AI Systems: A Survey [64.39860384538338]
In a compound AI system, components such as an LLM call, a retriever, a code interpreter, or tools are interconnected.
Recent advancements enable end-to-end optimization of these components' parameters using an LLM.
This paper presents a survey of the principles and emerging trends in LLM-based optimization of compound AI systems.
arXiv Detail & Related papers (2024-10-21T18:06:25Z)
- CompilerDream: Learning a Compiler World Model for General Code Optimization [58.87557583347996]
We introduce CompilerDream, a model-based reinforcement learning approach to general code optimization.
It comprises a compiler world model that accurately simulates the intrinsic properties of optimization passes and an agent trained on this model to produce effective optimization strategies.
It excels across diverse datasets, surpassing LLVM's built-in optimizations and other state-of-the-art methods in both the value prediction and end-to-end code optimization settings.
arXiv Detail & Related papers (2024-04-24T09:20:33Z)
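As a rough sketch of CompilerDream's world-model idea, the code below searches for a pass sequence by rolling out candidates inside a cheap stand-in model instead of the real compiler. `WorldModel`, the pass list, and the reward are hypothetical stubs, not the paper's trained components.

```python
import random

class WorldModel:
    """Stub for a learned model predicting the effect of an optimization
    pass on a latent program state. A real model is trained on compiler
    traces; here we fake transitions with a deterministic hash."""

    def step(self, state: int, opt_pass: str) -> tuple[int, float]:
        nxt = hash((state, opt_pass)) % 10_000
        reward = ((nxt % 7) - 3) / 10.0  # pretend reward: predicted size reduction
        return nxt, reward

PASSES = ["-mem2reg", "-instcombine", "-gvn", "-licm", "-simplifycfg"]

def rollout(model: WorldModel, state: int, horizon: int) -> tuple[list[str], float]:
    """Sample one pass sequence inside the model and score it."""
    seq, total = [], 0.0
    for _ in range(horizon):
        p = random.choice(PASSES)
        state, r = model.step(state, p)
        seq.append(p)
        total += r
    return seq, total

# Plan by imagination: evaluate many rollouts in the cheap world model,
# then apply only the best sequence with the real compiler.
model, start = WorldModel(), 0
best = max((rollout(model, start, horizon=6) for _ in range(200)),
           key=lambda sr: sr[1])
print("best imagined sequence:", best[0], "score:", round(best[1], 2))
```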
- Compiler generated feedback for Large Language Models [3.86901256759401]
We introduce a novel paradigm in compiler optimization: Large Language Models guided by compiler feedback to optimize the code size of LLVM assembly.
The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both unoptimized and optimized IRs.
arXiv Detail & Related papers (2024-03-18T23:25:13Z)
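A minimal sketch of such a compiler-feedback loop: a hypothetical `generate` function stands in for the LLM, and instruction counts are approximated from the IR text rather than measured with LLVM's own passes.

```python
def count_instructions(ir_text: str) -> int:
    """Crude proxy for an instruction count: non-label, non-comment lines
    inside function bodies. A real pipeline would use LLVM's instcount
    pass; this approximation keeps the sketch self-contained."""
    count, in_func = 0, False
    for line in ir_text.splitlines():
        s = line.strip()
        if s.startswith("define"):
            in_func = True
        elif s == "}":
            in_func = False
        elif in_func and s and not s.endswith(":") and not s.startswith(";"):
            count += 1
    return count

def optimize_with_feedback(generate, ir_text: str, rounds: int = 3) -> str:
    """Iteratively ask a model for optimized IR, feeding measured counts
    back into the prompt. `generate(prompt) -> str` is a hypothetical
    stand-in for an LLM call."""
    best_ir, best_count = ir_text, count_instructions(ir_text)
    for _ in range(rounds):
        prompt = (f"Optimize this LLVM IR for size.\n"
                  f"Current instruction count: {best_count}\n\n{best_ir}")
        candidate = generate(prompt)
        c = count_instructions(candidate)
        if c < best_count:  # keep only verified improvements
            best_ir, best_count = candidate, c
    return best_ir

# Usage with a dummy "model" that returns its input unchanged:
ir = "define i32 @f(i32 %x) {\n  %y = add i32 %x, 1\n  ret i32 %y\n}"
print(count_instructions(ir))                    # 2
print(optimize_with_feedback(lambda p: ir, ir))  # unchanged here
```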
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of performance-improving edits made by human programmers, drawn from over 77,000 pairs of competitive C++ programming submissions.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
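The retrieval-based few-shot prompting idea can be sketched directly. The edit bank, the string-similarity retriever, and the prompt template below are simplified assumptions; the paper retrieves from its 77K-pair dataset with stronger similarity measures.

```python
from difflib import SequenceMatcher

# Tiny illustrative "bank" of (slow, fast) edit pairs; the paper's dataset
# has tens of thousands of such pairs mined from competitive programming.
EDIT_BANK = [
    ("for i in range(len(xs)): total += xs[i]", "total = sum(xs)"),
    ("result = []\nfor x in xs: result.append(f(x))", "result = list(map(f, xs))"),
]

def retrieve(query: str, k: int = 1):
    """Pick the k bank examples most similar to the query program.
    Real systems embed programs with a learned encoder; plain string
    similarity is a stand-in."""
    scored = sorted(EDIT_BANK,
                    key=lambda pair: SequenceMatcher(None, query, pair[0]).ratio(),
                    reverse=True)
    return scored[:k]

def build_prompt(slow_program: str, k: int = 1) -> str:
    """Assemble a few-shot prompt from retrieved (slow, fast) pairs."""
    parts = ["Rewrite the final program to run faster.\n"]
    for slow, fast in retrieve(slow_program, k):
        parts.append(f"### Slow:\n{slow}\n### Fast:\n{fast}\n")
    parts.append(f"### Slow:\n{slow_program}\n### Fast:\n")
    return "".join(parts)

print(build_prompt("for i in range(len(a)): s += a[i]"))
```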
- ML-driven Hardware Cost Model for MLIR [1.2987894327817158]
We develop a machine learning-based cost model for high-level MLIR.
By treating the incoming MLIR as text input, à la NLP models, we can apply well-known techniques from modern NLP research.
We show that these models can provide reasonably good estimates with low error bounds for various hardware characteristics of interest.
arXiv Detail & Related papers (2023-02-14T11:32:47Z)
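Treating IR as text makes the pipeline easy to sketch: bag-of-tokens features over MLIR text feeding a ridge regressor. The vocabulary, example snippets, and measured costs are invented for illustration; the paper uses modern NLP models rather than raw token counts.

```python
import re
import numpy as np

def featurize(mlir_text: str, vocab: dict[str, int]) -> np.ndarray:
    """Bag-of-tokens features over MLIR text. Treating IR as plain text
    lets standard NLP-style pipelines apply; real models would use
    learned embeddings instead of counts."""
    vec = np.zeros(len(vocab))
    for tok in re.findall(r"[\w.]+", mlir_text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

# Hypothetical toy setup: predict a latency-like cost from token counts.
# Ops, snippets, and measured costs are made up for illustration.
vocab = {"arith.addf": 0, "arith.mulf": 1, "memref.load": 2, "memref.store": 3}
X = np.array([featurize(s, vocab) for s in [
    "%0 = memref.load %a : f32\n%1 = arith.mulf %0, %0 : f32",
    "%0 = arith.addf %x, %y : f32\nmemref.store %0, %b : f32",
]])
y = np.array([3.0, 2.5])  # made-up measured costs

# Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y
lam = 0.1
w = np.linalg.solve(X.T @ X + lam * np.eye(len(vocab)), X.T @ y)
print("predicted cost:", X[0] @ w)
```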
- Profile Guided Optimization without Profiles: A Machine Learning Approach [0.0]
Profile guided optimization is an effective technique for improving the optimization ability of compilers based on dynamic behavior.
We present a novel statistical approach to inferring branch probabilities that improves the performance of programs that are compiled without profile guided optimizations.
arXiv Detail & Related papers (2021-12-24T22:49:21Z)
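To illustrate what inferring branch probabilities without profiles looks like, here is a toy estimator using classic static-prediction style heuristics. The features and probabilities are illustrative stand-ins; the paper fits a statistical model rather than hand-written rules.

```python
def estimate_branch_probability(branch: dict) -> float:
    """Assign a taken-probability to a conditional branch from static
    features alone. The rules and numbers are illustrative, in the spirit
    of classic static branch prediction, not the paper's learned model."""
    p = 0.5
    if branch.get("is_loop_backedge"):
        p = 0.9  # loop back-edges are usually taken
    elif branch.get("guards_early_return"):
        p = 0.2  # error/early-exit paths are usually not taken
    elif branch.get("compares_pointer_to_null"):
        p = 0.1  # null checks rarely fire
    return p

# Example: annotate a tiny CFG description with edge weights a compiler
# could consume the same way it consumes real profile data.
branches = [
    {"name": "loop.latch", "is_loop_backedge": True},
    {"name": "null.check", "compares_pointer_to_null": True},
]
for b in branches:
    print(b["name"], "p(taken) =", estimate_branch_probability(b))
```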
- Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- Instead of Rewriting Foreign Code for Machine Learning, Automatically Synthesize Fast Gradients [6.09170287691728]
This paper presents Enzyme, a high-performance automatic differentiation (AD) compiler plugin for the LLVM compiler framework.
Enzyme synthesizes gradients for programs written in any language whose compiler targets LLVM intermediate representation (IR).
On a machine-learning focused benchmark suite including Microsoft's ADBench, AD on optimized IR achieves a geometric mean speedup of 4.5x over AD on unoptimized IR.
arXiv Detail & Related papers (2020-10-04T22:32:51Z)
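The core of what Enzyme does, synthesizing a gradient function from an instruction list, can be conveyed with a toy reverse-mode AD over a straight-line "IR". This sketch ignores control flow, memory, and LLVM itself; it only shows the forward/reverse structure.

```python
def grad_straightline(program, inputs):
    """Toy reverse-mode AD over a straight-line 'IR' of (dest, op, a, b)
    instructions with ops 'add' and 'mul'. Enzyme does this at the level
    of real LLVM IR for arbitrary languages; this only conveys the idea
    of synthesizing a gradient from the instruction list."""
    vals = dict(inputs)
    for dest, op, a, b in program:            # forward pass
        vals[dest] = vals[a] + vals[b] if op == "add" else vals[a] * vals[b]

    adjoint = {name: 0.0 for name in vals}
    last = program[-1][0]
    adjoint[last] = 1.0                       # d(output)/d(output) = 1
    for dest, op, a, b in reversed(program):  # reverse pass
        if op == "add":
            adjoint[a] += adjoint[dest]
            adjoint[b] += adjoint[dest]
        else:  # mul
            adjoint[a] += adjoint[dest] * vals[b]
            adjoint[b] += adjoint[dest] * vals[a]
    return vals[last], adjoint

# f(x, y) = (x + y) * x  =>  df/dx = 2x + y, df/dy = x
prog = [("t0", "add", "x", "y"), ("out", "mul", "t0", "x")]
value, g = grad_straightline(prog, {"x": 3.0, "y": 2.0})
print(value, g["x"], g["y"])  # 15.0 8.0 3.0
```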
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid approach, combining the compiler with minimal library use, results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
- ProGraML: Graph-based Deep Learning for Program Optimization and Analysis [16.520971531754018]
We introduce ProGraML, a graph-based program representation for machine learning.
ProGraML achieves an average 94.0 F1 score, significantly outperforming the state-of-the-art approaches.
We then apply our approach to two high-level tasks - heterogeneous device mapping and program classification - setting new state-of-the-art performance in both.
arXiv Detail & Related papers (2020-03-23T20:27:00Z)
- Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
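IFAQ's fusion of the feature query with the learning task can be imitated by hand: the sketch below accumulates the sufficient statistics for linear regression in a single scan of a (toy) join result instead of materializing a feature matrix first. The data, schema, and features are invented for illustration; an IFAQ-style compiler derives this kind of fusion automatically from a single declarative program.

```python
import numpy as np

# Toy relational data: each row of the "join result" yields features
# and a target value.
rows = [
    {"price": 3.0, "qty": 2.0, "revenue": 6.5},
    {"price": 5.0, "qty": 1.0, "revenue": 4.8},
    {"price": 2.0, "qty": 4.0, "revenue": 8.1},
]
features = ["price", "qty"]

# Fused pass: instead of building a feature matrix and then training,
# accumulate the sufficient statistics X^T X and X^T y in one scan.
d = len(features)
xtx, xty = np.zeros((d, d)), np.zeros(d)
for row in rows:
    x = np.array([row[f] for f in features])
    xtx += np.outer(x, x)
    xty += x * row["revenue"]

w = np.linalg.solve(xtx + 1e-6 * np.eye(d), xty)  # ridge term for stability
print("learned weights:", w)
```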