A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation
- URL: http://arxiv.org/abs/2106.14332v1
- Date: Sun, 27 Jun 2021 22:38:16 GMT
- Title: A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation
- Authors: Joseph Huber, Weile Wei, Giorgis Georgakoudis, Johannes Doerfert,
Oscar Hernandez
- Abstract summary: This paper presents a methodology for using LLVM-based tools to tune the DCA++ application that targets the new ARM A64FX processor.
By applying these code changes, code speed was increased by 1.98X and 78 GFlops were achieved on the A64FX processor.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents a methodology for using LLVM-based tools to tune the
DCA++ (dynamical cluster approximation) application that targets the new ARM
A64FX processor. The goal is to describe the changes required for the new
architecture and to generate efficient single instruction/multiple data (SIMD)
instructions that target the new Scalable Vector Extension instruction set.
During manual tuning, the authors used the LLVM tools to improve code
parallelization by using OpenMP SIMD, refactored the code and applied
transformations that enabled SIMD optimizations, and ensured that the correct
libraries were used to achieve optimal performance. By applying these code
changes, code speed was increased by 1.98X and 78 GFlops were achieved on the
A64FX processor. The authors aim to automate parts of these efforts in the
OpenMP Advisor tool, which is built on top of existing and newly introduced
LLVM tooling.
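
As a hedged illustration of the kind of change described above, the sketch below annotates a hypothetical loop with OpenMP SIMD so that an LLVM-based compiler can vectorize it for SVE on the A64FX. The function, loop, and compiler flags are illustrative assumptions, not code from DCA++.

```cpp
#include <cstddef>

// Hypothetical kernel: out[i] = alpha * in[i] + out[i]. The pragma asserts
// that iterations are independent, so the compiler can emit SVE SIMD
// instructions (e.g. with clang++ -O3 -fopenmp-simd -mcpu=a64fx) without
// having to prove the absence of loop-carried dependences itself.
void scale_add(double* __restrict__ out, const double* __restrict__ in,
               double alpha, std::size_t n) {
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = alpha * in[i] + out[i];
    }
}
```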
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs for downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler [0.10923877073891444]
We introduce the first RL environment for the MLIR compiler, dedicated to facilitating MLIR compiler research.
We also propose a novel formulation of the action space as a product of simpler action subspaces, enabling more efficient and effective optimizations.
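
As a minimal, hypothetical sketch of what a product-structured action space can look like (illustrating the general idea only, not the paper's actual MLIR environment), an action can be a tuple of small independent choices rather than one flat enumeration:

```cpp
#include <cstdint>

// Hypothetical subspaces: which transformation, which loop, which factor.
enum class Transformation : std::uint8_t { Tile, Unroll, Interchange, Vectorize };

constexpr int kLoops = 8;    // loops addressable in the nest (assumed)
constexpr int kFactors = 8;  // tiling/unrolling factor choices (assumed)

struct Action {
    Transformation kind;  // subspace 1
    std::uint8_t loop;    // subspace 2
    std::uint8_t factor;  // subspace 3
};

// The product view (4 x 8 x 8) expresses the same 256 actions as three
// small choices instead of one flat 256-way choice, which is what makes
// the factored formulation cheaper for an agent to search.
int flatten(const Action& a) {
    return (static_cast<int>(a.kind) * kLoops + a.loop) * kFactors + a.factor;
}
```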
arXiv Detail & Related papers (2024-09-17T10:49:45Z)
- Meta Large Language Model Compiler: Foundation Models of Compiler Optimization [21.161784011956126]
Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks.
However, their application in the domain of code and compiler optimization remains underexplored.
We introduce Meta Large Language Model Compiler (LLM Compiler), a suite of robust, openly available, pre-trained models for code optimization tasks.
arXiv Detail & Related papers (2024-06-27T21:47:48Z)
- LLM-Vectorizer: LLM-based Verified Loop Vectorizer [12.048697450464935]
Large language models (LLMs) can generate vectorized code from scalar programs that process individual array elements.
LLMs are capable of producing high-performance vectorized code with runtime speedups ranging from 1.1x to 9.4x.
Our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
arXiv Detail & Related papers (2024-06-07T07:04:26Z)
- Compiler generated feedback for Large Language Models [3.86901256759401]
We introduce a novel paradigm in compiler optimization powered by Large Language Models with compiler feedback to optimize the code size of LLVM assembly.
The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both unoptimized and optimized IRs.
arXiv Detail & Related papers (2024-03-18T23:25:13Z)
- Extreme Compression of Large Language Models via Additive Quantization [59.3122859349777]
Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach for information retrieval.
We provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed.
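
For context, here is a hedged sketch of the decode step in classic Additive Quantization, which AQLM generalizes: a vector is reconstructed as the sum of one codeword drawn from each of M codebooks. The layout and names are illustrative assumptions, not AQLM's actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Reconstruct a d-dimensional vector from M code indices, one per codebook.
// codebooks[m][k] is the k-th d-dimensional codeword of codebook m.
std::vector<float> aq_decode(
    const std::vector<std::vector<std::vector<float>>>& codebooks,
    const std::vector<std::size_t>& codes, std::size_t d) {
    std::vector<float> out(d, 0.0f);
    for (std::size_t m = 0; m < codebooks.size(); ++m) {
        const std::vector<float>& codeword = codebooks[m][codes[m]];
        for (std::size_t j = 0; j < d; ++j)
            out[j] += codeword[j];  // additive: sum of selected codewords
    }
    return out;
}
```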
arXiv Detail & Related papers (2024-01-11T18:54:44Z)
- Tackling the Matrix Multiplication Micro-kernel Generation with Exo [0.5517652814152908]
We present a step-by-step procedure for generating a dedicated micro-kernel for each new hardware target.
Our solution also improves the portability of the generated code, since a hardware target is fully specified by a concise library-based description of its instructions.
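
As a rough illustration of what such a micro-kernel computes (names and sizes are hypothetical, and a generated kernel would map this onto the target's SIMD instructions), the sketch below keeps an MR x NR block of C in accumulators while streaming packed panels of A and B:

```cpp
constexpr int MR = 4, NR = 4;  // register-block dimensions (assumed)

// A is packed as an MR x kc panel, B as a kc x NR panel; C has leading
// dimension ldc. The accumulators stay live across the whole k loop,
// which is the property a tuned micro-kernel exploits to keep the block
// of C in registers.
void microkernel(int kc, const float* A, const float* B, float* C, int ldc) {
    float acc[MR][NR] = {};
    for (int k = 0; k < kc; ++k)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += A[k * MR + i] * B[k * NR + j];
    for (int i = 0; i < MR; ++i)
        for (int j = 0; j < NR; ++j)
            C[i * ldc + j] += acc[i][j];
}
```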
arXiv Detail & Related papers (2023-10-26T14:09:57Z)
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 competitive C++ programming submission pairs capturing performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
- Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler.
We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax.
Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid approach, combining compiler-generated code with minimal library use, results in state-of-the-art performance.
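
A hedged example of the data-reuse idea such polyhedral analysis exploits: tiling a matrix-multiply loop nest so that blocks of the operands are reused from cache. The tile size and code are illustrative, not PolyDL's generated primitives.

```cpp
constexpr int T = 64;  // tile size (assumed; real tools derive it per target)

// Tiled n x n matrix multiply: for fixed (jj, kk), the T x T block of B
// is reused across all i iterations, so it stays cache-resident rather
// than being re-streamed from memory on every pass.
void matmul_tiled(int n, const float* A, const float* B, float* C) {
    for (int jj = 0; jj < n; jj += T)
        for (int kk = 0; kk < n; kk += T)
            for (int i = 0; i < n; ++i)
                for (int k = kk; k < kk + T && k < n; ++k)
                    for (int j = jj; j < jj + T && j < n; ++j)
                        C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```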
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.