A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation
- URL: http://arxiv.org/abs/2106.14332v1
- Date: Sun, 27 Jun 2021 22:38:16 GMT
- Title: A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation
- Authors: Joseph Huber, Weile Wei, Giorgis Georgakoudis, Johannes Doerfert,
Oscar Hernandez
- Abstract summary: This paper presents a methodology for using LLVM-based tools to tune the DCA++ application that targets the new ARM A64FX processor.
By applying these code changes, code speed was increased by 1.98X and 78 GFlops were achieved on the A64FX processor.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents a methodology for using LLVM-based tools to tune the
DCA++ (dynamical cluster approximation) application that targets the new ARM
A64FX processor. The goal is to describe the changes required for the new
architecture and to generate efficient single instruction/multiple data (SIMD)
instructions that target the new Scalable Vector Extension instruction set.
During manual tuning, the authors used the LLVM tools to improve code
parallelization by using OpenMP SIMD, refactored the code and applied
transformations that enabled SIMD optimizations, and ensured that the correct
libraries were used to achieve optimal performance. By applying these code
changes, code speed was increased by 1.98X and 78 GFlops were achieved on the
A64FX processor. The authors aim to automate parts of these efforts in the
OpenMP Advisor tool, which is built on top of existing and newly introduced
LLVM tooling.
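
As a hedged illustration of the kind of change described above, the sketch below annotates a hypothetical loop with OpenMP SIMD so that an LLVM-based compiler can vectorize it for SVE on the A64FX. The function, loop, and compiler flags are illustrative assumptions, not code from DCA++.

```cpp
#include <cstddef>

// Hypothetical kernel: out[i] = alpha * in[i] + out[i]. The pragma asserts
// that iterations are independent, so the compiler can emit SVE SIMD
// instructions (e.g. with clang++ -O3 -fopenmp-simd -mcpu=a64fx) without
// having to prove the absence of loop-carried dependences itself.
void scale_add(double* __restrict__ out, const double* __restrict__ in,
               double alpha, std::size_t n) {
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = alpha * in[i] + out[i];
    }
}
```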
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs for downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- A Reinforcement Learning Environment for Automatic Code Optimization in the MLIR Compiler [0.10923877073891444]
We introduce the first RL environment for the MLIR compiler, dedicated to facilitating MLIR compiler research.
We also propose a novel formulation of the action space as a product of simpler action subspaces, enabling more efficient and effective optimizations.
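
As a minimal, hypothetical sketch of what a product-structured action space can look like (illustrating the general idea only, not the paper's actual MLIR environment), an action can be a tuple of small independent choices rather than one flat enumeration:

```cpp
#include <cstdint>

// Hypothetical subspaces: which transformation, which loop, which factor.
enum class Transformation : std::uint8_t { Tile, Unroll, Interchange, Vectorize };

constexpr int kLoops = 8;    // loops addressable in the nest (assumed)
constexpr int kFactors = 8;  // tiling/unrolling factor choices (assumed)

struct Action {
    Transformation kind;  // subspace 1
    std::uint8_t loop;    // subspace 2
    std::uint8_t factor;  // subspace 3
};

// The product view (4 x 8 x 8) expresses the same 256 actions as three
// small choices instead of one flat 256-way choice, which is what makes
// the factored formulation cheaper for an agent to search.
int flatten(const Action& a) {
    return (static_cast<int>(a.kind) * kLoops + a.loop) * kFactors + a.factor;
}
```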
arXiv Detail & Related papers (2024-09-17T10:49:45Z)
- Meta Large Language Model Compiler: Foundation Models of Compiler Optimization [21.161784011956126]
Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks.
However, their application in the domain of code and compiler optimization remains underexplored.
We introduce Meta Large Language Model Compiler (LLM Compiler), a suite of robust, openly available, pre-trained models for code optimization tasks.
arXiv Detail & Related papers (2024-06-27T21:47:48Z)
- LLM-Vectorizer: LLM-based Verified Loop Vectorizer [12.048697450464935]
Large language models (LLMs) can generate vectorized code from scalar programs that process individual array elements.
LLMs are capable of producing high-performance vectorized code with runtime speedups ranging from 1.1x to 9.4x.
Our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
arXiv Detail & Related papers (2024-06-07T07:04:26Z)
- Compiler generated feedback for Large Language Models [3.86901256759401]
We introduce a novel paradigm in compiler optimization powered by Large Language Models with compiler feedback to optimize the code size of LLVM assembly.
The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both unoptimized and optimized IRs.
arXiv Detail & Related papers (2024-03-18T23:25:13Z)
- Extreme Compression of Large Language Models via Additive Quantization [59.3122859349777]
Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach for information retrieval.
We provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed.
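
For context, here is a hedged sketch of the decode step in classic Additive Quantization, which AQLM generalizes: a vector is reconstructed as the sum of one codeword drawn from each of M codebooks. The layout and names are illustrative assumptions, not AQLM's actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Reconstruct a d-dimensional vector from M code indices, one per codebook.
// codebooks[m][k] is the k-th d-dimensional codeword of codebook m.
std::vector<float> aq_decode(
    const std::vector<std::vector<std::vector<float>>>& codebooks,
    const std::vector<std::size_t>& codes, std::size_t d) {
    std::vector<float> out(d, 0.0f);
    for (std::size_t m = 0; m < codebooks.size(); ++m) {
        const std::vector<float>& codeword = codebooks[m][codes[m]];
        for (std::size_t j = 0; j < d; ++j)
            out[j] += codeword[j];  // additive: sum of selected codewords
    }
    return out;
}
```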
arXiv Detail & Related papers (2024-01-11T18:54:44Z)
- Tackling the Matrix Multiplication Micro-kernel Generation with Exo [0.5517652814152908]
We present a step-by-step procedure for generating a dedicated micro-kernel for each new hardware target.
Our solution also improves the portability of the generated code, since a hardware target is fully specified by a concise library-based description of its instructions.
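
As a rough illustration of what such a micro-kernel computes (names and sizes are hypothetical, and a generated kernel would map this onto the target's SIMD instructions), the sketch below keeps an MR x NR block of C in accumulators while streaming packed panels of A and B:

```cpp
constexpr int MR = 4, NR = 4;  // register-block dimensions (assumed)

// A is packed as an MR x kc panel, B as a kc x NR panel; C has leading
// dimension ldc. The accumulators stay live across the whole k loop,
// which is the property a tuned micro-kernel exploits to keep the block
// of C in registers.
void microkernel(int kc, const float* A, const float* B, float* C, int ldc) {
    float acc[MR][NR] = {};
    for (int k = 0; k < kc; ++k)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += A[k * MR + i] * B[k * NR + j];
    for (int i = 0; i < MR; ++i)
        for (int j = 0; j < NR; ++j)
            C[i * ldc + j] += acc[i][j];
}
```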
arXiv Detail & Related papers (2023-10-26T14:09:57Z)
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 competitive C++ programming submission pairs capturing performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
- Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z)
- Enabling Retargetable Optimizing Compilers for Quantum Accelerators via a Multi-Level Intermediate Representation [78.8942067357231]
We present a multi-level quantum-classical intermediate representation (IR) that enables an optimizing, retargetable, ahead-of-time compiler.
We support the entire gate-based OpenQASM 3 language and provide custom extensions for common quantum programming patterns and improved syntax.
Our work results in compile times that are 1000x faster than standard Pythonic approaches, and 5-10x faster than comparative standalone quantum language compilers.
arXiv Detail & Related papers (2021-09-01T17:29:47Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid approach, combining compiler-generated code with minimal library use, results in state-of-the-art performance.
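
A hedged example of the data-reuse idea such polyhedral analysis exploits: tiling a matrix-multiply loop nest so that blocks of the operands are reused from cache. The tile size and code are illustrative, not PolyDL's generated primitives.

```cpp
constexpr int T = 64;  // tile size (assumed; real tools derive it per target)

// Tiled n x n matrix multiply: for fixed (jj, kk), the T x T block of B
// is reused across all i iterations, so it stays cache-resident rather
// than being re-streamed from memory on every pass.
void matmul_tiled(int n, const float* A, const float* B, float* C) {
    for (int jj = 0; jj < n; jj += T)
        for (int kk = 0; kk < n; kk += T)
            for (int i = 0; i < n; ++i)
                for (int k = kk; k < kk + T && k < n; ++k)
                    for (int j = jj; j < jj + T && j < n; ++j)
                        C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```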
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.