LLM-Vectorizer: LLM-based Verified Loop Vectorizer
- URL: http://arxiv.org/abs/2406.04693v1
- Date: Fri, 7 Jun 2024 07:04:26 GMT
- Title: LLM-Vectorizer: LLM-based Verified Loop Vectorizer
- Authors: Jubi Taneja, Avery Laird, Cong Yan, Madan Musuvathi, Shuvendu K. Lahiri,
- Abstract summary: Large-language models (LLMs) can generate vectorized code from scalar programs that process individual array elements.
LLMs are capable of producing high performance vectorized code with run-time speedup ranging from 1.1x to 9.4x.
Our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
- Score: 12.048697450464935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics is still a complex, error-prone task that demands deep knowledge of the specific architecture and compilers. In this paper, we evaluate the potential of large-language models (LLMs) to generate vectorized (Single Instruction Multiple Data) code from scalar programs that process individual array elements. We propose a novel finite-state-machine-based multi-agent approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high performance vectorized code with run-time speedups ranging from 1.1x to 9.4x compared to state-of-the-art compilers such as Intel Compiler, GCC, and Clang. To verify the correctness of vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques to improve the scalability of Alive2 on our benchmark dataset. Overall, our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
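To make the kind of transformation concrete, here is a minimal sketch (not taken from the paper or the TSVC suite) of a scalar loop and a hand-written AVX2 intrinsics equivalent; the saxpy-style kernel and the function names are illustrative assumptions. Intrinsics code of this shape is what the LLM agents are prompted to produce and what a translation validator such as Alive2 would then check against the scalar original.

```c
#include <immintrin.h>
#include <stddef.h>

/* Scalar version: processes one array element per iteration. */
void saxpy_scalar(float *restrict y, const float *restrict x,
                  float a, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hand-vectorized AVX2 version: processes 8 floats per iteration,
 * with a scalar remainder loop for the tail.  This is the style of
 * intrinsics code the approach asks an LLM to emit. */
void saxpy_avx2(float *restrict y, const float *restrict x,
                float a, size_t n) {
    __m256 va = _mm256_set1_ps(a);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(&x[i]);
        __m256 vy = _mm256_loadu_ps(&y[i]);
        vy = _mm256_add_ps(_mm256_mul_ps(va, vx), vy);
        _mm256_storeu_ps(&y[i], vy);
    }
    for (; i < n; i++)           /* remainder elements */
        y[i] = a * x[i] + y[i];
}
```

Compiled with, e.g., `gcc -O2 -mavx2`, both functions compute the same result; for more irregular TSVC-style loops, compiler baselines often fail to produce the intrinsics form automatically, which is the gap the paper targets.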
Related papers
- Vector-ICL: In-context Learning with Continuous Vector Representations [75.96920867382859]
Large language models (LLMs) have shown remarkable in-context learning capabilities on textual data.
We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders.
In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL.
arXiv Detail & Related papers (2024-10-08T02:25:38Z)
- In-Context Learning State Vector with Inner and Momentum Optimization [23.33921300777915]
Large Language Models (LLMs) have exhibited an impressive ability to perform In-Context Learning (ICL) from only a few examples.
Recent works have indicated that the functions learned by ICL can be represented through compressed vectors derived from the transformer.
This paper presents a comprehensive analysis of these compressed vectors, drawing parallels to the parameters trained with gradient descent, and the concept of state vector.
arXiv Detail & Related papers (2024-04-17T10:19:15Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Sobol Sequence Optimization for Hardware-Efficient Vector Symbolic Architectures [2.022279594514036]
Hyperdimensional computing (HDC) is an emerging computing paradigm with significant promise for efficient and robust learning.
In HDC, objects are encoded with high-dimensional vector symbolic sequences called hypervectors.
The quality of hypervectors, defined by their distribution and independence, directly impacts the performance of HDC systems.
arXiv Detail & Related papers (2023-11-17T01:48:07Z)
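As a rough illustration of what hypervector encoding looks like in practice, the sketch below (my own, not from the paper) builds packed binary hypervectors, binds two of them with XOR, and measures their normalized Hamming distance. The names and the 9,984-bit dimension are assumptions; the paper's contribution would correspond to replacing the `rand()` calls with a Sobol low-discrepancy generator to improve the distribution and independence of the hypervectors.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define DIM_WORDS 156   /* ~10,000 bits packed into 64-bit words */

typedef struct { uint64_t w[DIM_WORDS]; } hv_t;

/* Draw a pseudo-random binary hypervector.  A Sobol sequence would
 * replace rand() here with a low-discrepancy generator. */
static hv_t hv_random(void) {
    hv_t v;
    for (int i = 0; i < DIM_WORDS; i++)
        v.w[i] = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
    return v;
}

/* Binding: XOR combines two hypervectors into one dissimilar to both. */
static hv_t hv_bind(const hv_t *a, const hv_t *b) {
    hv_t r;
    for (int i = 0; i < DIM_WORDS; i++) r.w[i] = a->w[i] ^ b->w[i];
    return r;
}

/* Similarity: normalized Hamming distance (0 = identical, ~0.5 = unrelated). */
static double hv_distance(const hv_t *a, const hv_t *b) {
    long diff = 0;
    for (int i = 0; i < DIM_WORDS; i++)
        diff += __builtin_popcountll(a->w[i] ^ b->w[i]);
    return (double)diff / (DIM_WORDS * 64.0);
}

int main(void) {
    hv_t role = hv_random(), filler = hv_random();
    hv_t bound = hv_bind(&role, &filler);
    printf("d(role, filler) = %.3f\n", hv_distance(&role, &filler));
    printf("d(bound, role)  = %.3f\n", hv_distance(&bound, &role));
    return 0;
}
```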
- A Compiler from Array Programs to Vectorized Homomorphic Encryption [1.6216324006136673]
Homomorphic encryption (HE) is a practical approach to secure computation over encrypted data.
We present Viaduct-HE, a compiler that generates efficient vectorized HE programs.
Viaduct-HE can generate both the operations and complex data layouts required for efficient HE programs.
arXiv Detail & Related papers (2023-11-10T16:00:00Z)
- Adapting Language Models to Compress Contexts [71.98287002918941]
Transformer-based language models (LMs) are powerful and widely-applicable tools, but their usefulness is constrained by a finite context window.
We propose to adapt pre-trained LMs into AutoCompressors, which are capable of compressing long contexts into compact summary vectors.
We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity.
arXiv Detail & Related papers (2023-05-24T06:42:44Z)
- Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv Detail & Related papers (2023-04-25T05:04:44Z)
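A loose sketch of that two-step decomposition, assuming a plain GEMM workload: a small fixed-size micro-kernel stands in for a Tensor Processing Primitive, and simple blocked loops around it play the role of the high-level logical loops. The block sizes, names, row-major layout, and the assumption that dimensions divide evenly by the block sizes are all illustrative; the framework's actual TPP and loop abstractions are not reproduced here.

```c
#include <stddef.h>

#define BM 32
#define BN 32
#define BK 32

/* "TPP"-style micro-kernel: a small, fixed-size matrix multiply that plays
 * the role of the computational core.  Everything architecture-specific
 * (vectorization, unrolling) would live inside this one routine. */
static void gemm_microkernel(const float *A, const float *B, float *C,
                             size_t lda, size_t ldb, size_t ldc) {
    for (size_t i = 0; i < BM; i++)
        for (size_t k = 0; k < BK; k++)
            for (size_t j = 0; j < BN; j++)
                C[i * ldc + j] += A[i * lda + k] * B[k * ldb + j];
}

/* The "logical loops" around the primitive: a plain iteration over blocks.
 * In the paper's framework these outer loops are what gets tuned (order,
 * blocking, parallelization) without touching the kernel itself. */
void gemm_blocked(const float *A, const float *B, float *C,
                  size_t M, size_t N, size_t K) {
    for (size_t i = 0; i < M; i += BM)
        for (size_t j = 0; j < N; j += BN)
            for (size_t k = 0; k < K; k += BK)
                gemm_microkernel(&A[i * K + k], &B[k * N + j],
                                 &C[i * N + j], K, N, N);
}
```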
- Inference with Reference: Lossless Acceleration of Large Language Models [97.04200102556551]
LLMA is an accelerator to speed up Large Language Model (LLM) inference with references.
It is motivated by the observation that there are abundant identical text spans between the decoding result by an LLM and the reference that is available in many real-world scenarios.
arXiv Detail & Related papers (2023-04-10T09:55:14Z)
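A hedged reconstruction (not the paper's code) of the span-matching step this observation suggests: match the last few generated tokens against the reference and, on a hit, propose the tokens that follow in the reference so the LLM can verify them in one parallel decoding step. Token IDs are plain ints and all names are assumptions.

```c
#include <stddef.h>

/* Find a reference span that continues the current suffix of the generated
 * text.  On success, *copy_start points at the reference tokens to propose
 * and the return value is how many of them to copy (up to max_copy). */
size_t propose_copy(const int *gen, size_t gen_len,
                    const int *ref, size_t ref_len,
                    size_t match_len, size_t max_copy,
                    const int **copy_start) {
    if (gen_len < match_len) return 0;
    const int *suffix = gen + gen_len - match_len;
    for (size_t p = 0; p + match_len < ref_len; p++) {
        size_t k = 0;
        while (k < match_len && ref[p + k] == suffix[k]) k++;
        if (k == match_len) {                 /* suffix found in reference */
            size_t avail = ref_len - (p + match_len);
            size_t n = avail < max_copy ? avail : max_copy;
            *copy_start = ref + p + match_len;
            return n;                         /* tokens proposed for copying */
        }
    }
    return 0;                                 /* no match: decode normally */
}
```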
- A Case Study of LLVM-Based Analysis for Optimizing SIMD Code Generation [0.0]
This paper presents a methodology for using LLVM-based tools to tune the DCA++ application that targets the new ARM A64FX processor.
By applying these code changes, code speed was increased by 1.98x and 78 GFlops were achieved on the A64FX processor.
arXiv Detail & Related papers (2021-06-27T22:38:16Z)
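The DCA++-specific changes are not reproduced here; as a sketch, the snippet below shows two generic source-level edits of the kind such LLVM-based analysis typically points to: restrict-qualified pointers that rule out aliasing, and an `omp simd` hint. Both are illustrative assumptions rather than code from the paper.

```c
#include <stddef.h>

/* Before: the compiler must assume `out`, `a`, and `b` may alias,
 * which can block vectorization of the loop. */
void add_scaled(double *out, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + 2.0 * b[i];
}

/* After: `restrict` promises non-overlapping arrays and the `omp simd`
 * pragma explicitly requests vectorization (compile with -fopenmp-simd). */
void add_scaled_hinted(double *restrict out, const double *restrict a,
                       const double *restrict b, size_t n) {
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + 2.0 * b[i];
}
```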
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
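PolyDL's actual algorithms are not shown here; as a generic illustration of the data-reuse transformations the polyhedral model reasons about, the sketch below tiles a matrix multiply so that each block of the operands stays in cache while it is reused. The tile size and the divisibility assumption are illustrative.

```c
#include <stddef.h>

#define TILE 64   /* illustrative tile size; a polyhedral tool would pick it from a reuse/cost model */

/* Untiled matrix multiply: B is streamed from memory again for every row of A. */
void matmul(const float *A, const float *B, float *C, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Tiled version: the same iteration space split into TILE x TILE x TILE blocks
 * so each block of A, B, and C is reused while it is still in cache.
 * Assumes n is a multiple of TILE to keep the sketch short. */
void matmul_tiled(const float *A, const float *B, float *C, size_t n) {
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            for (size_t kk = 0; kk < n; kk += TILE)
                for (size_t i = ii; i < ii + TILE; i++)
                    for (size_t k = kk; k < kk + TILE; k++)
                        for (size_t j = jj; j < jj + TILE; j++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```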
This list is automatically generated from the titles and abstracts of the papers on this site.