Retrofitting Control Flow Graphs in LLVM IR for Auto Vectorization
- URL: http://arxiv.org/abs/2510.04890v1
- Date: Mon, 06 Oct 2025 15:11:41 GMT
- Title: Retrofitting Control Flow Graphs in LLVM IR for Auto Vectorization
- Authors: Shihan Fang, Wenxin Zheng
- Abstract summary: We introduce a novel vectorization pipeline featuring two specialized IR extensions: SIR, which encodes high-level structural information, and VIR, which explicitly represents dependencies through data dependency analysis. Our proposed vectorization pipeline achieves significant performance improvements, delivering speedups of up to 53% and 58% compared to LLVM and GCC, respectively.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern processors increasingly rely on SIMD instruction sets, such as AVX and RVV, to significantly enhance parallelism and computational performance. However, production-ready compilers like LLVM and GCC often fail to fully exploit available vectorization opportunities due to disjoint vectorization passes and limited extensibility. Although recent work on heuristics and intermediate representation (IR) design has attempted to address these problems, efficiently simplifying control flow analysis and accurately identifying vectorization opportunities remain challenging tasks. To address these issues, we introduce a novel vectorization pipeline featuring two specialized IR extensions: SIR, which encodes high-level structural information, and VIR, which explicitly represents instruction dependencies through data dependency analysis. Leveraging the detailed dependency information provided by VIR, we develop a flexible and extensible vectorization framework. This approach substantially improves interoperability across vectorization passes and expands the search space for identifying isomorphic instructions, ultimately enhancing both the scope and efficiency of automatic vectorization. Experimental evaluations demonstrate that our proposed vectorization pipeline achieves significant performance improvements, delivering speedups of up to 53% and 58% compared to LLVM and GCC, respectively.
Related papers
- An LLVM-Based Optimization Pipeline for SPDZ [0.0]
We implement a proof-of-concept LLVM-based optimization pipeline for the SPDZ protocol. Our front end accepts a subset of C with lightweight privacy annotations and lowers it to LLVM IR. Our back end performs data-flow and control-flow analysis on the optimized IR to drive a non-blocking runtime scheduler.
arXiv Detail & Related papers (2025-12-11T20:53:35Z)
- GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction [51.83437071408662]
We propose GLOW, a unified framework for agentic workflow (AW) performance prediction. GLOW combines the graph-structure modeling capabilities of GNNs with the reasoning power of LLMs. Experiments on FLORA-Bench show that GLOW outperforms state-of-the-art baselines in prediction accuracy and ranking utility.
arXiv Detail & Related papers (2025-12-11T13:30:46Z)
- VecIntrinBench: Benchmarking Cross-Architecture Intrinsic Code Migration for RISC-V Vector [8.59222474360646]
Translating intrinsic functions from other architectures to RISC-V Vector (RVV) intrinsics is currently a mainstream approach. However, no benchmark comprehensively evaluates intrinsic migration capabilities for the RVV extension. We propose VecIntrinBench, the first intrinsic benchmark encompassing RVV extensions.
arXiv Detail & Related papers (2025-11-24T08:11:10Z)
- IntrinTrans: LLM-based Intrinsic Code Translator for RISC-V Vector [9.678932711610244]
Translating existing vectorized intrinsic code to RVV intrinsics is a practical and effective approach. Current cross-architecture translation largely relies on manual rewriting, which is time-consuming and error-prone. We present IntrinTrans, a multi-agent approach that utilizes compile-and-test feedback to translate intrinsic code across architectures automatically.
arXiv Detail & Related papers (2025-10-11T08:52:01Z)
- Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution [48.7788770680643]
Flash-Searcher is a novel parallel agent reasoning framework. It decomposes complex tasks into subtasks with explicit dependencies, enabling concurrent execution of independent reasoning paths. It achieves 67.7% accuracy on BrowseComp and 83% on xbench-DeepSearch, while reducing agent execution steps by up to 35% compared to current frameworks.
arXiv Detail & Related papers (2025-09-29T17:39:30Z)
- GRIL: Knowledge Graph Retrieval-Integrated Learning with Large Language Models [59.72897499248909]
We propose a novel graph retriever trained end-to-end with Large Language Models (LLMs). Within the extracted subgraph, structural knowledge and semantic features are encoded via soft tokens and the verbalized graph, respectively, which are infused into the LLM together. Our approach consistently achieves state-of-the-art performance, validating the strength of joint graph-LLM optimization for complex reasoning tasks.
arXiv Detail & Related papers (2025-09-20T02:38:00Z)
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use [78.29315418819074]
We introduce VerlTool, a unified and modular framework that addresses these limitations through systematic design principles. Our framework formalizes ARLT as multi-turn trajectories with multi-modal observation tokens (text/image/video), extending beyond single-turn RLVR paradigms. The modular plugin architecture enables rapid tool integration, requiring only lightweight Python definitions.
arXiv Detail & Related papers (2025-09-01T01:45:18Z)
- VectorLiteRAG: Latency-Aware and Fine-Grained Resource Partitioning for Efficient RAG [2.0929459605817193]
Retrieval-Augmented Generation (RAG) systems combine vector similarity search with large language models (LLMs) to deliver context-aware responses. We present VectorLiteRAG, a deployment-friendly RAG system that achieves latency-compliant inference without requiring additional hardware resources.
arXiv Detail & Related papers (2025-04-11T19:18:41Z)
- VecTrans: Enhancing Compiler Auto-Vectorization through LLM-Assisted Code Transformations [17.974013479973774]
VecTrans is a framework that leverages large language models to enhance compiler-based code vectorization. VecTrans achieves a geomean speedup of 1.77x and successfully vectorizes 24 of 51 test cases.
arXiv Detail & Related papers (2025-03-25T08:39:35Z)
- Vector-ICL: In-context Learning with Continuous Vector Representations [75.96920867382859]
Large language models (LLMs) have shown remarkable in-context learning capabilities on textual data. We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders. In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL.
arXiv Detail & Related papers (2024-10-08T02:25:38Z)
- LLM-Vectorizer: LLM-based Verified Loop Vectorizer [12.048697450464935]
Large-language models (LLMs) can generate vectorized code from scalar programs that process individual array elements.
LLMs are capable of producing high performance vectorized code with run-time speedup ranging from 1.1x to 9.4x.
Our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
arXiv Detail & Related papers (2024-06-07T07:04:26Z)
- Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.