AVX / NEON Intrinsic Functions: When Should They Be Used?

URL: http://arxiv.org/abs/2601.04922v1
Date: Thu, 08 Jan 2026 13:21:19 GMT
Title: AVX / NEON Intrinsic Functions: When Should They Be Used?
Authors: Théo Boivin, Joeffrey Legaux
Abstract summary: A cross-configuration benchmark is proposed to explore the capacities and limitations of AVX / NEON intrinsic functions. The main aim is to guide developers in choosing when to use intrinsic functions, depending on the OS, architecture and/or available compiler.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A cross-configuration benchmark is proposed to explore the capacities and limitations of AVX / NEON intrinsic functions in a generic development-project context where a vectorisation strategy is required to optimise the code. The main aim is to guide developers in choosing when to use intrinsic functions, depending on the OS, architecture and/or available compiler. Intrinsic functions were observed to be highly efficient for conditional branching, with the intrinsic version's execution time reaching around 5% of the plain code's execution time. However, intrinsic functions were found to be unnecessary in many cases, as compilers already auto-vectorise the code well.
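To illustrate why conditional branching is where hand-written intrinsics tend to pay off, here is a minimal sketch (not the authors' benchmark code). It assumes a hypothetical kernel, `out[i] = a[i] > t ? a[i] * a[i] : a[i] + 1`, and hypothetical function names `select_scalar` / `select_simd`. The intrinsic version removes the per-element branch by computing both results for a whole vector and picking each lane with a comparison mask. It uses AVX when compiled with AVX enabled, falls back to SSE2 (always available on x86-64) or NEON on ARM, and otherwise runs the scalar loop.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Plain scalar reference: one data-dependent branch per element. */
static void select_scalar(const float *a, float *out, size_t n, float t) {
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > t)
            out[i] = a[i] * a[i];
        else
            out[i] = a[i] + 1.0f;
    }
}

/* Branchless intrinsic version: evaluate both sides of the condition for a
   full vector, then select per lane with the comparison mask. Leftover
   elements that do not fill a vector are handled by the scalar loop. */
static void select_simd(const float *a, float *out, size_t n, float t) {
    size_t i = 0;
#if defined(__AVX__)
    const __m256 vt = _mm256_set1_ps(t), one = _mm256_set1_ps(1.0f);
    for (; i + 8 <= n; i += 8) {
        __m256 x = _mm256_loadu_ps(a + i);
        __m256 mask = _mm256_cmp_ps(x, vt, _CMP_GT_OQ);   /* lanes where x > t */
        /* blendv picks the second operand where the mask is set */
        __m256 r = _mm256_blendv_ps(_mm256_add_ps(x, one), _mm256_mul_ps(x, x), mask);
        _mm256_storeu_ps(out + i, r);
    }
#elif defined(__SSE2__)
    const __m128 vt = _mm_set1_ps(t), one = _mm_set1_ps(1.0f);
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(a + i);
        __m128 mask = _mm_cmpgt_ps(x, vt);                /* lanes where x > t */
        /* SSE2 has no blend: combine (mask & x*x) | (~mask & (x+1)) */
        __m128 r = _mm_or_ps(_mm_and_ps(mask, _mm_mul_ps(x, x)),
                             _mm_andnot_ps(mask, _mm_add_ps(x, one)));
        _mm_storeu_ps(out + i, r);
    }
#elif defined(__ARM_NEON)
    const float32x4_t vt = vdupq_n_f32(t), one = vdupq_n_f32(1.0f);
    for (; i + 4 <= n; i += 4) {
        float32x4_t x = vld1q_f32(a + i);
        uint32x4_t mask = vcgtq_f32(x, vt);               /* lanes where x > t */
        /* bitwise select: first operand where the mask bits are set */
        float32x4_t r = vbslq_f32(mask, vmulq_f32(x, x), vaddq_f32(x, one));
        vst1q_f32(out + i, r);
    }
#endif
    select_scalar(a + i, out + i, n - i, t);              /* remainder */
}
```

Both functions should produce bit-identical results, which is worth asserting before timing anything. Note that at high optimisation levels (e.g. `gcc -O3`) a compiler may if-convert and vectorise the scalar loop on its own, which is consistent with the paper's observation that intrinsics are often unnecessary; measuring both versions on the target OS, architecture and compiler is the safer way to decide.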

Related papers

Understanding Accelerator Compilers via Performance Profiling
Accelerator design languages (ADLs) are high-level languages that compile to hardware units. We introduce Petal, a cycle-level tool for understanding how the compiler's decisions affect performance. We show that Petal's cycle-level profiles can identify performance problems in existing designs.
arXiv (2025-11-24T22:40:11Z)
VecIntrinBench: Benchmarking Cross-Architecture Intrinsic Code Migration for RISC-V Vector
Translating intrinsic functions from other architectures to RISC-V Vector (RVV) intrinsic functions is currently a mainstream approach. There is currently no benchmark that comprehensively evaluates intrinsic migration capabilities for the RVV extension. We propose VecIntrinBench, the first intrinsic benchmark encompassing RVV extensions.
arXiv (2025-11-24T08:11:10Z)
Library Liberation: Competitive Performance Matmul Through Compiler-composed Nanokernels
This paper introduces a compilation scheme that automatically generates scalable, high-performance microkernels. We implement this technique in an MLIR-based compiler supporting both vector- and tile-based CPU instructions. Experiments show that the generated nanokernels are production-quality and competitive with state-of-the-art microkernel libraries.
arXiv (2025-11-14T14:32:28Z)
IntrinTrans: LLM-based Intrinsic Code Translator for RISC-V Vector
Translating existing vectorized intrinsic code onto RVV intrinsics is a practical and effective approach. Current cross-architecture translation largely relies on manual rewriting, which is time-consuming and error-prone. We present IntrinTrans, a multi-agent approach that utilizes compile-and-test feedback to translate intrinsic code across architectures automatically.
arXiv (2025-10-11T08:52:01Z)
A Walsh Hadamard Derived Linear Vector Symbolic Architecture
Vector Symbolic Architectures (VSAs) are an approach to developing neuro-symbolic AI. The proposed architecture, HLB, is designed to have favorable computational efficiency and efficacy in classic VSA tasks.
arXiv (2024-10-30T03:42:59Z)
Breaking Bad: How Compilers Break Constant-Time Implementations
We investigate how compilers break protections introduced by defensive programming techniques. We run a large-scale experiment to see if such compiler-induced issues manifest in state-of-the-art cryptographic libraries. Our study reveals that several compiler-induced secret-dependent operations occur within some of the most highly regarded cryptographic libraries.
arXiv (2024-10-17T12:34:02Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent, an agent-based system, automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv (2024-10-02T09:11:10Z)
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels. We decompose kernel development into two steps: (1) expressing the computational core using Tensor Processing Primitives (TPPs), and (2) expressing the logical loops around TPPs in a high-level, declarative fashion. We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
arXiv (2023-04-25T05:04:44Z)
QParallel: Explicit Parallelism for Programming Quantum Computers
We present a language extension for parallel quantum programming. QParallel removes ambiguities concerning parallelism in current quantum programming languages. We introduce a tool that guides programmers in the placement of parallel regions by identifying the subroutines that profit most from parallelization.
arXiv (2022-10-07T16:35:16Z)
PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives. We develop novel data reuse analysis algorithms using the polyhedral model. We also show that this hybrid approach, combining the compiler with minimal library use, achieves state-of-the-art performance.
arXiv (2020-06-02T06:44:09Z)
Towards High Performance, Portability, and Productivity: Lightweight Augmented Neural Networks for Performance Prediction
We propose lightweight augmented neural networks for arbitrary kernel-variant-hardware combinations. We obtain a low mean absolute percentage error (MAPE) of 3%, significantly outperforming traditional feed-forward neural networks. Our variant-selection approach can be used in Halide implementations to obtain up to 1.7x speedup over Halide's auto-scheduler.
arXiv (2020-03-17T02:19:54Z)
PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives
We develop a hybrid solution to the development of deep learning kernels. We use advanced polyhedral techniques to automatically tune the outer loops for performance.
arXiv (2020-02-06T08:02:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.