Data-driven Test Generation for Fuzzing AI Compiler
- URL: http://arxiv.org/abs/2601.17450v1
- Date: Sat, 24 Jan 2026 12:56:40 GMT
- Title: Data-driven Test Generation for Fuzzing AI Compiler
- Authors: Qingchao Shen,
- Abstract summary: We present a unified data-driven testing framework that addresses stage-specific challenges in AI compilers. OPERA migrates tests for AI libraries to test various operator conversion logic in the model loading stage. OATest synthesizes diverse optimization-aware computational graphs for testing high-level optimizations. HARMONY generates and mutates diverse low-level IR seeds to produce hardware-optimization-aware tests.
- Score: 0.4441866681085516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial Intelligence (AI) compilers are critical for efficiently deploying AI models across diverse hardware platforms. However, they remain prone to bugs that can compromise both compiler reliability and model correctness. Thus, ensuring the quality of AI compilers is crucial. In this work, we present a unified data-driven testing framework that systematically addresses stage-specific challenges in AI compilers. Specifically, OPERA migrates tests for AI libraries to test various operator conversion logic in the model loading stage. OATest synthesizes diverse optimization-aware computational graphs for testing high-level optimizations. HARMONY generates and mutates diverse low-level IR seeds to produce hardware-optimization-aware tests for low-level optimizations. Together, these techniques provide a comprehensive, stage-aware framework that enhances testing coverage and effectiveness, detecting 266 previously unknown bugs in four widely used AI compilers.
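To make the model-loading-stage oracle concrete, here is a minimal sketch of the differential idea behind OPERA-style testing: the same input is fed to a reference operator from the source AI library and to the compiler's conversion path, and any divergence is flagged. `compile_and_run` is a hypothetical stand-in for a real backend (e.g., TVM's ONNX frontend followed by a build-and-execute step); only `numpy` is assumed.

```python
import numpy as np

def library_relu(x: np.ndarray) -> np.ndarray:
    # Reference semantics borrowed from the source AI library's operator.
    return np.maximum(x, 0.0)

def compile_and_run(op_name: str, x: np.ndarray) -> np.ndarray:
    # Hypothetical: convert the operator through the compiler, build, execute.
    # Faked here with the same semantics so the sketch stays runnable.
    return np.maximum(x, 0.0)

def differential_test(trials: int = 100) -> None:
    rng = np.random.default_rng(0)
    for _ in range(trials):
        x = rng.standard_normal((4, 8)).astype("float32")
        expected = library_relu(x)
        actual = compile_and_run("relu", x)
        # Divergence beyond tolerance signals an operator-conversion bug.
        if not np.allclose(expected, actual, rtol=1e-4, atol=1e-5):
            print("possible conversion bug in relu")
            return
    print("no divergence observed across", trials, "trials")

differential_test()
```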
Related papers
- Optimization-Aware Test Generation for Deep Learning Compilers [18.99078574014009]
OATest is a novel approach for synthesizing optimization-aware computational graphs. It can detect more bugs and achieve higher code coverage in TVM and ONNX Runtime. OATest uncovers 58 previously unknown bugs, 36 of which have been confirmed or fixed by developers.
arXiv Detail & Related papers (2025-11-24T09:27:59Z)
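As a toy illustration of the optimization-aware idea, the sketch below synthesizes a graph that matches a known rewrite pattern, evaluates it before and after a constant-folding pass, and checks that semantics are preserved. The interpreter and pass are invented stand-ins for a real compiler's graphs and passes, not OATest's implementation.

```python
# Graph nodes: ("const", v), ("input", name), ("add", l, r), ("mul", l, r)
def evaluate(node, env):
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "input":
        return env[node[1]]
    lhs, rhs = evaluate(node[1], env), evaluate(node[2], env)
    return lhs + rhs if kind == "add" else lhs * rhs

def constant_fold(node):
    # Toy optimization pass: fold operations over all-constant operands.
    if node[0] in ("const", "input"):
        return node
    lhs, rhs = constant_fold(node[1]), constant_fold(node[2])
    if lhs[0] == "const" and rhs[0] == "const":
        return ("const", evaluate((node[0], lhs, rhs), {}))
    return (node[0], lhs, rhs)

# A graph deliberately synthesized to trigger the pass: (2 * 3) + x
graph = ("add", ("mul", ("const", 2.0), ("const", 3.0)), ("input", "x"))
env = {"x": 1.5}
before = evaluate(graph, env)
after = evaluate(constant_fold(graph), env)
assert abs(before - after) < 1e-9, "optimization changed program semantics"
print("pass preserved semantics:", before, "==", after)
```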
- QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code [52.66657751895655]
Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation. This paper introduces NeuComBack, a novel benchmark dataset specifically designed for IR-to-assembly compilation. We propose a self-evolving prompt optimization method that enables LLMs to evolve their internal prompt strategies.
arXiv Detail & Related papers (2025-11-03T03:20:26Z)
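The self-evolving loop can be pictured as below: translate, verify, and on failure let the model rewrite its own instructions. Both `llm` and `assemble_and_test` are hypothetical stubs, not APIs from the paper; the skeleton only shows the control flow.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("call an LLM of your choice here")

def assemble_and_test(asm: str) -> bool:
    raise NotImplementedError("assemble, link, and run functional tests")

def self_evolving_translate(ir: str, instructions: str, rounds: int = 3) -> str:
    for _ in range(rounds):
        asm = llm(instructions + "\n\nTranslate this IR to assembly:\n" + ir)
        if assemble_and_test(asm):
            return asm
        # Feed the failure back so the model revises its own prompt strategy.
        instructions = llm("Rewrite these translation instructions to avoid "
                           "the failure just observed:\n" + instructions)
    raise RuntimeError("no correct translation found within budget")
```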
- Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs. This paper investigates how programmers use Large Language Models to complement their own skills. The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
- Impact of Code Context and Prompting Strategies on Automated Unit Test Generation with Modern General-Purpose Large Language Models [0.0]
Generative AI is gaining increasing attention in software engineering. Unit tests constitute the majority of test cases and are often schematic. This paper investigates the impact of code context and prompting strategies on the quality and adequacy of unit tests.
arXiv Detail & Related papers (2025-07-18T11:23:17Z)
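The study's independent variable, how much code context the prompt carries, can be illustrated with two prompt variants for the same focal method; the strings below are illustrative, not the paper's prompts.

```python
focal_method = "def add(a: int, b: int) -> int:\n    return a + b"

class_context = (
    "class Calculator:\n"
    "    def add(self, a: int, b: int) -> int:\n"
    "        return a + b\n"
    "    def div(self, a: int, b: int) -> float:\n"
    "        return a / b\n"
)

# Minimal-context variant: only the method under test.
prompt_minimal = f"Write pytest unit tests for this function:\n{focal_method}"

# Context-rich variant: the surrounding class, which hints at edge cases.
prompt_contextual = ("Write pytest unit tests for Calculator.add, using the "
                     f"full class as context:\n{class_context}")

print(prompt_minimal, prompt_contextual, sep="\n\n")
```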
- Compiler Optimization Testing Based on Optimization-Guided Equivalence Transformations [3.2987550056134873]
We propose a metamorphic testing approach inspired by compiler optimizations. Our approach first employs tailored code construction strategies to generate input programs that satisfy optimization conditions. By comparing the outputs of pre- and post-transformation programs, this approach effectively identifies incorrect optimization bugs.
arXiv Detail & Related papers (2025-04-06T01:37:57Z)
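The metamorphic oracle can be sketched end to end: build a program, apply a semantics-preserving rewrite shaped like an optimization (here, a toy strength reduction of `x * 2` into `x << 1`), compile both at `-O2`, and require identical outputs. This assumes `gcc` is on PATH; it is the shape of the check, not the paper's generator.

```python
import os
import subprocess
import tempfile

TEMPLATE = """#include <stdio.h>
int main(void) {{ int x = 21; printf("%d\\n", {expr}); return 0; }}
"""

def compile_and_run(expr: str) -> str:
    with tempfile.TemporaryDirectory() as d:
        src, exe = os.path.join(d, "t.c"), os.path.join(d, "t")
        with open(src, "w") as f:
            f.write(TEMPLATE.format(expr=expr))
        subprocess.run(["gcc", "-O2", src, "-o", exe], check=True)
        return subprocess.run([exe], capture_output=True, text=True).stdout

out_pre = compile_and_run("x * 2")    # original program
out_post = compile_and_run("x << 1")  # equivalent, optimization-shaped form
# Metamorphic relation: equivalent programs must produce identical output.
assert out_pre == out_post, f"miscompilation suspected: {out_pre!r} vs {out_post!r}"
print("outputs agree:", out_pre.strip())
```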
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
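One self-play round can be sketched as follows: the model proposes a solution and, separately, assert-based tests; only mutually consistent pairs are kept as training data for the next iteration. The two `generate_*` functions are hypothetical LLM-call stubs; the verification step is real, if crude (`exec` in a fresh namespace).

```python
def generate_code(problem: str) -> str:
    raise NotImplementedError("LLM call: propose a solution")

def generate_tests(problem: str) -> str:
    raise NotImplementedError("LLM call: propose assert-based tests")

def agree(code: str, tests: str) -> bool:
    scope: dict = {}
    try:
        exec(code, scope)   # define the candidate solution
        exec(tests, scope)  # tests raise AssertionError on failure
        return True
    except Exception:
        return False

def self_play_round(problems: list) -> list:
    # Keep only pairs where code and tests validate each other; the model is
    # then finetuned on these, improving both roles jointly.
    return [(p, c, t) for p in problems
            for c, t in [(generate_code(p), generate_tests(p))]
            if agree(c, t)]
```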
- UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance [65.01483640267885]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. We introduce UnitCoder, a systematic pipeline leveraging model-generated unit tests to guide and validate the code generation process. Our work presents a scalable approach that leverages model-generated unit tests to guide the synthesis of high-quality code data from pre-training corpora.
arXiv Detail & Related papers (2025-02-17T05:37:02Z)
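The test-guided filtering step can be sketched concretely: each candidate snippet mined from a corpus is kept only if model-generated tests pass when executed in a separate process (a crude sandbox). The candidate and tests below are toy stand-ins.

```python
import os
import subprocess
import sys
import tempfile

def passes_tests(code: str, tests: str, timeout: float = 5.0) -> bool:
    # Run code + tests in a child interpreter so failures stay isolated.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests + "\nprint('OK')\n")
        path = f.name
    try:
        out = subprocess.run([sys.executable, path], capture_output=True,
                             text=True, timeout=timeout)
        return out.returncode == 0 and "OK" in out.stdout
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

candidate = "def square(x):\n    return x * x\n"
generated_tests = "assert square(3) == 9\nassert square(-2) == 4\n"
print("keep sample:", passes_tests(candidate, generated_tests))
```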
- An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We? [8.0988059417354]
We propose a range of approaches to improve the performance of AI-generated code detection. Our best model outperforms the state-of-the-art AI-generated code detector (GPTSniffer) and achieves an F1 score of 82.55.
arXiv Detail & Related papers (2024-11-06T22:48:18Z)
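The paper fine-tunes pretrained code models; as a self-contained stand-in, this sketch keeps the same train-then-score shape with a character n-gram baseline (scikit-learn). The four snippets are toy data, so the printed F1 is meaningless.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

snippets = [
    "def add(a, b):\n    return a + b",                 # human-written
    "x=input();print(int(x)*2)",                        # human-written
    'def add(a: int, b: int) -> int:\n    """Sum."""\n    return a + b',
    'def mul(a: int, b: int) -> int:\n    """Product."""\n    return a * b',
]
labels = [0, 0, 1, 1]  # 1 = AI-generated in this toy setup

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform(snippets)
clf = LogisticRegression().fit(X, labels)
print("train F1:", f1_score(labels, clf.predict(X)))
```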
- CompilerDream: Learning a Compiler World Model for General Code Optimization [58.87557583347996]
We introduce CompilerDream, a model-based reinforcement learning approach to general code optimization. It comprises a compiler world model that accurately simulates the intrinsic properties of optimization passes and an agent trained on this model to produce effective optimization strategies. It excels across diverse datasets, surpassing LLVM's built-in optimizations and other state-of-the-art methods in both the value-prediction and end-to-end code-optimization settings.
arXiv Detail & Related papers (2024-04-24T09:20:33Z)
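The model-based idea can be caricatured in a few lines: a world model predicts how each pass transforms a program's feature vector (and hence its estimated cost), and planning happens inside that model rather than in the real compiler. The pass effects and cost weights below are invented for illustration.

```python
import itertools
import numpy as np

PASSES = ["inline", "dce", "loop-unroll"]
# World model: learned in the paper, hand-written here. Each pass scales a
# feature vector [num_instructions, num_branches] multiplicatively.
EFFECT = {
    "inline":      np.array([1.10, 0.90]),
    "dce":         np.array([0.85, 1.00]),
    "loop-unroll": np.array([1.30, 0.70]),
}

def predicted_cost(state: np.ndarray) -> float:
    return float(state @ np.array([1.0, 3.0]))  # branches weighted heavier

def plan(state: np.ndarray, horizon: int = 2) -> list:
    # Exhaustive search inside the world model; a trained agent amortizes this.
    best = min(itertools.product(PASSES, repeat=horizon),
               key=lambda seq: predicted_cost(
                   state * np.prod([EFFECT[p] for p in seq], axis=0)))
    return list(best)

print("planned pass sequence:", plan(np.array([100.0, 20.0])))
```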
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 competitive C++ programming submission pairs, capturing performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
arXiv Detail & Related papers (2023-02-15T18:59:21Z)
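Such a dataset implies a concrete evaluation step: compile a slow program and its edited fast version, check that they agree, and record the speedup. The C++ pair below is a toy stand-in (a loop versus its closed form), and `g++` is assumed on PATH; `-O0` keeps the compiler from optimizing the slow loop away itself.

```python
import os
import subprocess
import tempfile
import time

SLOW = r"""#include <cstdio>
int main(){long long s=0;for(long long i=0;i<500000000;++i)s+=i;printf("%lld\n",s);}
"""
FAST = r"""#include <cstdio>
int main(){long long n=500000000;printf("%lld\n",n*(n-1)/2);}
"""

def build_and_time(src: str):
    with tempfile.TemporaryDirectory() as d:
        cpp, exe = os.path.join(d, "p.cpp"), os.path.join(d, "p")
        with open(cpp, "w") as f:
            f.write(src)
        subprocess.run(["g++", "-O0", cpp, "-o", exe], check=True)
        t0 = time.perf_counter()
        out = subprocess.run([exe], capture_output=True, text=True).stdout
        return out, time.perf_counter() - t0

out_slow, t_slow = build_and_time(SLOW)
out_fast, t_fast = build_and_time(FAST)
assert out_slow == out_fast  # the edit must preserve behavior
print(f"speedup: {t_slow / t_fast:.1f}x")
```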
- Effective Random Test Generation for Deep Learning Compilers [16.065653480978092]
Isra is a domain-specific constraint solver that resolves the constraints from the semantic specifications without backtracking. We implement and apply our approach to three popular real-world deep learning compilers, including TVM, Glow, and a commercial compiler named SophGo. Isra is more effective than the state-of-the-art approaches and the baseline approaches at constructing valid test inputs for compiler-bug detection.
arXiv Detail & Related papers (2023-02-02T03:00:36Z)
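Backtracking-free, constraint-directed sampling can be illustrated with conv2d test-case generation: parameters are drawn in dependency order, and each domain is narrowed by the values already chosen, so every sample is valid by construction. The shape rules below are the usual (simplified) conv2d constraints, not Isra's specification language.

```python
import random

def sample_conv2d_case(rng: random.Random) -> dict:
    in_h = rng.randint(8, 64)
    in_w = rng.randint(8, 64)
    # Kernel sizes are drawn only from values that fit the chosen input,
    # so no later validity check can fail (hence no backtracking).
    k_h = rng.randint(1, min(7, in_h))
    k_w = rng.randint(1, min(7, in_w))
    stride = rng.randint(1, 3)
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    return {"input": (in_h, in_w), "kernel": (k_h, k_w),
            "stride": stride, "output": (out_h, out_w)}

rng = random.Random(0)
for _ in range(3):
    print(sample_conv2d_case(rng))
```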
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL Primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid approach, combining compiler-generated code with minimal library use, results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
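The kind of transformation such compilers derive can be illustrated by tiling a matrix multiplication so blocks of the operands are reused while cache-resident; a polyhedral data-reuse analysis would choose the tile size and loop order analytically, whereas the value below is arbitrary.

```python
import numpy as np

def matmul_tiled(A: np.ndarray, B: np.ndarray, T: int = 32) -> np.ndarray:
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, T):
        for j0 in range(0, m, T):
            for k0 in range(0, k, T):
                # Each T-by-T block of A and B is reused across a block of C.
                C[i0:i0+T, j0:j0+T] += A[i0:i0+T, k0:k0+T] @ B[k0:k0+T, j0:j0+T]
    return C

A = np.random.rand(96, 96).astype(np.float32)
B = np.random.rand(96, 96).astype(np.float32)
assert np.allclose(matmul_tiled(A, B), A @ B, atol=1e-3)
print("tiled matmul matches the reference result")
```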