Fuzzing Deep Learning Compilers with HirGen
- URL: http://arxiv.org/abs/2208.02193v5
- Date: Wed, 21 Jun 2023 06:19:33 GMT
- Title: Fuzzing Deep Learning Compilers with HirGen
- Authors: Haoyang Ma, Qingchao Shen, Yongqiang Tian, Junjie Chen, Shing-Chi
Cheung
- Abstract summary: HirGen is an automated testing technique that aims to effectively expose coding mistakes in the optimization of high-level IR.
HirGen has successfully detected 21 bugs in TVM, with 17 confirmed and 12 fixed.
Our experiments show that, within 48 hours, HirGen detects 10 crashes and inconsistencies that none of the baselines can.
- Score: 12.068825031724229
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep Learning (DL) compilers are widely adopted to optimize advanced DL
models for efficient deployment on diverse hardware. Their quality has a profound
effect on the quality of the compiled DL models. A recent bug study shows that the
optimization of high-level intermediate representation (IR) is the most
error-prone compilation stage; bugs at this stage account for 44.92% of all
collected bugs. However, existing testing techniques do not consider features
related to high-level optimization (e.g., high-level IR) and are therefore weak
at exposing bugs at this stage. To bridge this gap, we propose HirGen, an
automated testing technique that aims to effectively expose coding mistakes in
the optimization of high-level IR. The design of HirGen includes 1) three
coverage criteria for generating diverse and valid computational graphs; 2)
full use of the high-level IR's language features to generate diverse IRs; and
3) three test oracles inspired by differential testing and metamorphic testing.
HirGen has successfully detected 21 bugs in TVM, with 17 confirmed and 12
fixed. Further, we construct four baselines using state-of-the-art DL compiler
fuzzers that can cover the high-level optimization stage. Our experimental
results show that, within 48 hours, HirGen detects 10 crashes and
inconsistencies that none of the baselines can detect. We further validate the
usefulness of our proposed coverage criteria and test oracles in our
evaluation.
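To make the oracle idea concrete, below is a minimal, hypothetical sketch of a differential-testing oracle over TVM's high-level IR (Relay): one computational graph is compiled at several optimization levels, and disagreeing outputs point to a high-level optimization bug. The graph builder and helper names are ours, not HirGen's, and the snippet assumes TVM's Python API (roughly v0.8 or later).

```python
# Hypothetical sketch (not HirGen's actual code): a differential-testing
# oracle over TVM's high-level IR (Relay). Compile the same graph at
# several optimization levels and compare outputs.
import numpy as np
import tvm
from tvm import relay

def build_toy_graph():
    # A tiny, valid computational graph; HirGen generates far more
    # diverse ones, guided by its three coverage criteria.
    x = relay.var("x", shape=(4, 4), dtype="float32")
    y = relay.var("y", shape=(4, 4), dtype="float32")
    out = relay.nn.relu(relay.add(x, y))
    return tvm.IRModule.from_expr(relay.Function([x, y], out))

def run(mod, inputs, opt_level):
    # Compile and execute the module at the given optimization level.
    with tvm.transform.PassContext(opt_level=opt_level):
        fn = relay.create_executor("graph", mod=mod, target="llvm").evaluate()
    return fn(*inputs).numpy()

mod = build_toy_graph()
inputs = [tvm.nd.array(np.random.rand(4, 4).astype("float32")) for _ in range(2)]
reference = run(mod, inputs, opt_level=0)  # unoptimized reference run
for level in (1, 2, 3, 4):
    if not np.allclose(reference, run(mod, inputs, opt_level=level), rtol=1e-4):
        print(f"possible high-level optimization bug at opt_level={level}")
```

HirGen's metamorphic oracles work in the same spirit, except the comparison is between outputs of semantically equivalent variants of a graph rather than between optimization levels.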
Related papers
- Exploring and Lifting the Robustness of LLM-powered Automated Program Repair with Metamorphic Testing [31.165102332393964]
Large language model-powered Automated Program Repair (LAPR) techniques have achieved state-of-the-art bug-fixing performance.
It is crucial to conduct robustness testing on LAPR techniques before their practical deployment.
We propose MT-LAPR, a Metamorphic Testing framework exclusively for LAPR techniques.
arXiv Detail & Related papers (2024-10-10T01:14:58Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs in incorrect code, with three categories and 12 sub-categories, and analyze the root causes of common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces ADer, a comprehensive visual anomaly detection benchmark built as a modular framework that is easily extensible to new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - Constrained C-Test Generation via Mixed-Integer Programming [55.28927994487036]
This work proposes a novel method to generate C-Tests, a form of cloze test (a gap-filling exercise) in which only the last part of a word is turned into a gap.
In contrast to previous works, which only vary the gap size or gap placement and reach locally optimal solutions, we propose a mixed-integer programming (MIP) approach.
We publish our code, model, and collected data consisting of 32 English C-Tests with 20 gaps each (totaling 3,200 individual gap responses) under an open source license.
arXiv Detail & Related papers (2024-04-12T21:35:21Z) - Evolutionary Generative Fuzzing for Differential Testing of the Kotlin
Compiler [14.259471945857431]
We investigate the effectiveness of differential testing in finding bugs within the Kotlin compilers developed at JetBrains.
We propose a black-box generative approach that creates input programs for the K1 and K2 compilers.
Our case study shows that the proposed approach effectively detects bugs in K1 and K2; these bugs have been confirmed and (some) fixed by JetBrains developers.
arXiv Detail & Related papers (2024-01-12T16:01:12Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via
Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z) - HDCC: A Hyperdimensional Computing compiler for classification on
embedded systems and high-performance computing [58.720142291102135]
This work introduces HDCC, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code.
HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend.
To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
arXiv Detail & Related papers (2023-04-24T19:16:03Z) - Finding Deep-Learning Compilation Bugs with NNSmith [20.082492391396933]
We propose a new fuzz testing approach for finding bugs in deep-learning compilers.
Our core approach uses (i) light-weight operator specifications to generate diverse yet valid models, (ii) a gradient-based search process, and (iii) differential testing to identify bugs.
We implemented this approach in NNSmith, which has found 65 new bugs over the last seven months in TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 52 have been confirmed and 44 fixed by maintainers.
arXiv Detail & Related papers (2022-07-26T17:39:51Z) - Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation [20.519361342905775]
We propose Tzer, a practical fuzzing technique for the widely used TVM tensor compiler.
Our results show that Tzer substantially outperforms existing fuzzing techniques on tensor compiler testing.
To date, Tzer has detected 49 previously unknown bugs for TVM, with 37 bugs confirmed and 25 bugs fixed.
arXiv Detail & Related papers (2022-02-21T01:48:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.