Fuzzing MLIR Compilers with Custom Mutation Synthesis
- URL: http://arxiv.org/abs/2404.16947v2
- Date: Tue, 27 Aug 2024 16:08:52 GMT
- Title: Fuzzing MLIR Compilers with Custom Mutation Synthesis
- Authors: Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim,
- Abstract summary: We develop a new test generator called SYNTHFUZZ that combines grammar-based fuzzing with custom synthesis mutation.
It obviates the need to manually define custom mutation operators for each dialect.
Our evaluation shows that SYNTHFUZZ on average improves MLIR dialect pair coverage by 1.75 times, which increases branch coverage by 1.22 times.
- Score: 6.617861009996863
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compiler technologies in deep learning and domain-specific hardware acceleration are increasingly adopting extensible compiler frameworks such as Multi-Level Intermediate Representation (MLIR) to facilitate more efficient development. With MLIR, compiler developers can easily define their own custom IRs in the form of MLIR dialects. However, the diversity and rapid evolution of such custom IRs make it impractical to manually write a custom test generator for each dialect. To address this problem, we design a new test generator called SYNTHFUZZ that combines grammar-based fuzzing with custom mutation synthesis. The key essence of SYNTHFUZZ is two fold: (1) It automatically infers parameterized context-dependent custom mutations from existing test cases. (2) It then concretizes the mutation's content depending on the target context and reduces the chance of inserting invalid edits by performing k-ancestor and pre(post)fix matching. SYNTHFUZZ obviates the need to manually define custom mutation operators for each dialect. We compare SYNTHFUZZ to three baselines: Grammarinator, MLIRSmith, and NeuRI. We conduct this comprehensive comparison on four different MLIR projects. Each project defines a new set of MLIR dialects where manually writing a custom test generator would take weeks of effort. Our evaluation shows that SYNTHFUZZ on average improves MLIR dialect pair coverage by 1.75 times, which increases branch coverage by 1.22 times. Further, we show that our context dependent custom mutation increases the proportion of valid tests by up to 1.11 times, indicating that SYNTHFUZZ correctly concretizes its parameterized mutations with respect to the target context. Parameterization of the mutations reduces the fraction of tests violating the base MLIR constraints by 0.57 times, increasing the time spent fuzzing dialect-specific code.
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z) - Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z) - Training Language Models on Synthetic Edit Sequences Improves Code Synthesis [33.13471417703669]
Large language models (LLMs) autoresourcedly synthesize programs in a single pass.
We develop a synthetic data generation algorithm called LintSeq to generate high-quality code edit data.
We show that edit sequence finetuned models produce more diverse programs than baselines synthesis.
arXiv Detail & Related papers (2024-10-03T17:57:22Z) - Fix the Tests: Augmenting LLMs to Repair Test Cases with Static Collector and Neural Reranker [9.428021853841296]
We propose SYNTER, a novel approach to automatically repair obsolete test cases via precise and concise TROCtxs construction.
With the augmentation of constructed TROCtxs, hallucinations are reduced by 57.1%.
arXiv Detail & Related papers (2024-07-04T04:24:43Z) - Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources.
NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks.
In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z) - SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor.
We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z) - LLMorpheus: Mutation Testing using Large Language Models [7.312170216336085]
This paper presents a technique where a Large Language Model (LLM) is prompted to suggest mutations by asking it what placeholders that have been inserted in source code could be replaced with.
We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool.
arXiv Detail & Related papers (2024-04-15T17:25:14Z) - MRL Parsing Without Tears: The Case of Hebrew [14.104766026682384]
In morphologically rich languages (MRLs), wheres need to identify multiple lexical units in each token, existing systems suffer in latency and setup complexity.
We present a new "flipped pipeline": decisions are made directly on the whole-token units by expert classifiers, each one dedicated to one specific task.
This blazingly fast approach sets a new SOTA in Hebrew POS tagging and dependency parsing, while also reaching near-SOTA performance on other Hebrew tasks.
arXiv Detail & Related papers (2024-03-11T17:54:33Z) - Contrastive Instruction Tuning [61.97704869248903]
We propose Contrastive Instruction Tuning to maximize the similarity between semantically equivalent instruction-instance pairs.
Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy.
arXiv Detail & Related papers (2024-02-17T00:09:32Z) - Paraformer: Fast and Accurate Parallel Transformer for
Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extract hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.