Configuring Test Generators using Bug Reports: A Case Study of GCC
Compiler and Csmith
- URL: http://arxiv.org/abs/2012.10662v2
- Date: Thu, 18 Mar 2021 12:36:38 GMT
- Title: Configuring Test Generators using Bug Reports: A Case Study of GCC
Compiler and Csmith
- Authors: Md Rafiqul Islam Rabin and Mohammad Amin Alipour
- Abstract summary: This paper uses the code snippets in the bug reports to guide the test generation.
We evaluate this approach on eight versions of GCC.
We find that our approach provides higher coverage and triggers more miscompilation failures than the state-of-the-art test generation techniques for GCC.
- Score: 2.1016374925364616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The correctness of compilers is instrumental in the safety and reliability of
other software systems, as bugs in compilers can produce executables that do
not reflect the intent of programmers. Such errors are difficult to identify
and debug. Random test program generators are commonly used in testing
compilers, and they have been effective in uncovering bugs. However, the
problem of guiding these test generators to produce test programs that are more
likely to find bugs remains challenging. In this paper, we use the code
snippets in the bug reports to guide the test generation. The main idea of this
work is to extract insights from the bug reports about the language features
that are more prone to inadequate implementation, and to use these insights to
guide the test generators. We use the GCC C compiler to evaluate the
effectiveness of this approach. In particular, we first cluster the test
programs in the GCC bug reports based on their features. We then use the
centroids of the clusters to compute configurations for Csmith, a popular test
generator for C compilers. We evaluated this approach on eight versions of GCC
and found that our approach provides higher coverage and triggers more
miscompilation failures than the state-of-the-art test generation techniques
for GCC.
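The pipeline in the abstract can be illustrated with a small, hedged sketch: cluster feature vectors extracted from bug-report snippets, then turn a cluster centroid into a Csmith command line. The feature list, threshold, and data below are invented for illustration, and the on/off switches are only a simplified stand-in for the probability configurations the paper actually derives from the centroids.

```python
# Hedged sketch (not the authors' released implementation): k-means over
# feature vectors of bug-report test programs, then a crude mapping from one
# centroid to Csmith feature toggles.
import subprocess

import numpy as np
from sklearn.cluster import KMeans

FEATURES = ["pointers", "arrays", "structs", "unions"]  # assumed feature set


def centroid_to_flags(centroid, threshold=0.05):
    """Disable a language feature that is rare in the cluster centroid."""
    flags = []
    for name, weight in zip(FEATURES, centroid):
        if weight < threshold:
            flags.append(f"--no-{name}")  # e.g. --no-pointers, --no-arrays
    return flags


# Each row: normalized frequency of FEATURES in one bug-report snippet
# (values made up for illustration).
X = np.array([
    [0.40, 0.02, 0.30, 0.01],
    [0.35, 0.01, 0.25, 0.00],
    [0.02, 0.50, 0.01, 0.20],
    [0.01, 0.45, 0.02, 0.25],
])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Generate one test program per cluster with the derived configuration
# (assumes a csmith binary is on PATH).
for i, centroid in enumerate(km.cluster_centers_):
    with open(f"test_{i}.c", "w") as out:
        subprocess.run(["csmith", "--seed", str(i), *centroid_to_flags(centroid)],
                       stdout=out, check=True)
```

The generated programs would then be compiled with the GCC versions under test and checked for crashes or miscompilations, as in the evaluation described above.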
Related papers
- Finding Missed Code Size Optimizations in Compilers using LLMs [1.90019787465083]
We develop a novel testing approach which combines large language models with a series of differential testing strategies.
Our approach requires fewer than 150 lines of code to implement.
To date we have reported 24 confirmed bugs in production compilers.
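A rough illustration of the differential code-size idea, before the paper link below; the compilers, optimization level, and 1.5x threshold are assumptions, and "candidate.c" stands for an LLM-generated test program.

```python
# Hedged sketch: compare the .text size produced by two compilers on the
# same program and flag large divergences as possible missed optimizations.
import os
import subprocess
import tempfile


def text_size(compiler, source, opt="-Os"):
    """Compile to an object file and return the size of its .text section."""
    obj = os.path.join(tempfile.mkdtemp(), "out.o")
    subprocess.run([compiler, opt, "-c", source, "-o", obj], check=True)
    out = subprocess.run(["size", obj], capture_output=True, text=True, check=True)
    return int(out.stdout.splitlines()[1].split()[0])  # Berkeley format: text column


def check(source, ratio=1.5):
    gcc, clang = text_size("gcc", source), text_size("clang", source)
    if max(gcc, clang) > ratio * min(gcc, clang):
        print(f"possible missed size optimization: gcc={gcc}B clang={clang}B")


check("candidate.c")
```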
arXiv Detail & Related papers (2024-12-31T21:47:46Z)
- Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.
Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.
Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
- Evolutionary Generative Fuzzing for Differential Testing of the Kotlin Compiler [14.259471945857431]
We investigate the effectiveness of differential testing in finding bugs within the Kotlin compilers developed at JetBrains.
We propose a black-box generative approach that creates input programs for the K1 and K2 compilers.
Our case study shows that the proposed approach effectively detects bugs in K1 and K2; these bugs have been confirmed and (some) fixed by JetBrains developers.
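A hedged sketch of a black-box differential harness in the spirit of this study follows; "kotlinc-k1" and "kotlinc-k2" are placeholder commands rather than the authors' actual invocation, and the crash heuristic is deliberately simplistic.

```python
# Illustrative differential harness: compile one generated program with two
# compiler front ends and report acceptance mismatches or crashes.
import subprocess


def compile_with(cmd, source):
    """Run one compiler on one generated program; return (exit code, stderr)."""
    proc = subprocess.run([cmd, source], capture_output=True, text=True)
    return proc.returncode, proc.stderr


def differential_check(source):
    rc1, err1 = compile_with("kotlinc-k1", source)  # placeholder command
    rc2, err2 = compile_with("kotlinc-k2", source)  # placeholder command
    if (rc1 == 0) != (rc2 == 0):  # one front end accepts, the other rejects
        print(f"acceptance mismatch on {source}: K1 rc={rc1}, K2 rc={rc2}")
    for name, err in (("K1", err1), ("K2", err2)):
        if "exception" in err.lower():  # crude compiler-crash check
            print(f"{name} crashed on {source}:\n{err[:200]}")


differential_check("generated_program.kt")
```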
arXiv Detail & Related papers (2024-01-12T16:01:12Z)
- Weak Memory Demands Model-based Compiler Testing [0.0]
A compiler bug arises if the behaviour of a compiled concurrent program, as allowed by its architecture memory model, is not a behaviour permitted by the source program under its source model.
We observe that processor implementations are increasingly exploiting the behaviour of relaxed architecture models.
arXiv Detail & Related papers (2024-01-12T15:50:32Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- Compiler Testing With Relaxed Memory Models [0.0]
We present the Téléchat compiler testing tool for concurrent programs.
Téléchat compiles a concurrent C/C++ program and compares source and compiled program behaviours.
arXiv Detail & Related papers (2023-10-18T21:24:26Z)
- Dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models [53.04357141450459]
dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks.
We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
arXiv Detail & Related papers (2023-08-23T02:36:19Z)
- A Survey of Modern Compiler Fuzzing [0.0]
This survey provides a summary of the research efforts for understanding and addressing compiler defects.
It covers researchers' investigations of compiler bugs, such as their symptoms and root causes.
In addition, it covers their efforts in designing fuzzing techniques, including constructing test programs and designing test oracles.
arXiv Detail & Related papers (2023-06-12T06:03:51Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate the functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
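As a small illustration of AST-based static checking of code completions, the sketch below is a simplified stand-in for the paper's framework, covering only syntax errors and a rough undefined-name check.

```python
# Hedged sketch: detect static problems in a Python completion without
# executing it, using the standard-library ast module.
import ast
import builtins


def static_errors(completion: str):
    """Return a list of static problems found in a completion."""
    try:
        tree = ast.parse(completion)
    except SyntaxError as e:
        return [f"syntax error at line {e.lineno}: {e.msg}"]
    defined = set(dir(builtins))
    defined |= {n.id for n in ast.walk(tree)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
    for f in ast.walk(tree):
        if isinstance(f, ast.FunctionDef):
            defined.add(f.name)
            defined |= {a.arg for a in f.args.args}
    return [f"possibly undefined name: {n.id}"
            for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
            and n.id not in defined]


print(static_errors("def f(x):\n    return x + y\n"))  # flags 'y'
```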
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
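A hedged sketch of such a self-debugging loop follows; `ask_model` is a placeholder for whatever LLM API is used (it is not part of the paper), and the execution feedback here is simply the stderr of running the candidate program.

```python
# Illustrative self-debugging loop: generate, run, feed the error back,
# and ask the model for a revised program.
import subprocess
import sys
import tempfile


def ask_model(prompt: str) -> str:
    """Placeholder: call your LLM client and return a candidate program."""
    raise NotImplementedError("plug in a model client here")


def execution_feedback(code: str) -> str:
    """Run the candidate program; return its error output, empty on success."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return "" if result.returncode == 0 else result.stderr


def self_debug(task: str, max_rounds: int = 3) -> str:
    code = ask_model(f"Write a Python program that {task}")
    for _ in range(max_rounds):
        feedback = execution_feedback(code)
        if not feedback:  # the program ran cleanly; stop iterating
            break
        code = ask_model(f"The program below failed with:\n{feedback}\n"
                         f"Please fix it:\n{code}")
    return code
```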
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
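A minimal reranking sketch, where `predicted_correctness` stands in for a learned fault-aware ranker (not the paper's model): the top-ranked sample, rather than an arbitrary one, is submitted as the single pass@1 attempt.

```python
# Illustrative reranking: score sampled programs without executing them and
# keep the one the ranker deems most likely to be correct.
from typing import List


def predicted_correctness(program: str) -> float:
    """Placeholder: score a candidate program without running it."""
    raise NotImplementedError("plug in a trained ranker here")


def select_best(samples: List[str]) -> str:
    """Return the sampled program most likely to be correct."""
    return max(samples, key=predicted_correctness)
```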
arXiv Detail & Related papers (2022-06-04T22:01:05Z)