Weak Memory Demands Model-based Compiler Testing
- URL: http://arxiv.org/abs/2401.09474v1
- Date: Fri, 12 Jan 2024 15:50:32 GMT
- Title: Weak Memory Demands Model-based Compiler Testing
- Authors: Luke Geeson
- Abstract summary: A compiler bug arises if the behaviour of a compiled concurrent program, as allowed by its architecture memory model, is not a behaviour permitted by the source program under its source model.
We observe that processor implementations are increasingly exploiting the behaviour of relaxed architecture models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A compiler bug arises if the behaviour of a compiled concurrent program, as
allowed by its architecture memory model, is not a behaviour permitted by the
source program under its source model. One might reasonably think that most
compiler bugs have been found in the decade since the introduction of the C/C++
memory model. We observe that processor implementations are increasingly
exploiting the behaviour of relaxed architecture models. As such, compiled
programs may exhibit bugs not seen on older hardware. To account for this we
require model-based compiler testing.
While this observation is not surprising, its implications are broad.
Compilers and their testing tools will need to be updated to follow hardware
relaxations, concurrent test generators will need to be improved, and
assumptions of prior work will need revisiting. We explore these ideas using a
compiler toolchain bug we reported in LLVM.
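To make the failure mode concrete, here is a minimal sketch of the kind of concurrent litmus test this line of work revolves around: a classic store-buffering test with sequentially consistent atomics (an illustration of the general idea, not the specific LLVM bug reported in the paper). The source model forbids the final state r0 == 0 && r1 == 0, so observing it in the compiled binary on relaxed hardware such as Armv8 would indicate a miscompilation.

```cpp
// Store-buffering (SB) litmus test with seq_cst atomics.
// The C/C++ memory model forbids the final state r0 == 0 && r1 == 0;
// if a compiler bug weakens these accesses, relaxed hardware may
// reorder each store past the following load and expose that state.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r0 = 0, r1 = 0;

void thread0() {
    x.store(1, std::memory_order_seq_cst);
    r0 = y.load(std::memory_order_seq_cst);
}

void thread1() {
    y.store(1, std::memory_order_seq_cst);
    r1 = x.load(std::memory_order_seq_cst);
}

int main() {
    std::thread t0(thread0), t1(thread1);
    t0.join();
    t1.join();
    if (r0 == 0 && r1 == 0)
        std::puts("forbidden outcome observed: candidate compiler bug");
    return 0;
}
```

In practice, testing tools run such litmus tests many times on hardware, or enumerate the allowed behaviours with memory-model simulators, since a single execution is unlikely to expose a rare reordering.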
Related papers
- Towards Understanding the Bugs in Solidity Compiler [11.193701473232851]
This paper presents the first systematic study of 533 Solidity compiler bugs.
We examine their characteristics (including symptoms, root causes, and distribution) and their triggering test cases.
To study the limitations of Solidity compiler fuzzers, we evaluate three Solidity compiler fuzzers.
arXiv Detail & Related papers (2024-07-08T14:22:50Z) - KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs)
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - PTE: Axiomatic Semantics based Compiler Testing [7.203331838793759]
We propose an axiomatic semantics based approach for testing compilers, called PTE.
The idea is to incrementally develop a set of "axioms" capturing anecdotes of the language semantics in the form of (precondition, transformation, expectation) triples.
arXiv Detail & Related papers (2024-01-02T04:50:47Z) - Compiler Testing With Relaxed Memory Models [0.0]
We present the Téléchat compiler testing tool for concurrent programs.
Téléchat compiles a concurrent C/C++ program and compares source and compiled program behaviours; a minimal sketch of this check appears after this list.
arXiv Detail & Related papers (2023-10-18T21:24:26Z) - Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? [75.79305790453654]
Coaxing desired behaviors out of pretrained models, while avoiding undesirable ones, has redefined NLP.
We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance.
arXiv Detail & Related papers (2023-07-31T22:58:41Z) - A Survey of Modern Compiler Fuzzing [0.0]
This survey provides a summary of the research efforts to understand and address compiler defects.
It covers researchers' investigations and expertise on compiler bugs, such as their symptoms and root causes.
In addition, it covers researchers' efforts in designing fuzzing techniques, including constructing test programs and designing test oracles.
arXiv Detail & Related papers (2023-06-12T06:03:51Z) - A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z) - HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing [58.720142291102135]
This work introduces HDCC, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code.
HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend.
To substantiate these claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
arXiv Detail & Related papers (2023-04-24T19:16:03Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Configuring Test Generators using Bug Reports: A Case Study of GCC Compiler and Csmith [2.1016374925364616]
This paper uses the code snippets in bug reports to guide test generation.
We evaluate this approach on eight versions of GCC.
We find that our approach provides higher coverage and triggers more miscompilation failures than the state-of-the-art test generation techniques for GCC.
arXiv Detail & Related papers (2020-12-19T11:25:13Z)
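Following the forward reference in the Compiler Testing With Relaxed Memory Models entry above, here is a minimal sketch of the core model-based check such tools perform: flag a candidate compiler bug whenever the compiled program admits a behaviour that the source program forbids. The behaviour sets here are assumed inputs (e.g. produced by memory-model simulators); this is an illustrative sketch, not Téléchat's actual implementation.

```cpp
// Model-based compiler testing check: a candidate bug is any
// behaviour (e.g. a final register state) allowed for the compiled
// program under the architecture model but forbidden for the source
// program under the source model. The behaviour sets are assumed
// inputs; this sketch is illustrative, not Téléchat's code.
#include <iostream>
#include <set>
#include <string>

using Behaviours = std::set<std::string>;  // e.g. "r0=0; r1=0"

// Behaviours the compiled program may exhibit but the source
// program may not -- each one is a candidate compiler bug.
Behaviours findCandidateBugs(const Behaviours& source,
                             const Behaviours& compiled) {
    Behaviours bugs;
    for (const auto& b : compiled)
        if (source.count(b) == 0) bugs.insert(b);
    return bugs;
}

int main() {
    // Hypothetical behaviour sets for a store-buffering litmus test.
    Behaviours source   = {"r0=0; r1=1", "r0=1; r1=0", "r0=1; r1=1"};
    Behaviours compiled = {"r0=1; r1=1", "r0=0; r1=0"};  // extra outcome

    for (const auto& b : findCandidateBugs(source, compiled))
        std::cout << "candidate bug: compiled-only behaviour " << b << '\n';
    return 0;
}
```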
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.