Evolutionary Generative Fuzzing for Differential Testing of the Kotlin
Compiler
- URL: http://arxiv.org/abs/2401.06653v1
- Date: Fri, 12 Jan 2024 16:01:12 GMT
- Title: Evolutionary Generative Fuzzing for Differential Testing of the Kotlin
Compiler
- Authors: Calin Georgescu, Mitchell Olsthoorn, Pouria Derakhshanfar, Marat
Akhin, Annibale Panichella
- Abstract summary: We investigate the effectiveness of differential testing in finding bugs within the Kotlin compilers developed at JetBrains.
We propose a black-box generative approach that creates input programs for the K1 and K2 compilers.
Our case study shows that the proposed approach effectively detects bugs in K1 and K2; these bugs have been confirmed and (some) fixed by JetBrains developers.
- Score: 14.259471945857431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compiler correctness is a cornerstone of reliable software development.
However, systematic testing of compilers is infeasible, given the vast space of
possible programs and the complexity of modern programming languages. In this
context, differential testing offers a practical methodology as it addresses
the oracle problem by comparing the output of alternative compilers given the
same set of programs as input. In this paper, we investigate the effectiveness
of differential testing in finding bugs within the Kotlin compilers developed
at JetBrains. We propose a black-box generative approach that creates input
programs for the K1 and K2 compilers. First, we build workable models of Kotlin
semantic (semantic interface) and syntactic (enriched context-free grammar)
language features, which are subsequently exploited to generate random code
snippets. Second, we extend random sampling by introducing two genetic
algorithms (GAs) that aim to generate more diverse input programs. Our case
study shows that the proposed approach effectively detects bugs in K1 and K2;
these bugs have been confirmed and (some) fixed by JetBrains developers. While
we do not observe a significant difference w.r.t. the number of defects
uncovered by the different search algorithms, random search and GAs are
complementary as they find different categories of bugs. Finally, we provide
insights into the relationships between the size, complexity, and fault
detection capability of the generated input programs.
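The generate-and-compare loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy grammar covers only a tiny Kotlin-like subset, and the stub compiler functions stand in for invoking the real K1/K2 front ends and comparing their diagnostics.

```python
import random

# Toy enriched context-free grammar for a tiny Kotlin-like subset
# (illustrative assumption; the paper models full Kotlin syntax and semantics).
GRAMMAR = {
    "<program>": [["fun main() {", "<stmt>", "}"]],
    "<stmt>": [["val x =", "<expr>"], ["println(", "<expr>", ")"]],
    "<expr>": [["<int>"], ["<int>", "+", "<expr>"]],
    "<int>": [["0"], ["1"], ["42"]],
}

def generate(symbol="<program>", rng=random):
    """Randomly derive a token list from the grammar (the random-sampling baseline)."""
    if symbol not in GRAMMAR:  # terminal token: emit as-is
        return [symbol]
    tokens = []
    for sym in rng.choice(GRAMMAR[symbol]):
        tokens.extend(generate(sym, rng))
    return tokens

def differential_test(program, compile_k1, compile_k2):
    """Oracle: the compilers should agree on every input; a mismatch flags a potential bug."""
    return compile_k1(program) != compile_k2(program)

# Stub compiler front ends (assumption: real use runs the actual K1/K2
# compilers on the snippet and compares exit codes or diagnostics).
snippet = " ".join(generate())
agree = differential_test(snippet, lambda p: "ok", lambda p: "ok")
disagree = differential_test(snippet, lambda p: "ok", lambda p: "error")
```

The genetic algorithms in the paper extend this baseline by evolving populations of such snippets under diversity-oriented fitness, rather than sampling each program independently.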
Related papers
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
- AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection [69.79627042058048]
AdaCCD is a novel cross-lingual adaptation method that can detect cloned codes in a new language without annotations in that language.
We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages.
arXiv Detail & Related papers (2023-11-13T12:20:48Z)
- Extended Paper: API-driven Program Synthesis for Testing Static Typing Implementations [11.300829269111627]
We introduce a novel approach for testing static typing implementations based on the concept of API-driven program synthesis.
The idea is to synthesize type-intensive but small and well-typed programs by leveraging and combining application programming interfaces (APIs) derived from existing software libraries.
arXiv Detail & Related papers (2023-11-08T08:32:40Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Directed Test Program Generation for JIT Compiler Bug Localization [3.626013617212667]
Bug localization techniques for Just-in-Time (JIT) compilers are based on analyzing the execution behaviors of the target JIT compiler on a set of test programs generated for this purpose.
This paper proposes a novel technique for automatic test program generation for JIT compiler bug localization.
arXiv Detail & Related papers (2023-07-17T22:43:02Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
- Configuring Test Generators using Bug Reports: A Case Study of GCC Compiler and Csmith [2.1016374925364616]
This paper uses the code snippets in the bug reports to guide the test generation.
We evaluate this approach on eight versions of GCC.
We find that our approach provides higher coverage and triggers more miscompilation failures than the state-of-the-art test generation techniques for GCC.
arXiv Detail & Related papers (2020-12-19T11:25:13Z)
- Detecting and Understanding Real-World Differential Performance Bugs in Machine Learning Libraries [2.879036956042183]
We find inputs for which the performance varies widely, despite having the same size.
We compare the performance of not only single inputs, but of classes of inputs, where each class has similar inputs parameterized by their size.
Importantly, we also provide an explanation for why the performance differs in a form that can be readily used to fix a performance bug.
arXiv Detail & Related papers (2020-06-03T00:23:06Z)
- Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler [4.179323589439977]
We apply anomaly detection to source code and bytecode to facilitate the development of a programming language.
We define anomaly as a code fragment that is different from typical code written in a particular programming language.
arXiv Detail & Related papers (2020-04-03T15:20:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.