Evolutionary Generative Fuzzing for Differential Testing of the Kotlin
Compiler
- URL: http://arxiv.org/abs/2401.06653v1
- Date: Fri, 12 Jan 2024 16:01:12 GMT
- Title: Evolutionary Generative Fuzzing for Differential Testing of the Kotlin
Compiler
- Authors: Calin Georgescu, Mitchell Olsthoorn, Pouria Derakhshanfar, Marat
Akhin, Annibale Panichella
- Abstract summary: We investigate the effectiveness of differential testing in finding bugs within the Kotlin compilers developed at JetBrains.
We propose a black-box generative approach that creates input programs for the K1 and K2 compilers.
Our case study shows that the proposed approach effectively detects bugs in K1 and K2; these bugs have been confirmed and (some) fixed by JetBrains developers.
- Score: 14.259471945857431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compiler correctness is a cornerstone of reliable software development.
However, systematic testing of compilers is infeasible, given the vast space of
possible programs and the complexity of modern programming languages. In this
context, differential testing offers a practical methodology as it addresses
the oracle problem by comparing the output of alternative compilers given the
same set of programs as input. In this paper, we investigate the effectiveness
of differential testing in finding bugs within the Kotlin compilers developed
at JetBrains. We propose a black-box generative approach that creates input
programs for the K1 and K2 compilers. First, we build workable models of Kotlin
semantic (semantic interface) and syntactic (enriched context-free grammar)
language features, which are subsequently exploited to generate random code
snippets. Second, we extend random sampling by introducing two genetic
algorithms (GAs) that aim to generate more diverse input programs. Our case
study shows that the proposed approach effectively detects bugs in K1 and K2;
these bugs have been confirmed and (some) fixed by JetBrains developers. While
we do not observe a significant difference w.r.t. the number of defects
uncovered by the different search algorithms, random search and GAs are
complementary as they find different categories of bugs. Finally, we provide
insights into the relationships between the size, complexity, and fault
detection capability of the generated input programs.
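
To make the generation step concrete, the following is a minimal Kotlin sketch of weighted grammar-based sampling. The toy grammar, weights, and depth cutoff are illustrative stand-ins, not the paper's actual models: the real enriched context-free grammar and semantic interface cover far more of the Kotlin language.

```kotlin
import kotlin.random.Random

// Toy "enriched" CFG: each nonterminal maps to weighted productions.
// (Hypothetical simplification; the paper's grammar models real Kotlin.)
data class Production(val weight: Double, val symbols: List<String>)

val grammar: Map<String, List<Production>> = mapOf(
    "expr" to listOf(
        Production(0.4, listOf("<int>")),
        Production(0.3, listOf("(", "expr", " + ", "expr", ")")),
        Production(0.3, listOf("if (", "expr", " > ", "expr", ") ", "expr", " else ", "expr"))
    )
)

fun pickWeighted(ps: List<Production>, rng: Random): Production {
    var r = rng.nextDouble() * ps.sumOf { it.weight }
    for (p in ps) { r -= p.weight; if (r <= 0) return p }
    return ps.last()
}

fun sample(symbol: String, depth: Int, rng: Random): String {
    if (symbol == "<int>") return rng.nextInt(100).toString()
    val productions = grammar[symbol] ?: return symbol // plain terminals pass through
    // Force termination: at the depth limit, take the shortest production.
    val chosen = if (depth <= 0) productions.minBy { it.symbols.size }
                 else pickWeighted(productions, rng)
    return chosen.symbols.joinToString("") { sample(it, depth - 1, rng) }
}

fun main() {
    val rng = Random(42)
    repeat(3) { i -> println("fun f$i(): Int = " + sample("expr", 4, rng)) }
}
```

Each run emits small but syntactically valid Kotlin expressions; weighting productions is one way an enriched grammar can bias sampling toward interesting constructs.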
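The abstract does not spell out the two GAs, so the skeleton below is a hypothetical single-population variant: fitness rewards programs that are far from their peers (a crude character-level distance), and mutation simply re-samples from the grammar. It reuses `sample` from the sketch above.

```kotlin
import kotlin.random.Random

// Crude distance: number of differing positions, padded to the longer string.
fun distance(a: String, b: String): Double {
    val shared = a.zip(b).count { (x, y) -> x == y }
    return (maxOf(a.length, b.length) - shared).toDouble()
}

// Diversity fitness: mean distance from a candidate to the rest of the population.
fun diversity(candidate: String, population: List<String>): Double =
    population.filter { it != candidate }
        .map { distance(candidate, it) }
        .ifEmpty { listOf(0.0) }
        .average()

// One generation: keep the most diverse half, refill with fresh or copied samples.
fun evolve(population: List<String>, rng: Random, generate: () -> String): List<String> {
    val survivors = population
        .sortedByDescending { diversity(it, population) }
        .take(population.size / 2)
    val offspring = List(population.size - survivors.size) {
        if (rng.nextDouble() < 0.7) generate() else survivors.random(rng)
    }
    return survivors + offspring
}
```

A driver would seed the population with `sample(...)` calls, iterate `evolve` for a fixed budget, and send every generated program to the differential harness sketched below.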
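The differential oracle itself can be sketched as a pair of compiler invocations plus a comparison. The `kotlinc` command and the use of `-language-version` 1.9 vs. 2.0 to select the K1 vs. K2 front ends are assumptions tied to a particular toolchain release, not the paper's actual harness.

```kotlin
import java.io.File

data class CompileResult(val exitCode: Int, val diagnostics: String)

// Invoke kotlinc (assumed to be on PATH) with a given language version.
fun compile(source: File, languageVersion: String): CompileResult {
    val proc = ProcessBuilder(
        "kotlinc", "-language-version", languageVersion,
        "-d", "out-$languageVersion", source.path
    ).redirectErrorStream(true).start()
    val output = proc.inputStream.bufferedReader().readText()
    return CompileResult(proc.waitFor(), output)
}

fun main() {
    val snippet = File.createTempFile("fuzz", ".kt").apply {
        writeText("fun main() { println(1 + 2) }") // would come from the generator
    }
    val k1 = compile(snippet, "1.9") // assumed to select the K1 front end
    val k2 = compile(snippet, "2.0") // assumed to select the K2 front end
    // Differential oracle: disagreement on acceptance is a candidate bug.
    if (k1.exitCode != k2.exitCode) {
        println("Potential bug: K1 exit=${k1.exitCode}, K2 exit=${k2.exitCode}")
        println("K1 diagnostics:\n${k1.diagnostics}\nK2 diagnostics:\n${k2.diagnostics}")
    } else {
        println("Compilers agree (exit=${k1.exitCode})")
    }
}
```

Richer oracles (comparing diagnostic sets, crash signatures, or generated bytecode) follow the same pattern.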
Related papers
- EquiBench: Benchmarking Code Reasoning Capabilities of Large Language Models via Equivalence Checking [54.354203142828084]
We present the task of equivalence checking as a new way to evaluate the code reasoning abilities of large language models.
We introduce EquiBench, a dataset of 2400 program pairs spanning four programming languages and six equivalence categories.
Our evaluation of 17 state-of-the-art LLMs shows that OpenAI o3-mini achieves the highest overall accuracy of 78.0%.
arXiv Detail & Related papers (2025-02-18T02:54:25Z)
- Finding Missed Code Size Optimizations in Compilers using LLMs [1.90019787465083]
We develop a novel testing approach which combines large language models with a series of differential testing strategies.
Our approach requires fewer than 150 lines of code to implement.
To date we have reported 24 confirmed bugs in production compilers.
arXiv Detail & Related papers (2024-12-31T21:47:46Z)
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
- Galápagos: Automated N-Version Programming with LLMs [10.573037638807024]
We propose the automated generation of program variants using large language models.
We design, develop and evaluate Galápagos: a tool for generating program variants.
We evaluate Galápagos by creating N-Version components of real-world C code.
arXiv Detail & Related papers (2024-08-18T16:44:01Z)
- AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection [69.79627042058048]
AdaCCD is a novel cross-lingual adaptation method that can detect cloned codes in a new language without annotations in that language.
We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages.
arXiv Detail & Related papers (2023-11-13T12:20:48Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose RAP-Gen, a novel Retrieval-Augmented Patch Generation framework that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- Directed Test Program Generation for JIT Compiler Bug Localization [3.626013617212667]
Bug localization techniques for Just-in-Time (JIT) compilers are based on analyzing the execution behaviors of the target JIT compiler on a set of test programs generated for this purpose.
This paper proposes a novel technique for automatic test program generation for JIT compiler bug localization.
arXiv Detail & Related papers (2023-07-17T22:43:02Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Configuring Test Generators using Bug Reports: A Case Study of GCC Compiler and Csmith [2.1016374925364616]
This paper uses code snippets from bug reports to guide test generation.
We evaluate this approach on eight versions of GCC.
We find that our approach provides higher coverage and triggers more miscompilation failures than the state-of-the-art test generation techniques for GCC.
arXiv Detail & Related papers (2020-12-19T11:25:13Z)
- Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler [4.179323589439977]
We apply anomaly detection to source code and bytecode to facilitate the development of a programming language.
We define an anomaly as a code fragment that differs from typical code written in a particular programming language.
arXiv Detail & Related papers (2020-04-03T15:20:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences of its use.