LLMorpheus: Mutation Testing using Large Language Models
- URL: http://arxiv.org/abs/2404.09952v1
- Date: Mon, 15 Apr 2024 17:25:14 GMT
- Title: LLMorpheus: Mutation Testing using Large Language Models
- Authors: Frank Tip, Jonathan Bell, Max Schäfer,
- Abstract summary: This paper presents a technique where a Large Language Model (LLM) is prompted to suggest mutations by asking it what placeholders that have been inserted in source code could be replaced with.
We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool.
- Score: 7.312170216336085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program's tests detect them. Most existing approaches for mutation testing involve the application of a fixed set of mutation operators, e.g., replacing a "+" with a "-" or removing a function's body. However, certain types of real-world bugs cannot easily be simulated by such approaches, limiting their effectiveness. This paper presents a technique where a Large Language Model (LLM) is prompted to suggest mutations by asking it what placeholders that have been inserted in source code could be replaced with. The technique is implemented in LLMorpheus, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy, and using several LLMs. We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by LLMorpheus, demonstrating its practicality.
Related papers
- Fine-Tuning LLMs for Code Mutation: A New Era of Cyber Threats [0.9208007322096533]
This paper explores the application of Large Language Models in the context of code mutation.
Traditionally, code mutation has been employed to increase software robustness in mission-critical applications.
We propose a novel definition of code mutation training tailored for pre-trained LLM-based code synthesizers.
arXiv Detail & Related papers (2024-10-29T17:43:06Z) - An Exploratory Study on Using Large Language Models for Mutation Testing [32.91472707292504]
Large Language Models (LLMs) have shown great potential in code-related tasks but their utility in mutation testing remains unexplored.
This paper investigates the performance of LLMs in generating effective mutations to their usability, fault detection potential, and relationship with real bugs.
We find that compared to existing approaches, LLMs generate more diverse mutations that are behaviorally closer to real bugs.
arXiv Detail & Related papers (2024-06-14T08:49:41Z) - MMT: Mutation Testing of Java Bytecode with Model Transformation -- An Illustrative Demonstration [0.11470070927586014]
mutation testing is an approach to check the robustness of test suites.
We propose a model-driven approach where mutations of Java bytecode can be flexibly defined by model transformation.
The corresponding tool called MMT has been extended with advanced mutation operators for modifying object-oriented structures.
arXiv Detail & Related papers (2024-04-22T11:33:21Z) - An Empirical Evaluation of Manually Created Equivalent Mutants [54.02049952279685]
Less than 10 % of manually created mutants are equivalent.
Surprisingly, our findings indicate that a significant portion of developers struggle to accurately identify equivalent mutants.
arXiv Detail & Related papers (2024-04-14T13:04:10Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs)
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - Contextual Predictive Mutation Testing [17.832774161583036]
We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and test method.
Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking/verifying live mutants.
We validate our input representation, and aggregation approaches for lifting predictions from the test matrix level to the test suite level, finding similar improvements in performance.
arXiv Detail & Related papers (2023-09-05T17:00:15Z) - FacTool: Factuality Detection in Generative AI -- A Tool Augmented
Framework for Multi-Task and Multi-Domain Scenarios [87.12753459582116]
A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models.
We propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models.
arXiv Detail & Related papers (2023-07-25T14:20:51Z) - A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
static analysis tools such as linters, which can detect errors without running the program, haven't been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self- Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - DeepMutants: Training neural bug detectors with contextual mutations [0.799536002595393]
Learning-based bug detectors promise to find bugs in large code bases by exploiting natural hints.
Still, existing techniques tend to underperform when presented with realistic bugs.
We propose a novel contextual mutation operator which dynamically injects natural and more realistic faults into code.
arXiv Detail & Related papers (2021-07-14T12:45:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.