PRIMG: Efficient LLM-driven Test Generation Using Mutant Prioritization
- URL: http://arxiv.org/abs/2505.05584v1
- Date: Thu, 08 May 2025 18:30:22 GMT
- Title: PRIMG: Efficient LLM-driven Test Generation Using Mutant Prioritization
- Authors: Mohamed Salah Bouafif, Mohammad Hamdaqa, Edward Zulkoski
- Abstract summary: PRIMG (Prioritization and Refinement Integrated Mutation-driven Generation) is a novel framework for incremental and adaptive test case generation for Solidity smart contracts. PRIMG integrates a mutation prioritization module, which employs a machine learning model trained on mutant subsumption graphs to predict the usefulness of surviving mutants. The prioritization module consistently outperformed random mutant selection, enabling the generation of high-impact tests with reduced computational effort.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mutation testing is a widely recognized technique for assessing and enhancing the effectiveness of software test suites by introducing deliberate code mutations. However, its application often results in overly large test suites, as developers generate numerous tests to kill specific mutants, increasing computational overhead. This paper introduces PRIMG (Prioritization and Refinement Integrated Mutation-driven Generation), a novel framework for incremental and adaptive test case generation for Solidity smart contracts. PRIMG integrates two core components: a mutation prioritization module, which employs a machine learning model trained on mutant subsumption graphs to predict the usefulness of surviving mutants, and a test case generation module, which utilizes Large Language Models (LLMs) to generate and iteratively refine test cases to achieve syntactic and behavioral correctness. We evaluated PRIMG on real-world Solidity projects from Code4Arena to assess its effectiveness in improving mutation scores and generating high-quality test cases. The experimental results demonstrate that PRIMG significantly reduces test suite size while maintaining high mutation coverage. The prioritization module consistently outperformed random mutant selection, enabling the generation of high-impact tests with reduced computational effort. Furthermore, the refining process enhanced the correctness and utility of LLM-generated tests, addressing their inherent limitations in handling edge cases and complex program logic.
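The abstract describes a two-module pipeline; the paper itself is the authority on the details, but a minimal Python sketch of that loop might look as follows. All identifiers here (`usefulness_model`, `run_suite`, the prompt strings) are hypothetical placeholders, not PRIMG's actual API.

```python
# A minimal sketch of a PRIMG-style loop. All names (usefulness_model,
# run_suite, the prompt strings) are hypothetical placeholders.

def primg_loop(surviving_mutants, usefulness_model, llm, run_suite,
               budget=10, max_refinements=3):
    """Rank surviving mutants by predicted usefulness, then ask an LLM to
    write and iteratively refine a test that kills each top mutant."""
    # Prioritization: a model trained on mutant subsumption graphs scores
    # each surviving mutant; only the most useful ones get LLM effort.
    ranked = sorted(surviving_mutants,
                    key=lambda m: usefulness_model.predict(m.features),
                    reverse=True)[:budget]

    new_tests = []
    for mutant in ranked:
        test = llm.generate(f"Write a Solidity test that kills:\n{mutant.diff}")
        # Refinement: re-prompt with execution feedback until the test
        # compiles and actually kills the mutant.
        for _ in range(max_refinements):
            result = run_suite(test, mutant)
            if result.compiles and result.kills_mutant:
                new_tests.append(test)
                break
            test = llm.generate(f"Fix this test.\nFeedback: {result.feedback}\n{test}")
    return new_tests
```

The key design point is that the learned ranking decides which surviving mutants are worth the expensive LLM calls, while the refinement loop converts near-miss tests into killing ones.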
Related papers
- LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation [56.84049855266145]
We propose a Multi-feedback Smart Contract Fuzzing framework (LLAMA) that integrates evolutionary mutation strategies and hybrid testing techniques. LLAMA achieves 91% instruction coverage and 90% branch coverage, while detecting 132 out of 148 known vulnerabilities. These results highlight LLAMA's effectiveness, adaptability, and practicality in real-world smart contract security testing scenarios.
arXiv Detail & Related papers (2025-07-16T09:46:58Z)
- EvoGPT: Enhancing Test Suite Robustness via LLM-Based Generation and Genetic Optimization [11.050047263054985]
Large Language Models (LLMs) have recently emerged as promising tools for automated unit test generation. We introduce a hybrid framework called EvoGPT that integrates LLM-based test generation with evolutionary search techniques to create diverse, fault-revealing unit tests.
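As a rough illustration of what "LLM-based generation plus evolutionary search" can mean in practice, here is a generic genetic loop seeded with LLM-written tests; the `fitness`, `mutate`, and `crossover` operators are assumptions, not EvoGPT's actual operators.

```python
import random

def evolve_tests(llm_seed_tests, fitness, mutate, crossover,
                 generations=20, population_size=30):
    """Generic genetic search over LLM-generated unit tests: keep the
    fittest half, breed replacements, repeat. A sketch of the idea only."""
    population = list(llm_seed_tests)[:population_size]
    for _ in range(generations):
        survivors = sorted(population, key=fitness,
                           reverse=True)[:population_size // 2]
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)
```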
arXiv Detail & Related papers (2025-05-18T13:48:53Z)
- Mutation Testing via Iterative Large Language Model-Driven Scientific Debugging [10.334617290353192]
We evaluate whether Scientific Debugging can help Large Language Models (LLMs) generate tests for mutants. LLMs consistently outperform Pynguin in generating tests with better fault detection and coverage. Importantly, we observe that iterative refinement of test cases is key to achieving high-quality test suites.
arXiv Detail & Related papers (2025-03-11T08:47:13Z)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models. We propose self-certainty, a novel and efficient metric to estimate response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way to improve LLM reasoning capabilities.
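The summary does not give the self-certainty formula, so the sketch below substitutes a common confidence proxy, the mean token log-probability of each sampled response, to show the shape of reward-model-free best-of-N selection.

```python
def mean_logprob(token_logprobs):
    # Confidence proxy from the model's own token log-probabilities
    # (a stand-in for the paper's self-certainty metric, whose exact
    # definition is not given in this summary).
    return sum(token_logprobs) / len(token_logprobs)

def best_of_n(candidates):
    """Pick the response the model is most confident in, with no external
    reward model. `candidates` is a list of (text, token_logprobs) pairs."""
    return max(candidates, key=lambda c: mean_logprob(c[1]))[0]
```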
arXiv Detail & Related papers (2025-02-25T19:08:07Z)
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
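A minimal sketch of one round in the spirit of a solver-verifier self-play loop, assuming a model object with hypothetical `generate_code`, `generate_tests`, and `finetune` methods: code and tests verify each other, and only mutually consistent pairs become new training data.

```python
def self_play_round(model, problems, run_tests):
    """One self-play round: the model writes both code and tests, and only
    mutually consistent pairs are kept. All model methods are hypothetical."""
    new_data = []
    for problem in problems:
        code = model.generate_code(problem)
        tests = model.generate_tests(problem)
        if run_tests(code, tests).all_passed:  # code and tests agree
            new_data.append((problem, code, tests))
    model.finetune(new_data)
    return model
```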
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
- Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy [0.5057850174013127]
Modern Large Language Model (LLM)-based programming agents often rely on test execution feedback to refine their generated code. This paper introduces VALTEST, a novel framework that leverages semantic entropy to automatically validate test cases generated by LLMs. Experiments show that VALTEST boosts test validity by up to 29% and improves code generation performance, as evidenced by significant increases in pass@1 scores.
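Semantic entropy is typically computed by sampling several outputs, grouping semantic equivalents, and taking the entropy over group frequencies; the sketch below follows that recipe, with the `canonicalize` function left as an assumed, task-specific normalizer rather than VALTEST's actual implementation.

```python
import math
from collections import Counter

def semantic_entropy(samples, canonicalize):
    """Entropy over semantic clusters of sampled LLM outputs. The
    `canonicalize` function (assumed, task-specific) maps semantically
    equivalent outputs, e.g. tests asserting the same expected value,
    to a single key."""
    clusters = Counter(canonicalize(s) for s in samples)
    total = sum(clusters.values())
    return -sum((n / total) * math.log(n / total)
                for n in clusters.values())
```

Under this scheme, a generated test whose asserted behavior varies wildly across samples (high entropy) would be flagged as likely invalid.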
arXiv Detail & Related papers (2024-11-13T00:07:32Z)
- TestART: Improving LLM-based Unit Testing via Co-evolution of Automated Generation and Repair Iteration [7.509927117191286]
Large language models (LLMs) have demonstrated remarkable capabilities in generating unit test cases. We propose TestART, a novel unit test generation method that improves LLM-based unit testing via co-evolution of automated generation and repair iteration.
arXiv Detail & Related papers (2024-08-06T10:52:41Z)
- Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks, instruct an LLM to self-evaluate its answers, and benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)
- Precise Error Rates for Computationally Efficient Testing [67.30044609837749]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z)
- Contextual Predictive Mutation Testing [17.832774161583036]
We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and test method.
Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking/verifying live mutants.
We validate our input representation and our aggregation approaches for lifting predictions from the test matrix level to the test suite level, finding similar improvements in performance.
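To make the predictive setup concrete: instead of executing every (mutant, test) pair, a learned model scores each pair, and only predicted kills need verification. The `model.score` call and the mutant/test attributes below are hypothetical stand-ins for MutationBERT's joint encoding, not its real interface.

```python
def predict_kill_matrix(model, mutants, tests, threshold=0.5):
    """Predictive mutation testing sketch: score (mutated method, test
    method) pairs with a learned model instead of running them."""
    matrix = {(m.id, t.id): model.score(m.diff, t.body) >= threshold
              for m in mutants for t in tests}
    # Lift from test-matrix level to test-suite level: a mutant is
    # predicted live only if no test is predicted to kill it.
    live = [m for m in mutants
            if not any(matrix[(m.id, t.id)] for t in tests)]
    return matrix, live
```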
arXiv Detail & Related papers (2023-09-05T17:00:15Z)
- Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing [13.743062498008555]
We introduce MuTAP for improving the effectiveness of test cases generated by Large Language Models (LLMs) in terms of revealing bugs.
MuTAP is capable of generating effective test cases even in the absence of natural-language descriptions of the programs under test (PUTs).
Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets.
arXiv Detail & Related papers (2023-08-31T08:48:31Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting.
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
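To ground Dorfman's classic observation in numbers, the noiseless two-stage calculation is simple enough to verify directly; the paper's contribution is extending this idea to noisy measurements with Bayesian sequential design.

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per person under classic noiseless two-stage Dorfman
    pooling: one pooled test per group of k, plus k individual retests
    whenever the pool is positive."""
    return 1 / k + 1 - (1 - p) ** k

# At 1% prevalence, groups of ~11 need about 0.196 tests per person,
# roughly a 5x saving over testing everyone individually.
best = min((dorfman_tests_per_person(0.01, k), k) for k in range(2, 50))
print(best)  # -> (0.1955..., 11)
```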