EvoGPT: Enhancing Test Suite Robustness via LLM-Based Generation and Genetic Optimization
- URL: http://arxiv.org/abs/2505.12424v1
- Date: Sun, 18 May 2025 13:48:53 GMT
- Title: EvoGPT: Enhancing Test Suite Robustness via LLM-Based Generation and Genetic Optimization
- Authors: Lior Broide, Roni Stern
- Abstract summary: Large Language Models (LLMs) have recently emerged as promising tools for automated unit test generation. We introduce a hybrid framework called EvoGPT that integrates LLM-based test generation with evolutionary search techniques to create diverse, fault-revealing unit tests.
- Score: 11.050047263054985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have recently emerged as promising tools for automated unit test generation. We introduce a hybrid framework called EvoGPT that integrates LLM-based test generation with evolutionary search techniques to create diverse, fault-revealing unit tests. Unit tests are initially generated with diverse temperature sampling to maximize behavioral and test suite diversity, followed by a generation-repair loop and coverage-guided assertion enhancement. The resulting test suites are evolved using genetic algorithms, guided by a fitness function that prioritizes mutation score over traditional coverage metrics. This design emphasizes the primary objective of unit testing: fault detection. Evaluated on multiple open-source Java projects, EvoGPT achieves an average improvement of 10% in both code coverage and mutation score compared to LLM-based and traditional search-based software testing baselines. These results demonstrate that combining LLM-driven diversity, targeted repair, and evolutionary optimization produces more effective and resilient test suites.
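As a concrete illustration of the kind of mutation-score-first fitness function the abstract describes, here is a minimal sketch; the exact formula, names, and the `coverage_weight` tie-breaker are assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class SuiteMetrics:
    killed_mutants: int   # mutants killed by the test suite
    total_mutants: int    # mutants generated for the class under test
    covered_lines: int
    total_lines: int

def fitness(m: SuiteMetrics, coverage_weight: float = 0.1) -> float:
    """Mutation score dominates; coverage acts only as a tie-breaker (assumed weighting)."""
    mutation_score = m.killed_mutants / max(m.total_mutants, 1)
    line_coverage = m.covered_lines / max(m.total_lines, 1)
    return mutation_score + coverage_weight * line_coverage

# A suite that kills more mutants outranks one with higher coverage:
a = SuiteMetrics(killed_mutants=40, total_mutants=50, covered_lines=70, total_lines=100)
b = SuiteMetrics(killed_mutants=30, total_mutants=50, covered_lines=95, total_lines=100)
assert fitness(a) > fitness(b)
```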
Related papers
- LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation [56.84049855266145]
We propose a Multi-feedback Smart Contract Fuzzing framework (LLAMA) that integrates evolutionary mutation strategies and hybrid testing techniques. LLAMA achieves 91% instruction coverage and 90% branch coverage, while detecting 132 out of 148 known vulnerabilities. These results highlight LLAMA's effectiveness, adaptability, and practicality in real-world smart contract security testing scenarios.
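For readers unfamiliar with the ingredients named above, the following is a generic coverage-feedback fuzzing loop with evolutionary seed mutation; it is a hedged sketch, not LLAMA's implementation, and the `run_target` interface (returning a set of covered branch ids) is an assumption.

```python
import random

def mutate(seed: bytes) -> bytes:
    """Flip one random bit of the seed (a deliberately simple mutation operator)."""
    data = bytearray(seed)
    if data:
        data[random.randrange(len(data))] ^= 1 << random.randrange(8)
    return bytes(data)

def fuzz(seeds, run_target, rounds=1000):
    """Evolve a corpus, keeping inputs that reach new branches.

    run_target(data) -> set of covered branch ids (assumed interface).
    """
    covered, corpus = set(), list(seeds)
    for _ in range(rounds):
        candidate = mutate(random.choice(corpus))
        new_cov = run_target(candidate) - covered
        if new_cov:  # coverage feedback: keep inputs that explore new behavior
            covered |= new_cov
            corpus.append(candidate)
    return corpus, covered
```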
arXiv Detail & Related papers (2025-07-16T09:46:58Z)
- Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders [50.52694757593443]
Existing SAE training algorithms often lack rigorous mathematical guarantees and suffer from practical limitations. We first propose a novel statistical framework for the feature recovery problem, which includes a new notion of feature identifiability. We introduce a new SAE training algorithm based on "bias adaptation", a technique that adaptively adjusts neural network bias parameters to ensure appropriate activation sparsity.
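A minimal sketch of what one bias-adaptation step could look like, assuming ReLU units and a per-unit target firing rate; the update rule and names are illustrative guesses, not the paper's algorithm.

```python
import numpy as np

def adapt_biases(pre_acts: np.ndarray, bias: np.ndarray,
                 target_rate: float = 0.05, lr: float = 1e-3) -> np.ndarray:
    """One assumed adaptation step: steer each unit's firing rate toward a target."""
    acts = np.maximum(pre_acts + bias, 0.0)  # ReLU activations, shape (batch, units)
    firing_rate = (acts > 0).mean(axis=0)    # fraction of inputs activating each unit
    # Units that fire too often get their bias lowered, and vice versa,
    # pushing the layer toward the desired activation sparsity.
    return bias - lr * (firing_rate - target_rate)
```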
arXiv Detail & Related papers (2025-06-16T20:58:05Z)
- Hallucination to Consensus: Multi-Agent LLMs for End-to-End Test Generation with Accurate Oracles [2.908307411619051]
Unit testing plays a critical role in ensuring software correctness. Traditional methods rely on search-based or randomized algorithms to generate tests. We propose CANDOR, a novel end-to-end, prompt-based framework for automated JUnit test generation.
arXiv Detail & Related papers (2025-06-03T14:43:05Z)
- PRIMG: Efficient LLM-driven Test Generation Using Mutant Prioritization [0.0]
PRIMG (Prioritization and Refinement Integrated Mutation-driven Generation) is a novel framework for incremental and adaptive test case generation for Solidity smart contracts. PRIMG integrates a mutation prioritization module, which employs a machine learning model trained on mutant subsumption graphs to predict the usefulness of surviving mutants. The prioritization module consistently outperformed random mutant selection, enabling the generation of high-impact tests with reduced computational effort.
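A hedged sketch of mutant prioritization in this spirit, where a hand-written heuristic over the subsumption graph stands in for PRIMG's trained machine-learning model; all names and weights are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Mutant:
    mutant_id: str
    subsumes: set = field(default_factory=set)  # ids of mutants this one subsumes

def usefulness(m: Mutant) -> float:
    # Heuristic stand-in for the trained model: a dominator node in the
    # subsumption graph represents many other mutants, so killing it is
    # predicted to be more useful.
    return 1.0 + len(m.subsumes)

def prioritize(survivors: list[Mutant]) -> list[Mutant]:
    """Rank surviving mutants so test generation targets high-impact ones first."""
    return sorted(survivors, key=usefulness, reverse=True)
```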
arXiv Detail & Related papers (2025-05-08T18:30:22Z)
- Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution [61.80716438091887]
GenDiE (Generate, Discriminate, Evolve) is a novel self-evolving framework that enhances context faithfulness through fine-grained sentence-level optimization. By treating each sentence in a response as an independent optimization unit, GenDiE effectively addresses the limitations of previous approaches. Experiments on ASQA (in-domain LFQA) and ConFiQA datasets demonstrate that GenDiE surpasses various baselines in both faithfulness and correctness.
arXiv Detail & Related papers (2025-03-03T16:08:33Z)
- Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
- Scoring Verifiers: Evaluating Synthetic Verification for Code and Reasoning [59.25951947621526]
We propose an approach that can transform existing coding benchmarks into scoring and ranking datasets to evaluate the effectiveness of synthetic verifiers. We release four new benchmarks (HE-R, HE-R+, MBPP-R, and MBPP-R+) and analyze synthetic verification methods with standard, reasoning-based, and reward-based LLMs. Our experiments show that reasoning can significantly improve test case generation and that scaling the number of test cases enhances verification accuracy.
arXiv Detail & Related papers (2025-02-19T15:32:11Z)
- Adaptive Testing for LLM-Based Applications: A Diversity-based Approach [15.33985438101206]
We show that diversity-based testing techniques, such as Adaptive Random Testing (ART), can be effectively applied to the testing of prompt templates. Our results, obtained using various implementations that explore several string-based distances, confirm that our approach enables the discovery of failures with reduced testing budgets.
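A minimal sketch of the ART selection step applied to textual inputs, assuming Levenshtein distance as one of the string-based distances explored; the names are illustrative.

```python
import random

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def pick_next(executed: list[str], pool: list[str]) -> str:
    """ART step: choose the candidate farthest (by min distance) from all executed tests."""
    if not executed:
        return random.choice(pool)
    return max(pool, key=lambda c: min(levenshtein(c, e) for e in executed))
```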
arXiv Detail & Related papers (2025-01-23T08:53:12Z)
- UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models [88.16197692794707]
UniGen is a comprehensive framework designed to produce diverse, accurate, and highly controllable datasets.
To augment data diversity, UniGen incorporates an attribute-guided generation module and a group checking feature.
Extensive experiments demonstrate the superior quality of data generated by UniGen.
arXiv Detail & Related papers (2024-06-27T07:56:44Z)
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that, when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boost performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
- Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM [32.44432906540792]
We present SymPrompt, a code-aware prompting strategy for large language models in test generation.
SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2.
Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
arXiv Detail & Related papers (2024-01-31T18:21:49Z)
- Machine Learning Testing in an ADAS Case Study Using Simulation-Integrated Bio-Inspired Search-Based Testing [7.5828169434922]
Deeper generates failure-revealing test scenarios for testing a deep neural network-based lane-keeping system.
In the newly proposed version, we utilize a new set of bio-inspired search algorithms: a genetic algorithm (GA), $(\mu+\lambda)$ and $(\mu,\lambda)$ evolution strategies (ES), and particle swarm optimization (PSO).
Our evaluation shows the newly proposed test generators in Deeper represent a considerable improvement on the previous version.
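To unpack the notation, here is a one-function sketch contrasting the $(\mu+\lambda)$ and $(\mu,\lambda)$ selection schemes on a toy minimization problem; Deeper itself evolves test scenarios, which is not reproduced here.

```python
import random

def es_step(parents, offspring_per_gen, fitness, plus=True, sigma=0.1):
    # Each offspring is a Gaussian perturbation of a randomly chosen parent.
    offspring = [p + random.gauss(0.0, sigma)
                 for p in random.choices(parents, k=offspring_per_gen)]
    # (mu+lambda): parents survive into the selection pool; (mu,lambda): they don't.
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=fitness)[:len(parents)]

parents = [random.uniform(-5.0, 5.0) for _ in range(4)]          # mu = 4
for _ in range(50):
    parents = es_step(parents, 20, fitness=lambda x: x * x)      # lambda = 20
print(min(parents, key=abs))  # best individual approaches 0 under either scheme
```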
arXiv Detail & Related papers (2022-03-22T20:27:40Z)