Ever-Improving Test Suite by Leveraging Large Language Models
- URL: http://arxiv.org/abs/2506.11000v1
- Date: Tue, 15 Apr 2025 13:38:25 GMT
- Title: Ever-Improving Test Suite by Leveraging Large Language Models
- Authors: Ketai Qiu,
- Abstract summary: Augmenting test suites with test cases that reflect the actual usage of the software system is extremely important to sustain the quality of long-lasting software systems. E-Test is an approach that incrementally augments a test suite with test cases that exercise behaviors that emerge in production and that have not been tested yet.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Augmenting test suites with test cases that reflect the actual usage of the software system is extremely important to sustain the quality of long-lasting software systems. In this paper, we propose E-Test, an approach that incrementally augments a test suite with test cases that exercise behaviors that emerge in production and have not been tested yet. E-Test leverages Large Language Models to identify already-tested, not-yet-tested, and error-prone unit execution scenarios, and to augment the test suite accordingly. Our experimental evaluation shows that E-Test outperforms the main state-of-the-art approaches at identifying inadequately tested behaviors and optimizing test suites.
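To make the idea concrete, here is a minimal, hypothetical sketch of the kind of LLM-assisted triage the abstract describes: a production execution scenario is classified as already-tested, not-yet-tested, or error-prone against the current suite, and a new test is requested for scenarios that are not yet covered. The `call_llm` helper, the prompts, and the `Scenario` fields are illustrative assumptions, not E-Test's actual interface.

```python
# Hypothetical sketch of LLM-based test-suite augmentation in the spirit of E-Test.
# `call_llm` stands in for any chat-completion client; prompts and labels are illustrative.

from dataclasses import dataclass

LABELS = ("already-tested", "not-yet-tested", "error-prone")

@dataclass
class Scenario:
    unit: str              # fully qualified name of the unit under test
    inputs: str            # serialized production inputs observed for this unit
    observed_output: str   # output observed in production

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError

def classify(scenario: Scenario, existing_tests: str) -> str:
    """Ask the LLM whether the scenario is already covered by the suite."""
    prompt = (
        "Given the unit tests below and a production execution scenario, "
        f"answer with one of {LABELS}.\n\n"
        f"Tests:\n{existing_tests}\n\n"
        f"Scenario: unit={scenario.unit}, inputs={scenario.inputs}, "
        f"output={scenario.observed_output}\n"
    )
    answer = call_llm(prompt).strip().lower()
    return answer if answer in LABELS else "not-yet-tested"  # conservative default

def augment(scenarios: list[Scenario], existing_tests: str) -> list[str]:
    """Return candidate test cases for scenarios that are not covered yet."""
    new_tests = []
    for s in scenarios:
        if classify(s, existing_tests) != "already-tested":
            new_tests.append(call_llm(
                f"Write a unit test for {s.unit} that exercises inputs "
                f"{s.inputs} and asserts the expected behavior."
            ))
    return new_tests
```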
Related papers
- TestAgent: An Adaptive and Intelligent Expert for Human Assessment [62.060118490577366]
We propose TestAgent, a large language model (LLM)-powered agent designed to enhance adaptive testing through interactive engagement. TestAgent supports personalized question selection, captures test-takers' responses and anomalies, and provides precise outcomes through dynamic, conversational interactions.
arXiv Detail & Related papers (2025-06-03T16:07:54Z)
- Automatic High-Level Test Case Generation using Large Language Models [1.8136446064778242]
The primary challenge is not writing test scripts but aligning testing efforts with business requirements. We constructed a use-case dataset to train/fine-tune models for generating high-level test cases. Our proactive approach strengthens requirement-testing alignment and facilitates early test case generation.
arXiv Detail & Related papers (2025-03-23T09:14:41Z)
- Adaptive Testing for LLM-Based Applications: A Diversity-based Approach [15.33985438101206]
We show that diversity-based testing techniques, such as Adaptive Random Testing (ART), can be effectively applied to the testing of prompt templates. Our results, obtained using various implementations that explore several string-based distances, confirm that our approach enables the discovery of failures with reduced testing budgets (a generic ART sketch appears after this list).
arXiv Detail & Related papers (2025-01-23T08:53:12Z)
- Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models [49.06068319380296]
We introduce context-aware testing (CAT), which uses context as an inductive bias to guide the search for meaningful model failures.
We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures.
arXiv Detail & Related papers (2024-10-31T15:06:16Z)
- A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites [1.4563527353943984]
Large Language Models (LLMs) have been applied to various aspects of software development.
We present AgoneTest: an automated system for generating test suites for Java projects.
arXiv Detail & Related papers (2024-08-14T23:02:16Z)
- Automatic benchmarking of large multimodal models via iterative experiment programming [71.78089106671581]
We present APEx, the first framework for automatic benchmarking of LMMs.
Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand and to compile the findings into a report.
The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions.
arXiv Detail & Related papers (2024-06-18T06:43:46Z)
- Towards Automatic Generation of Amplified Regression Test Oracles [44.45138073080198]
We propose a test oracle derivation approach to amplify regression test oracles.
The approach monitors the object state during test execution and compares it to the previous version to detect any changes in relation to the SUT's intended behaviour (a minimal state-snapshot sketch appears after this list).
arXiv Detail & Related papers (2023-07-28T12:38:44Z)
- Validation of massively-parallel adaptive testing using dynamic control matching [0.0]
Modern businesses often run many A/B/n tests in parallel and package many content variations into the same messages.
This paper presents a method for disentangling the causal effects of the various tests under conditions of continuous test adaptation.
arXiv Detail & Related papers (2023-05-02T11:28:12Z)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests, and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)
- Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design [63.48989885374238]
Dorfman showed 80 years ago that, when the infection prevalence of a disease is low, testing pooled groups of people can be more efficient than testing each person individually.
Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting (a worked example of the classic noiseless scheme appears after this list).
arXiv Detail & Related papers (2020-04-26T23:41:33Z)
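As referenced in the Adaptive Random Testing entry above, here is a generic, self-contained sketch of diversity-based test selection over strings. The Levenshtein distance and the fixed-size candidate pool are illustrative choices and not that paper's exact configuration.

```python
# Generic Adaptive Random Testing (ART) sketch using a string distance:
# each round, pick the candidate farthest (by minimum distance) from all
# inputs selected so far.

import random

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def art_select(pool: list[str], budget: int, candidates_per_round: int = 10) -> list[str]:
    """Select `budget` test inputs from `pool`, maximizing diversity."""
    selected = [random.choice(pool)]
    while len(selected) < budget:
        candidates = random.sample(pool, min(candidates_per_round, len(pool)))
        best = max(candidates, key=lambda c: min(levenshtein(c, s) for s in selected))
        selected.append(best)
    return selected
```

In the prompt-template setting, `pool` would hold candidate template instantiations, and each selected input would then be executed against the LLM-based application under test.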
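For the amplified regression test oracle entry, the following is a minimal sketch (assumed names, not that paper's tool) of the underlying idea: snapshot the public state of objects touched during a test run and diff it against a snapshot recorded on the previous version to flag behavioural changes.

```python
# Minimal sketch of state-based oracle amplification: capture an object's
# public attribute values after a test run and compare them against a
# snapshot recorded on the previous software version. Names are illustrative.

import json
from typing import Any

def snapshot(obj: Any) -> dict:
    """Record public, JSON-serializable attributes of an object."""
    state = {}
    for name, value in vars(obj).items():
        if name.startswith("_"):
            continue
        try:
            json.dumps(value)
            state[name] = value
        except TypeError:
            state[name] = repr(value)  # fall back to a textual representation
    return state

def diff_states(old: dict, new: dict) -> dict:
    """Return attributes whose values differ between two snapshots."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

# Usage: after rerunning a regression test on the new version,
#   changes = diff_states(json.load(open("baseline.json")), snapshot(sut_object))
# a non-empty `changes` is surfaced as a candidate (amplified) oracle violation.
```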
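And as the worked example promised for the group testing entry: in the classic, noiseless Dorfman scheme, a pool of g samples is tested once and each member is retested individually only when the pooled test is positive, so the expected number of tests per person at prevalence p is 1/g + 1 - (1-p)^g. The noisy, Bayesian-adaptive algorithms in that paper generalize this baseline.

```python
# Worked example for classic (noiseless) Dorfman two-stage group testing.

def expected_tests_per_person(p: float, g: int) -> float:
    """Expected tests per person with pool size g at prevalence p:
    one pooled test per group, plus g individual retests whenever
    the pool is positive (probability 1 - (1-p)^g)."""
    return 1 / g + (1 - (1 - p) ** g)

# At 2% prevalence, pools of 8 need ~0.27 tests per person vs. 1.0 individually.
if __name__ == "__main__":
    p = 0.02
    best_g = min(range(2, 33), key=lambda g: expected_tests_per_person(p, g))
    print(best_g, round(expected_tests_per_person(p, best_g), 3))
```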
This list is automatically generated from the titles and abstracts of the papers on this site.