Test Oracle Automation in the era of LLMs
- URL: http://arxiv.org/abs/2405.12766v1
- Date: Tue, 21 May 2024 13:19:10 GMT
- Title: Test Oracle Automation in the era of LLMs
- Authors: Facundo Molina, Alessandra Gorla,
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks.
This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles.
- Score: 52.69509240442899
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The effectiveness of a test suite in detecting faults highly depends on the correctness and completeness of its test oracles. Large Language Models (LLMs) have already demonstrated remarkable proficiency in tackling diverse software testing tasks, such as automated test generation and program repair. This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles. Additionally, our aim is to initiate discussions on the primary threats that SE researchers must consider when employing LLMs for oracle automation, encompassing concerns regarding oracle deficiencies and data leakages.
Related papers
- Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation [11.056044348209483]
Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints.
Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation.
arXiv Detail & Related papers (2024-06-28T20:38:41Z) - TOGLL: Correct and Strong Test Oracle Generation with LLMs [0.8057006406834466]
Test oracles play a crucial role in software testing, enabling effective bug detection.
Despite initial promise, neural-based methods for automated test oracle generation often result in a large number of false positives.
We present the first comprehensive study to investigate the capabilities of LLMs in generating correct, diverse, and strong test oracles.
arXiv Detail & Related papers (2024-05-06T18:37:35Z) - LangBiTe: A Platform for Testing Bias in Large Language Models [1.9744907811058787]
Large Language Models (LLMs) are trained on a vast amount of data scrapped from forums, websites, social media and other internet sources.
LangBiTe enables development teams to tailor their test scenarios, and automatically generate and execute the test cases according to a set of user-defined ethical requirements.
LangBite provides users with the bias evaluation of LLMs, and end-to-end traceability between the initial ethical requirements and the insights obtained.
arXiv Detail & Related papers (2024-04-29T10:02:45Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - Test-Time Self-Adaptive Small Language Models for Question Answering [63.91013329169796]
We show and investigate the capabilities of smaller self-adaptive LMs, only with unlabeled test data.
Our proposed self-adaption strategy demonstrates significant performance improvements on benchmark QA datasets.
arXiv Detail & Related papers (2023-10-20T06:49:32Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - LLM for Test Script Generation and Migration: Challenges, Capabilities,
and Opportunities [8.504639288314063]
Test script generation is a vital component of software testing, enabling efficient and reliable automation of repetitive test tasks.
Existing generation approaches often encounter limitations, such as difficulties in accurately capturing and reproducing test scripts across diverse devices, platforms, and applications.
This paper investigates the application of large language models (LLM) in the domain of mobile application test script generation.
arXiv Detail & Related papers (2023-09-24T07:58:57Z) - Automatically Correcting Large Language Models: Surveying the landscape
of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z) - A Review on Oracle Issues in Machine Learning [0.0]
oracle is the data, and the data is not always a correct representation of the problem that machine learning tries to model.
We present a survey of the oracle issues found in machine learning and state-of-the-art solutions for dealing with these issues.
arXiv Detail & Related papers (2021-05-04T10:41:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.