Test Oracle Automation in the era of LLMs
- URL: http://arxiv.org/abs/2405.12766v1
- Date: Tue, 21 May 2024 13:19:10 GMT
- Title: Test Oracle Automation in the era of LLMs
- Authors: Facundo Molina, Alessandra Gorla,
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks.
This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles.
- Score: 52.69509240442899
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The effectiveness of a test suite in detecting faults highly depends on the correctness and completeness of its test oracles. Large Language Models (LLMs) have already demonstrated remarkable proficiency in tackling diverse software testing tasks, such as automated test generation and program repair. This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles. Additionally, our aim is to initiate discussions on the primary threats that SE researchers must consider when employing LLMs for oracle automation, encompassing concerns regarding oracle deficiencies and data leakages.
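To make the notion of a test oracle concrete: in unit testing, the oracle is the check that decides whether an execution passes, typically an assertion or an expected exception. The minimal sketch below is a hypothetical illustration, not drawn from the paper; the normalize function and its expected values are assumptions made purely for the example. It contrasts an implicit crash-only oracle with explicit assertion and exceptional oracles.

```python
import unittest

def normalize(scores):
    """Scale a list of non-negative numbers so they sum to 1."""
    total = sum(scores)
    if total == 0:
        raise ValueError("scores must not all be zero")
    return [s / total for s in scores]

class NormalizeTest(unittest.TestCase):
    def test_implicit_oracle_only(self):
        # Weak, implicit oracle: the test fails only if normalize() crashes.
        normalize([1, 3])

    def test_assertion_oracle(self):
        # Explicit assertion oracle: encodes the expected program behaviour,
        # so a fault that returns wrong values is also detected.
        self.assertEqual(normalize([1, 3]), [0.25, 0.75])

    def test_exceptional_oracle(self):
        # Exceptional oracle: the expected outcome is a raised exception.
        with self.assertRaises(ValueError):
            normalize([0, 0])

if __name__ == "__main__":
    unittest.main()
```

Automated oracle generation, whether neural or LLM-based, aims at producing checks like the assertEqual and assertRaises lines above: an incorrect assertion yields a false positive, while an overly weak one (the crash-only test) misses faults, which is the correctness and completeness trade-off the abstract refers to.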
Related papers
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
- Do LLMs generate test oracles that capture the actual or the expected program behaviour? [7.772338538073763]
Large Language Models (LLMs) are trained on an enormous amount of data to generate developer-like code and test cases.
This study includes developer-written and automatically generated test cases and oracles for 24 open-source Java repositories.
LLMs are better at generating test oracles than at classifying correct ones, and they generate better test oracles when the code contains meaningful test or variable names.
arXiv Detail & Related papers (2024-10-28T15:37:06Z)
- Learning to Ask: When LLMs Meet Unclear Instruction [49.256630152684764]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the performance of LLMs' tool use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
- Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation [11.056044348209483]
Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints.
Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation.
arXiv Detail & Related papers (2024-06-28T20:38:41Z)
- TOGLL: Correct and Strong Test Oracle Generation with LLMs [0.8057006406834466]
Test oracles play a crucial role in software testing, enabling effective bug detection.
Despite initial promise, neural-based methods for automated test oracle generation often result in a large number of false positives.
We present the first comprehensive study to investigate the capabilities of LLMs in generating correct, diverse, and strong test oracles.
arXiv Detail & Related papers (2024-05-06T18:37:35Z)
- LangBiTe: A Platform for Testing Bias in Large Language Models [1.9744907811058787]
Large Language Models (LLMs) are trained on a vast amount of data scraped from forums, websites, social media and other internet sources.
LangBiTe enables development teams to tailor their test scenarios, and automatically generate and execute the test cases according to a set of user-defined ethical requirements.
LangBiTe provides users with the bias evaluation of LLMs, and end-to-end traceability between the initial ethical requirements and the insights obtained.
arXiv Detail & Related papers (2024-04-29T10:02:45Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
- A Review on Oracle Issues in Machine Learning [0.0]
In machine learning, the oracle is the data, and the data is not always a correct representation of the problem that machine learning tries to model.
We present a survey of the oracle issues found in machine learning and state-of-the-art solutions for dealing with these issues.
arXiv Detail & Related papers (2021-05-04T10:41:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.