Are We Testing or Being Tested? Exploring the Practical Applications of
Large Language Models in Software Testing
- URL: http://arxiv.org/abs/2312.04860v1
- Date: Fri, 8 Dec 2023 06:30:37 GMT
- Title: Are We Testing or Being Tested? Exploring the Practical Applications of
Large Language Models in Software Testing
- Authors: Robson Santos, Italo Santos, Cleyton Magalhaes, Ronnie de Souza Santos
- Abstract summary: A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content.
LLMs can play a pivotal role in software development, including software testing.
This study explores the practical application of LLMs in software testing within an industrial setting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A Large Language Model (LLM) represents a cutting-edge artificial
intelligence model that generates coherent content, including grammatically
precise sentences, human-like paragraphs, and syntactically accurate code
snippets. LLMs can play a pivotal role in software development, including
software testing. LLMs go beyond traditional roles such as requirement analysis
and documentation and can support test case generation, making them valuable
tools that significantly enhance testing practices within the field. Hence, we
explore the practical application of LLMs in software testing within an
industrial setting, focusing on their current use by professional testers. In
this context, rather than relying on existing data, we conducted a
cross-sectional survey and collected data within real working contexts,
specifically, engaging with practitioners in industrial settings. We applied
quantitative and qualitative techniques to analyze and synthesize our collected
data. Our findings demonstrate that LLMs effectively enhance testing documents
and significantly assist testing professionals in programming tasks like
debugging and test case automation. LLMs can support individuals engaged in
manual testing who need to code. However, it is crucial to emphasize that, at
this early stage, software testing professionals should use LLMs with caution
while well-defined methods and guidelines are being built for the secure
adoption of these tools.
Related papers
- The Potential of LLMs in Automating Software Testing: From Generation to Reporting [0.0]
Manual testing, while effective, can be time consuming and costly, leading to an increased demand for automated methods.
Recent advancements in Large Language Models (LLMs) have significantly influenced software engineering.
This paper explores an agent-oriented approach to automated software testing, using LLMs to reduce human intervention and enhance testing efficiency.
arXiv Detail & Related papers (2024-12-31T02:06:46Z)
- Studying and Benchmarking Large Language Models For Log Level Suggestion [49.176736212364496]
Large Language Models (LLMs) have become a focal point of research across various domains.
This paper investigates the impact of characteristics and learning paradigms on the performance of 12 open-source LLMs in log level suggestion.
arXiv Detail & Related papers (2024-10-11T03:52:17Z)
- Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.
We evaluate the tool-use performance of LLMs under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.
We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks [2.8061460833143346]
Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems.
To enable the use of LLMs in the high-stakes or safety-critical systems of 2030, they need to undergo rigorous testing.
arXiv Detail & Related papers (2024-06-12T13:45:45Z)
- Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval.
DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task.
Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z)
- LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs).
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
- LLM for Test Script Generation and Migration: Challenges, Capabilities, and Opportunities [8.504639288314063]
Test script generation is a vital component of software testing, enabling efficient and reliable automation of repetitive test tasks.
Existing generation approaches often encounter limitations, such as difficulties in accurately capturing and reproducing test scripts across diverse devices, platforms, and applications.
This paper investigates the application of large language models (LLMs) to mobile application test script generation.
arXiv Detail & Related papers (2023-09-24T07:58:57Z)
- Software Testing with Large Language Models: Survey, Landscape, and Vision [32.34617250991638]
Pre-trained large language models (LLMs) have emerged as a breakthrough technology in natural language processing and artificial intelligence.
This paper provides a comprehensive review of the utilization of LLMs in software testing.
arXiv Detail & Related papers (2023-07-14T08:26:12Z)
- Towards Autonomous Testing Agents via Conversational Large Language Models [18.302956037305112]
Large language models (LLMs) can be used as automated testing assistants.
We present a taxonomy of LLM-based testing agents based on their level of autonomy.
arXiv Detail & Related papers (2023-06-08T12:22:38Z)
- Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking.
This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.