LLM for Test Script Generation and Migration: Challenges, Capabilities,
and Opportunities
- URL: http://arxiv.org/abs/2309.13574v1
- Date: Sun, 24 Sep 2023 07:58:57 GMT
- Title: LLM for Test Script Generation and Migration: Challenges, Capabilities,
and Opportunities
- Authors: Shengcheng Yu, Chunrong Fang, Yuchen Ling, Chentian Wu, Zhenyu Chen
- Abstract summary: Test script generation is a vital component of software testing, enabling efficient and reliable automation of repetitive test tasks.
Existing generation approaches often encounter limitations, such as difficulties in accurately capturing and reproducing test scripts across diverse devices, platforms, and applications.
This paper investigates the application of large language models (LLMs) in the domain of mobile application test script generation.
- Score: 8.504639288314063
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper investigates the application of large language models (LLMs) in the
domain of mobile application test script generation. Test script generation is
a vital component of software testing, enabling efficient and reliable
automation of repetitive test tasks. However, existing generation approaches
often encounter limitations, such as difficulties in accurately capturing and
reproducing test scripts across diverse devices, platforms, and applications.
These challenges arise due to differences in screen sizes, input modalities,
platform behaviors, API inconsistencies, and application architectures.
Overcoming these limitations is crucial for achieving robust and comprehensive
test automation.
By leveraging the capabilities of LLMs, we aim to address these challenges
and explore their potential as a versatile tool for test automation. We
investigate how well LLMs can adapt to diverse devices and systems while
accurately capturing and generating test scripts. Additionally, we evaluate
their cross-platform generation capabilities by assessing their ability to
handle operating system variations and platform-specific behaviors.
Furthermore, we explore the application of LLMs to cross-app migration, where
they generate test scripts for different applications and software
environments based on existing scripts.
Throughout the investigation, we analyze their adaptability to various user
interfaces, app architectures, and interaction patterns, ensuring accurate
script generation and compatibility. The findings of this research contribute
to the understanding of LLMs' capabilities in test automation. Ultimately, this
research aims to enhance software testing practices, empowering app developers
to achieve higher levels of software quality and development efficiency.
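As a concrete illustration of the cross-app migration idea described in the abstract, the following minimal sketch prompts a chat-style LLM to adapt an existing Appium test script to a different app, given a dump of that app's UI hierarchy. The prompt wording, model name, and file names are illustrative assumptions and do not reflect the paper's actual implementation.

# Minimal sketch of LLM-assisted cross-app test script migration.
# The prompt template, model name, and file names below are illustrative
# assumptions; they are not taken from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def migrate_test_script(source_script: str, target_ui_xml: str) -> str:
    """Ask the LLM to rewrite an existing mobile test script for a target app,
    given the target app's UI hierarchy (e.g., an Appium page-source dump)."""
    prompt = (
        "You are a mobile test automation assistant.\n"
        "Below is an existing Appium test script and the UI hierarchy (XML) of a "
        "different target app that implements a similar workflow.\n"
        "Rewrite the script to perform the equivalent steps on the target app, "
        "updating element locators, waits, and assertions as needed.\n\n"
        f"### Source test script\n{source_script}\n\n"
        f"### Target app UI hierarchy\n{target_ui_xml}\n\n"
        "Return only the migrated Python script."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable chat model could be substituted
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output is preferable for test code
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    with open("login_test_app_a.py") as src, open("app_b_page_source.xml") as ui:
        print(migrate_test_script(src.read(), ui.read()))

As the APITestGenie entry below also recommends, a generated script of this kind would still need human review and a dry run on the target device before being added to an automated test suite.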
Related papers
- PentestAgent: Incorporating LLM Agents to Automated Penetration Testing [6.815381197173165]
Manual penetration testing is time-consuming and expensive.
Recent advancements in large language models (LLMs) offer new opportunities for enhancing penetration testing.
We propose PentestAgent, a novel LLM-based automated penetration testing framework.
arXiv Detail & Related papers (2024-11-07T21:10:39Z)
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
- APITestGenie: Automated API Test Generation through Generative AI [2.0716352593701277]
APITestGenie generates executable API test scripts from business requirements and API specifications.
In experiments with 10 real-world APIs, the tool generated valid test scripts 57% of the time.
Human intervention is recommended to validate or refine generated scripts before integration into CI/CD pipelines.
arXiv Detail & Related papers (2024-09-05T18:02:41Z)
- Test Oracle Automation in the era of LLMs [52.69509240442899]
Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks.
This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles.
arXiv Detail & Related papers (2024-05-21T13:19:10Z)
- DevBench: A Comprehensive Benchmark for Software Development [72.24266814625685]
DevBench is a benchmark that evaluates large language models (LLMs) across various stages of the software development lifecycle.
Empirical studies show that current LLMs, including GPT-4-Turbo, fail to solve the challenges presented within DevBench.
Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.
arXiv Detail & Related papers (2024-03-13T15:13:44Z)
- PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z)
- Are We Testing or Being Tested? Exploring the Practical Applications of Large Language Models in Software Testing [0.0]
A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content.
LLMs can play a pivotal role in software development, including software testing.
This study explores the practical application of LLMs in software testing within an industrial setting.
arXiv Detail & Related papers (2023-12-08T06:30:37Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents on end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Software Testing with Large Language Models: Survey, Landscape, and Vision [32.34617250991638]
Pre-trained large language models (LLMs) have emerged as a breakthrough technology in natural language processing and artificial intelligence.
This paper provides a comprehensive review of the utilization of LLMs in software testing.
arXiv Detail & Related papers (2023-07-14T08:26:12Z)
- Towards Autonomous Testing Agents via Conversational Large Language Models [18.302956037305112]
Large language models (LLMs) can be used as automated testing assistants.
We present a taxonomy of LLM-based testing agents based on their level of autonomy.
arXiv Detail & Related papers (2023-06-08T12:22:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.