Large Language Models for Unit Testing: A Systematic Literature Review
- URL: http://arxiv.org/abs/2506.15227v1
- Date: Wed, 18 Jun 2025 08:11:10 GMT
- Title: Large Language Models for Unit Testing: A Systematic Literature Review
- Authors: Quanjun Zhang, Chunrong Fang, Siqi Gu, Ye Shang, Zhenyu Chen, Liang Xiao,
- Abstract summary: This paper presents the first systematic literature review on the application of Large Language Models in unit testing until March 2025. We first categorize existing unit testing tasks that benefit from LLMs, e.g., test generation and oracle generation. We then discuss several critical aspects of integrating LLMs into unit testing research, including model usage, adaptation strategies, and hybrid approaches.
- Score: 13.465045964128196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unit testing is a fundamental practice in modern software engineering, with the aim of ensuring the correctness, maintainability, and reliability of individual software components. Very recently, with the advances in Large Language Models (LLMs), a rapidly growing body of research has leveraged LLMs to automate various unit testing tasks, demonstrating remarkable performance and significantly reducing manual effort. However, due to ongoing explorations in the LLM-based unit testing field, it is challenging for researchers to understand existing achievements, open challenges, and future opportunities. This paper presents the first systematic literature review on the application of LLMs in unit testing until March 2025. We analyze the relevant papers from the perspectives of both unit testing and LLMs. We first categorize existing unit testing tasks that benefit from LLMs, e.g., test generation and oracle generation. We then discuss several critical aspects of integrating LLMs into unit testing research, including model usage, adaptation strategies, and hybrid approaches. We further summarize key challenges that remain unresolved and outline promising directions to guide future research in this area. Overall, our paper provides a systematic overview of the research landscape to the unit testing community, helping researchers gain a comprehensive understanding of achievements and promoting future research. Our artifacts are publicly available at the GitHub repository: https://github.com/iSEngLab/AwesomeLLM4UT.
Related papers
- Testing the Untestable? An Empirical Study on the Testing Process of LLM-Powered Software Systems [0.0]
This study explores how large language models are tested in the context of real-world application development. A case study was conducted using 99 individual reports written by students who built and deployed LLM-powered applications as part of a university course. Results: Testing LLM-powered systems required adaptations to traditional verification methods, blending source-level reasoning with behavior-aware evaluations.
(2025-07-31T22:39:24Z)
- A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing [8.22619177301814]
Large Language Models (LLMs) have shown potential in various unit testing tasks. We present a large-scale empirical study on fine-tuning LLMs for unit testing.
(2024-12-21T13:28:11Z)
- A System for Automated Unit Test Generation Using Large Language Models and Assessment of Generated Test Suites [1.4563527353943984]
Large Language Models (LLMs) have been applied to various aspects of software development.
We present AgoneTest: an automated system for generating test suites for Java projects.
(2024-08-14T23:02:16Z)
- Systematic Task Exploration with LLMs: A Study in Citation Text Generation [63.50597360948099]
Large language models (LLMs) bring unprecedented flexibility in defining and executing complex, creative natural language generation (NLG) tasks.
We propose a three-component research framework that consists of systematic input manipulation, reference data, and output measurement.
We use this framework to explore citation text generation -- a popular scholarly NLP task that lacks consensus on the task definition and evaluation metric.
(2024-07-04T16:41:08Z)
- Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation [11.056044348209483]
Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints.
Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation.
(2024-06-28T20:38:41Z)
- On the Evaluation of Large Language Models in Unit Test Generation [16.447000441006814]
Unit testing is an essential activity in software development for verifying the correctness of software components.
The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation.
(2024-06-26T08:57:03Z)
- A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks [2.8061460833143346]
Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems.
To enable usage of LLMs in the high-stake or safety-critical systems of 2030, they need to undergo rigorous testing.
(2024-06-12T13:45:45Z)
- Tool Learning with Large Language Models: A Survey [60.733557487886635]
Tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems.
Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization.
(2024-05-28T08:01:26Z)
- Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
(2024-03-13T15:13:44Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
(2024-02-26T07:33:05Z)
- Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques.
We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
(2023-12-29T14:25:22Z)
- A Survey on Large Language Model based Autonomous Agents [105.2509166861984]
Large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This paper delivers a systematic review of the field of LLM-based autonomous agents from a holistic perspective. We present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering.
(2023-08-22T13:30:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.