A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks
- URL: http://arxiv.org/abs/2406.08216v1
- Date: Wed, 12 Jun 2024 13:45:45 GMT
- Title: A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks
- Authors: Sinclair Hudson, Sophia Jit, Boyue Caroline Hu, Marsha Chechik,
- Abstract summary: Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems.
To enable usage of LLMs in the high-stake or safety-critical systems of 2030, they need to undergo rigorous testing.
- Score: 2.8061460833143346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are rapidly becoming ubiquitous both as stand-alone tools and as components of current and future software systems. To enable usage of LLMs in the high-stake or safety-critical systems of 2030, they need to undergo rigorous testing. Software Engineering (SE) research on testing Machine Learning (ML) components and ML-based systems has systematically explored many topics such as test input generation and robustness. We believe knowledge about tools, benchmarks, research and practitioner views related to LLM testing needs to be similarly organized. To this end, we present a taxonomy of LLM testing topics and conduct preliminary studies of state of the art and practice approaches to research, open-source tools and benchmarks for LLM testing, mapping results onto this taxonomy. Our goal is to identify gaps requiring more research and engineering effort and inspire a clearer communication between LLM practitioners and the SE research community.
Related papers
- A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z) - Studying and Benchmarking Large Language Models For Log Level Suggestion [49.176736212364496]
Large Language Models (LLMs) have become a focal point of research across various domains.
This paper investigates the impact of characteristics and learning paradigms on the performance of 12 open-source LLMs in log level suggestion.
arXiv Detail & Related papers (2024-10-11T03:52:17Z) - A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically sorts out the applications of MLLM in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z) - Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation [11.056044348209483]
Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints.
Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation.
arXiv Detail & Related papers (2024-06-28T20:38:41Z) - On the Evaluation of Large Language Models in Unit Test Generation [16.447000441006814]
Unit testing is an essential activity in software development for verifying the correctness of software components.
The emergence of Large Language Models (LLMs) offers a new direction for automating unit test generation.
arXiv Detail & Related papers (2024-06-26T08:57:03Z) - Tool Learning with Large Language Models: A Survey [60.733557487886635]
Tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems.
Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization.
arXiv Detail & Related papers (2024-05-28T08:01:26Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques.
We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z) - Are We Testing or Being Tested? Exploring the Practical Applications of
Large Language Models in Software Testing [0.0]
A Large Language Model (LLM) represents a cutting-edge artificial intelligence model that generates coherent content.
LLM can play a pivotal role in software development, including software testing.
This study explores the practical application of LLMs in software testing within an industrial setting.
arXiv Detail & Related papers (2023-12-08T06:30:37Z) - Towards an Understanding of Large Language Models in Software Engineering Tasks [29.30433406449331]
Large Language Models (LLMs) have drawn widespread attention and research due to their astounding performance in text generation and reasoning tasks.
The evaluation and optimization of LLMs in software engineering tasks, such as code generation, have become a research focus.
This paper comprehensively investigate and collate the research and products combining LLMs with software engineering.
arXiv Detail & Related papers (2023-08-22T12:37:29Z) - Software Testing with Large Language Models: Survey, Landscape, and
Vision [32.34617250991638]
Pre-trained large language models (LLMs) have emerged as a breakthrough technology in natural language processing and artificial intelligence.
This paper provides a comprehensive review of the utilization of LLMs in software testing.
arXiv Detail & Related papers (2023-07-14T08:26:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.