Related papers: Towards Evaluation Guidelines for Empirical Studies involving LLMs

Towards Evaluation Guidelines for Empirical Studies involving LLMs

URL: http://arxiv.org/abs/2411.07668v2
Date: Thu, 14 Nov 2024 14:28:24 GMT
Title: Towards Evaluation Guidelines for Empirical Studies involving LLMs
Authors: Stefan Wagner, Marvin Muñoz Barón, Davide Falessi, Sebastian Baltes,
Abstract summary: Large language models (LLMs) have changed the software engineering research landscape. This paper contributes the first set of guidelines for such studies. Our goal is to start a discussion in the software engineering research community to reach a common understanding of what our community standards are for high-quality empirical studies involving LLMs.
Score: 6.174354685766166
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the short period since the release of ChatGPT in November 2022, large language models (LLMs) have changed the software engineering research landscape. While there are numerous opportunities to use LLMs for supporting research or software engineering tasks, solid science needs rigorous empirical evaluations. However, so far, there are no specific guidelines for conducting and assessing studies involving LLMs in software engineering research. Our focus is on empirical studies that either use LLMs as part of the research process (e.g., for data annotation) or studies that evaluate existing or new tools that are based on LLMs. This paper contributes the first set of guidelines for such studies. Our goal is to start a discussion in the software engineering research community to reach a common understanding of what our community standards are for high-quality empirical studies involving LLMs.

Related papers

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment [291.03029298928857]
This paper introduces the concept of "full-stack" safety to systematically consider safety issues throughout the entire process of LLM training, deployment, and commercialization. Our research is grounded in an exhaustive review of over 800+ papers, ensuring comprehensive coverage and systematic organization of security issues. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems.
arXiv Detail & Related papers (2025-04-22T05:02:49Z)
LLM4SR: A Survey on Large Language Models for Scientific Research [15.533076347375207]
Large Language Models (LLMs) offer unprecedented support across various stages of the research cycle. This paper presents the first systematic survey dedicated to exploring how LLMs are revolutionizing the scientific research process.
arXiv Detail & Related papers (2025-01-08T06:44:02Z)
What is the Role of Large Language Models in the Evolution of Astronomy Research? [0.0]
ChatGPT and other state-of-the-art large language models (LLMs) are rapidly transforming multiple fields. These models, commonly trained on vast datasets, exhibit human-like text generation capabilities.
arXiv Detail & Related papers (2024-09-30T12:42:25Z)
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future [15.568939568441317]
We investigate the current practice and solutions for large language models (LLMs) and LLM-based agents for software engineering. In particular we summarise six key topics: requirement engineering, code generation, autonomous decision-making, software design, test generation, and software maintenance. We discuss the models and benchmarks used, providing a comprehensive analysis of their applications and effectiveness in software engineering.
arXiv Detail & Related papers (2024-08-05T14:01:15Z)
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks. This study focuses on the topic of LLMs assist NLP Researchers. To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
LLMs for Science: Usage for Code Generation and Data Analysis [0.07499722271664144]
Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. It is still unclear how the potential of LLMs will materialise in research practice.
arXiv Detail & Related papers (2023-11-28T12:29:33Z)
A Survey on Large Language Model based Autonomous Agents [105.2509166861984]
Large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This paper delivers a systematic review of the field of LLM-based autonomous agents from a holistic perspective. We present a comprehensive overview of the diverse applications of LLM-based autonomous agents in the fields of social science, natural science, and engineering.
arXiv Detail & Related papers (2023-08-22T13:30:37Z)
Towards an Understanding of Large Language Models in Software Engineering Tasks [29.30433406449331]
Large Language Models (LLMs) have drawn widespread attention and research due to their astounding performance in text generation and reasoning tasks. The evaluation and optimization of LLMs in software engineering tasks, such as code generation, have become a research focus. This paper comprehensively investigate and collate the research and products combining LLMs with software engineering.
arXiv Detail & Related papers (2023-08-22T12:37:29Z)
A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks. This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)
A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry. This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z)
A Bibliometric Review of Large Language Models Research from 2017 to 2023 [1.4190701053683017]
Large language models (LLMs) are language models that have demonstrated outstanding performance across a range of natural language processing (NLP) tasks. This paper serves as a roadmap for researchers, practitioners, and policymakers to navigate the current landscape of LLMs research.
arXiv Detail & Related papers (2023-04-03T21:46:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.