Related papers: Reflecting on Empirical and Sustainability Aspects of Software Engineering Research in the Era of Large Language Models

Reflecting on Empirical and Sustainability Aspects of Software Engineering Research in the Era of Large Language Models

URL: http://arxiv.org/abs/2510.26538v1
Date: Thu, 30 Oct 2025 14:27:51 GMT
Title: Reflecting on Empirical and Sustainability Aspects of Software Engineering Research in the Era of Large Language Models
Authors: David Williams, Max Hort, Maria Kechagia, Aldeida Aleti, Justyna Petke, Federica Sarro,
Abstract summary: Software Engineering (SE) research involving the use of Large Language Models (LLMs) has introduced several new challenges related to rigour in benchmarking, contamination, replicability, and sustainability.<n>Our results provide a structured overview of current LLM-based SE research at ICSE, highlighting both encouraging practices and persistent shortcomings.<n>We conclude with recommendations to strengthen benchmarking rigour, improve replicability, and address the financial and environmental costs of LLM-based SE.
Score: 13.459892241342589
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Software Engineering (SE) research involving the use of Large Language Models (LLMs) has introduced several new challenges related to rigour in benchmarking, contamination, replicability, and sustainability. In this paper, we invite the research community to reflect on how these challenges are addressed in SE. Our results provide a structured overview of current LLM-based SE research at ICSE, highlighting both encouraging practices and persistent shortcomings. We conclude with recommendations to strengthen benchmarking rigour, improve replicability, and address the financial and environmental costs of LLM-based SE.

Related papers

Designing Empirical Studies on LLM-Based Code Generation: Towards a Reference Framework [0.3568466510804538]
We propose a theoretical framework for designing and reporting empirical studies on large language models (LLMs)-based code generation.<n>The framework is grounded in both our prior experience conducting such experiments and a comparative analysis of key similarities and differences among recent studies.<n>It organizes evaluation around core components such as problem sources, quality attributes, and metrics, supporting structured and systematic experimentation.
arXiv Detail & Related papers (2025-10-04T16:15:54Z)
Software Engineering for Large Language Models: Research Status, Challenges and the Road Ahead [4.835306415626808]
Large language models (LLMs) have redefined artificial intelligence (AI)<n>LLMs development faces increasingly complex challenges throughout its lifecycle.<n>No existing research systematically explores these challenges and solutions from the perspective of software engineering (SE) approaches.
arXiv Detail & Related papers (2025-06-30T12:09:29Z)
Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics [0.7481505949203433]
Large Language Models (LLMs) have emerged as a promising cornerstone for the development of natural language processing (NLP) and artificial intelligence (AI)<n>This survey provides a comprehensive overview of current studies in this area.
arXiv Detail & Related papers (2025-05-24T11:50:52Z)
Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey [54.90240495777929]
Ambiguity remains a fundamental challenge in Natural Language Processing (NLP)<n>With the advent of Large Language Models (LLMs), addressing ambiguity has become even more critical due to their expanded capabilities and applications.<n>This paper explores the definition, forms, and implications of ambiguity for language driven systems.
arXiv Detail & Related papers (2025-05-18T20:53:41Z)
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond [88.5807076505261]
Large Reasoning Models (LRMs) have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference.<n>A growing concern lies in their tendency to produce excessively long reasoning traces.<n>This inefficiency introduces significant challenges for training, inference, and real-world deployment.
arXiv Detail & Related papers (2025-03-27T15:36:30Z)
Bridging Language Models and Financial Analysis [49.361943182322385]
The rapid advancements in Large Language Models (LLMs) have unlocked transformative possibilities in natural language processing.<n>Financial data is often embedded in intricate relationships across textual content, numerical tables, and visual charts.<n>Despite the fast pace of innovation in LLM research, there remains a significant gap in their practical adoption within the finance industry.
arXiv Detail & Related papers (2025-03-14T01:35:20Z)
A Survey on Enhancing Causal Reasoning Ability of Large Language Models [15.602788561902038]
Large language models (LLMs) have recently shown remarkable performance in language tasks and beyond.<n>LLMs still face challenges in handling tasks that require robust causal reasoning ability, such as health-care and economic analysis.<n>This paper systematically reviews literature on how to strengthen LLMs' causal reasoning ability.
arXiv Detail & Related papers (2025-03-12T12:20:31Z)
A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.<n>These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance.<n>This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models [52.569132872560814]
multimodal large language models (MLLMs) have achieved significant breakthroughs, enhancing understanding across text and vision.<n>However, current MLLMs still face challenges in effectively integrating knowledge across these modalities during multimodal knowledge reasoning.<n>We analyze and compare the extent of consistency degradation in multimodal knowledge reasoning within MLLMs.
arXiv Detail & Related papers (2025-03-03T09:01:51Z)
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law [65.87885628115946]
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. We critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems.
arXiv Detail & Related papers (2024-05-02T22:43:02Z)
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence [5.147767778946168]
We critically assess 23 state-of-the-art Large Language Models (LLMs) benchmarks. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, diversity, and the overlooking of cultural and ideological norms.
arXiv Detail & Related papers (2024-02-15T11:08:10Z)
K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning [76.3114831562989]
It requires Large Language Model (LLM) agents to adapt their strategies dynamically in multi-agent environments. We propose a novel framework: "K-Level Reasoning with Large Language Models (K-R)"
arXiv Detail & Related papers (2024-02-02T16:07:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.