AutoEmpirical: LLM-Based Automated Research for Empirical Software Fault Analysis
- URL: http://arxiv.org/abs/2510.04997v1
- Date: Mon, 06 Oct 2025 16:37:18 GMT
- Title: AutoEmpirical: LLM-Based Automated Research for Empirical Software Fault Analysis
- Authors: Jiongchi Yu, Weipeng Jiang, Xiaoyu Zhang, Qiang Hu, Xiaofei Xie, Chao Shen
- Abstract summary: This paper decomposes the process of empirical software fault study into three key phases: research objective definition, data preparation, and fault analysis. We show that Large Language Models (LLMs) can substantially improve efficiency in fault analysis, with an average processing time of about two hours.
- Score: 29.429275242269664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding software faults is essential for empirical research in software development and maintenance. However, traditional fault analysis, while valuable, typically involves multiple expert-driven steps such as collecting potential faults, filtering, and manual investigation. These processes are both labor-intensive and time-consuming, creating bottlenecks that hinder large-scale fault studies in complex yet critical software systems and slow the pace of iterative empirical research. In this paper, we decompose the process of empirical software fault study into three key phases: (1) research objective definition, (2) data preparation, and (3) fault analysis, and we conduct an initial exploration study of applying Large Language Models (LLMs) for fault analysis of open-source software. Specifically, we perform the evaluation on 3,829 software faults drawn from a high-quality empirical study. Our results show that LLMs can substantially improve efficiency in fault analysis, with an average processing time of about two hours, compared to the weeks of manual effort typically required. We conclude by outlining a detailed research plan that highlights both the potential of LLMs for advancing empirical fault studies and the open challenges that must be addressed to achieve fully automated, end-to-end software fault analysis.
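The three-phase decomposition described in the abstract can be sketched as a simple pipeline. This is an illustrative sketch only, not the paper's implementation: all function names are hypothetical, and the LLM call is stubbed with keyword matching where a real system would prompt a model.

```python
# Hypothetical sketch of the three phases: (1) research objective definition,
# (2) data preparation, (3) fault analysis. Names are illustrative.

def define_objective():
    # Phase 1: fix the research question guiding the study.
    return "Classify root causes of open-source software faults"

def prepare_data(raw_reports):
    # Phase 2: filter out reports lacking enough context to analyze.
    return [r for r in raw_reports if r.get("description")]

def analyze_fault(report, objective):
    # Phase 3: stand-in for an LLM call; a real system would send the
    # report and objective as a prompt and parse the model's answer.
    text = report["description"].lower()
    if "null" in text:
        return "null-dereference"
    if "overflow" in text:
        return "overflow"
    return "other"

def run_study(raw_reports):
    objective = define_objective()
    faults = prepare_data(raw_reports)
    return [analyze_fault(f, objective) for f in faults]

reports = [
    {"description": "Null pointer crash on startup"},
    {"description": "Integer overflow in parser"},
    {"description": ""},  # dropped during data preparation
]
print(run_study(reports))  # → ['null-dereference', 'overflow']
```

The point of the sketch is the separation of concerns: because the objective and data preparation are explicit, the fault-analysis step can be swapped between manual inspection and an automated model without changing the rest of the study.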
Related papers
- Toward Automated and Trustworthy Scientific Analysis and Visualization with LLM-Generated Code [6.068120728706316]
Large language models (LLMs) offer a promising solution by generating code from natural language descriptions. We construct a benchmark suite of domain-inspired prompts that reflect real-world research tasks. Our findings show that, without human intervention, the reliability of LLM-generated code is limited.
arXiv Detail & Related papers (2025-11-26T21:27:03Z) - An Empirical Study of Reasoning Steps in Thinking Code LLMs [8.653365851909745]
Thinking Large Language Models generate explicit intermediate reasoning traces before final answers. This study examines the reasoning process and quality of thinking LLMs for code generation.
arXiv Detail & Related papers (2025-11-08T06:18:48Z) - LogReasoner: Empowering LLMs with Expert-like Coarse-to-Fine Reasoning for Automated Log Analysis [66.79746720402811]
General-purpose large language models (LLMs) struggle to formulate structured reasoning that aligns with expert cognition and to deliver precise details of reasoning steps. We propose LogReasoner, a coarse-to-fine enhancement framework designed to enable LLMs to reason about log analysis tasks like experts. We evaluate LogReasoner on four distinct log analysis tasks using open-source LLMs such as Qwen-2.5 and Llama-3.
arXiv Detail & Related papers (2025-09-25T06:26:49Z) - Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study [55.09905978813599]
Large Language Models (LLMs) hold promise in automating data analysis tasks. Yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs.
arXiv Detail & Related papers (2025-06-24T17:04:23Z) - KnowCoder-V2: Deep Knowledge Analysis [64.63893361811968]
We propose a Knowledgeable Deep Research (KDR) framework that empowers deep research with deep knowledge analysis capability. It introduces an independent knowledge organization phase to preprocess large-scale, domain-relevant data into systematic knowledge offline. It then extends deep research with an additional kind of reasoning step that performs complex knowledge computation in an online manner.
arXiv Detail & Related papers (2025-06-07T18:01:25Z) - Evaluating Large Language Models for Real-World Engineering Tasks [75.97299249823972]
This paper introduces a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios. Using this dataset, we evaluate four state-of-the-art Large Language Models (LLMs). Our results show that LLMs demonstrate strengths in basic temporal and structural reasoning but struggle significantly with abstract reasoning, formal modeling, and context-sensitive engineering logic.
arXiv Detail & Related papers (2025-05-12T14:05:23Z) - Flowco: Rethinking Data Analysis in the Age of LLMs [2.1874189959020427]
Large language models (LLMs) are now capable of generating such code for simple, routine analyses. LLMs promise to democratize data science by enabling those with limited programming expertise to conduct data analyses. However, analysts in many real-world settings must often exercise fine-grained control over specific analysis steps. This paper introduces Flowco, a new mixed-initiative system to address these challenges.
arXiv Detail & Related papers (2025-04-18T19:01:27Z) - Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets [3.8740749765622167]
Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. This paper explores the role of LLMs for different code analysis tasks, focusing on three key aspects.
arXiv Detail & Related papers (2025-03-21T19:29:50Z) - LLM-based event log analysis techniques: A survey [1.6180992915701702]
Event logs record key information on activities that occur on computing devices. Researchers have developed automated techniques to improve the event log analysis process. This paper aims to survey LLM-based event log analysis techniques.
arXiv Detail & Related papers (2025-02-02T05:28:17Z) - Designing Algorithms Empowered by Language Models: An Analytical Framework, Case Studies, and Insights [86.06371692309972]
This work presents an analytical framework for the design and analysis of large language model (LLM)-based algorithms. Our proposed framework serves as an attempt to mitigate the difficulties of designing such algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges [2.7029792239733914]
This paper examines the application of Large Language Models in the construction of test cases within the context of software engineering.
Through a blend of qualitative and quantitative analyses, this study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency.
arXiv Detail & Related papers (2023-12-19T20:59:02Z) - Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise [21.491392581672198]
We present Snoopy, with the goal of supporting data scientists and machine learning engineers performing a systematic and theoretically founded feasibility study.
We approach this problem by estimating the irreducible error of the underlying task, also known as the Bayes error rate (BER).
We demonstrate in end-to-end experiments how users are able to save substantial labeling time and monetary efforts.
arXiv Detail & Related papers (2020-10-16T14:21:19Z)
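The Snoopy abstract above mentions estimating the Bayes error rate (BER). A common way to bracket the BER, independent of Snoopy's actual estimator, uses the classic Cover-Hart result for two classes: BER ≤ E_1NN ≤ 2·BER(1−BER), so the leave-one-out 1-nearest-neighbor error E_1NN yields a rough interval [E_1NN/2, E_1NN] for the BER. A minimal sketch, assuming 1-D features for simplicity:

```python
def loo_1nn_error(xs, ys):
    # Leave-one-out 1-nearest-neighbor error on 1-D features xs with labels ys.
    errors = 0
    for i, x in enumerate(xs):
        # Nearest neighbor, excluding the held-out point itself.
        j = min((k for k in range(len(xs)) if k != i),
                key=lambda k: abs(xs[k] - x))
        errors += ys[j] != ys[i]
    return errors / len(xs)

# Two well-separated clusters: every point's nearest neighbor shares its label.
xs = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
ys = [0, 0, 0, 1, 1, 1]
e = loo_1nn_error(xs, ys)
print((e / 2, e))  # crude [lower, upper] bracket for the BER; here (0.0, 0.0)
```

On real, noisy datasets E_1NN is positive, and a nonzero lower bracket tells an analyst that no model can drive the error below that level, which is the kind of feasibility signal Snoopy is built around.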
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided (including all listed content) and is not responsible for any consequences of its use.