Auto-repair without test cases: How LLMs fix compilation errors in large industrial embedded code
- URL: http://arxiv.org/abs/2510.13575v1
- Date: Wed, 15 Oct 2025 14:13:13 GMT
- Title: Auto-repair without test cases: How LLMs fix compilation errors in large industrial embedded code
- Authors: Han Fu, Sigrid Eldh, Kristian Wiklund, Andreas Ermedahl, Philipp Haller, Cyrille Artho
- Abstract summary: We employ an automated repair approach for compilation errors driven by large language models (LLMs). Our study encompasses the collection of more than 40000 commits from the product's source code.
- Score: 2.64399132991614
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The co-development of hardware and software in industrial embedded systems frequently leads to compilation errors during continuous integration (CI). Automated repair of such failures is promising, but existing techniques rely on test cases, which are not available for non-compilable code. We employ an automated repair approach for compilation errors driven by large language models (LLMs). Our study encompasses the collection of more than 40000 commits from the product's source code. We assess the performance of an industrial CI system enhanced by four state-of-the-art LLMs, comparing their outcomes with manual corrections provided by human programmers. LLM-equipped CI systems can resolve up to 63 % of the compilation errors in our baseline dataset. Among the fixes associated with successful CI builds, 83 % are deemed reasonable. Moreover, LLMs significantly reduce debugging time, with the majority of successful cases completed within 8 minutes, compared to hours typically required for manual debugging.
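The abstract describes a repair flow in which the CI system feeds compiler diagnostics to an LLM and rebuilds, with no test cases in the loop. The sketch below is only a minimal illustration of such a loop; the build command, the prompt layout, and the `query_llm` helper are assumptions for illustration, not the paper's actual pipeline.

```python
# Hypothetical sketch of an LLM-driven compilation-error repair loop in CI.
# The build command, prompt layout, and `query_llm` stub are assumptions for
# illustration; the paper's actual pipeline and prompts are not reproduced here.
import subprocess
from pathlib import Path

MAX_ATTEMPTS = 3  # bound the repair loop so CI time stays predictable


def run_build() -> tuple[bool, str]:
    """Run the build and return (success, combined compiler output)."""
    proc = subprocess.run(["make", "-j8"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion API."""
    raise NotImplementedError("wire up an LLM client here")


def repair_once(error_log: str, source: Path) -> None:
    """Ask the LLM for a corrected file, given the compiler diagnostics."""
    prompt = (
        "The following C file fails to compile.\n"
        f"Compiler output:\n{error_log}\n\n"
        f"File {source.name}:\n{source.read_text()}\n\n"
        "Return the full corrected file only."
    )
    source.write_text(query_llm(prompt))


def ci_auto_repair(failing_file: Path) -> bool:
    """Alternate rebuild and LLM repair; success is a clean compilation, no tests needed."""
    for _ in range(MAX_ATTEMPTS):
        ok, log = run_build()
        if ok:
            return True
        repair_once(log, failing_file)
    return run_build()[0]
```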
Related papers
- Context-Guided Decompilation: A Step Towards Re-executability [50.71992919223209]
Binary decompilation plays an important role in software security analysis, reverse engineering, and malware understanding. Recent advances in large language models (LLMs) have enabled neural decompilation, but the generated code is typically only semantically plausible. We propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code.
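One way to picture the ICL idea above is a few-shot prompt of (assembly, source) exemplar pairs plus a cheap re-executability check on the output. The sketch below is only an illustration under those assumptions; the exemplar format and the `cc` compile check are not ICL4Decomp's actual interface.

```python
# Illustrative sketch of prompt construction for ICL-guided decompilation, plus a
# crude re-executability proxy (does the decompiled C at least compile?). The
# exemplar format and the `cc` check are assumptions, not ICL4Decomp's interface.
import subprocess
import tempfile


def build_icl_prompt(exemplars: list[tuple[str, str]], target_asm: str) -> str:
    """Prepend (assembly, source) exemplar pairs so the LLM imitates compilable output."""
    shots = "\n\n".join(f"Assembly:\n{asm}\nSource:\n{src}" for asm, src in exemplars)
    return f"{shots}\n\nAssembly:\n{target_asm}\nSource:"


def compiles(c_source: str) -> bool:
    """Check whether candidate decompiled C code compiles to an object file."""
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(c_source)
    result = subprocess.run(["cc", "-c", f.name, "-o", "/dev/null"], capture_output=True)
    return result.returncode == 0
```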
arXiv Detail & Related papers (2025-11-03T17:21:39Z) - RepoDebug: Repository-Level Multi-Task and Multi-Language Debugging Evaluation of Large Language Models [49.83481415540291]
Large Language Models (LLMs) have exhibited significant proficiency in code debugging. This paper introduces RepoDebug, a multi-task and multi-language repository-level code debugging dataset. We conduct evaluation experiments on 10 LLMs, where Claude 3.5 Sonnet, the best-performing model, still cannot perform well in repository-level debugging.
arXiv Detail & Related papers (2025-09-04T10:13:21Z) - Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an automated program repair (APR) tool for Dafny that uses formal specifications as oracles for fault localization and repair. We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program. Our tool achieves 89.6% fault localization coverage, and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z) - Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements [0.36832029288386137]
This study examined code issue detection and revision automation by integrating Large Language Models (LLMs) into software development. A static code analysis framework detects issues such as bugs, vulnerabilities, and code smells within a large-scale software project. Retrieval-augmented generation (RAG) is implemented to enhance the relevance and precision of the revisions.
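A rough sketch of how an analyzer finding and retrieved exemplar fixes could be combined into one revision prompt is given below; the issue schema, the retrieval source, and the prompt wording are assumptions for illustration, not the paper's API.

```python
# Rough sketch of combining a static-analysis finding with retrieved exemplar fixes
# (the RAG step) into a single revision prompt. The Issue schema and prompt wording
# are assumptions for illustration, not the paper's API.
from dataclasses import dataclass


@dataclass
class Issue:
    file: str
    line: int
    rule: str      # e.g. a code-smell or vulnerability rule ID
    message: str   # the analyzer's human-readable description


def build_revision_prompt(issue: Issue, code_snippet: str, retrieved_fixes: list[str]) -> str:
    """Ground the LLM with the finding plus similar past fixes retrieved from a corpus."""
    context = "\n---\n".join(retrieved_fixes)
    return (
        f"Static analyzer reports {issue.rule} at {issue.file}:{issue.line}: {issue.message}\n\n"
        f"Similar past fixes:\n{context}\n\n"
        f"Code to revise:\n{code_snippet}\n\n"
        "Rewrite the code so the issue is resolved without changing behavior."
    )
```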
arXiv Detail & Related papers (2025-06-12T03:39:25Z) - D-LiFT: Improving LLM-based Decompiler Backend via Code Quality-driven Fine-tuning [49.16469288280772]
Decompilers reconstruct human-readable source code from binaries. Despite recent advances, their outputs often suffer from syntactic and semantic errors and remain difficult to read. With the advent of large language models (LLMs), researchers began to explore the potential of LLMs to refine decompiler output. We present D-LiFT, an enhanced decompiler-LLM pipeline in which the LLM is fine-tuned via code quality-driven reinforcement learning.
arXiv Detail & Related papers (2025-06-11T19:09:08Z) - LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models [9.946058168276744]
Large Language Models (LLMs) have opened up the possibility for many software defects to be patched automatically. We propose an iterative pipeline that robustly fixes vulnerable functions in real-world code with high accuracy. We achieve a human-verified quality score of 8.51/10 and an increase in ground-truth code similarity of 20% with Llama 3 70B.
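The iterative pipeline described above can be read as a fix-and-recheck loop over a vulnerable function. A minimal sketch under that reading follows; `generate_patch`, `passes_checks`, and the round budget are hypothetical stand-ins, not LLM4CVE's actual interface.

```python
# Minimal sketch of an iterative fix-and-verify loop for vulnerable functions.
# `generate_patch` and `passes_checks` are hypothetical stand-ins for an LLM call
# and a re-analysis step; they are not LLM4CVE's actual interface.
from typing import Optional, Tuple


def generate_patch(function_src: str, vuln_report: str) -> str:
    """Placeholder: prompt an LLM with the function body and the vulnerability report."""
    raise NotImplementedError


def passes_checks(function_src: str) -> Tuple[bool, str]:
    """Placeholder: re-run the vulnerability detector; return (fixed?, fresh report)."""
    raise NotImplementedError


def iterative_repair(vulnerable_fn: str, report: str, rounds: int = 5) -> Optional[str]:
    """Alternate LLM patching and re-checking until the detector is satisfied or the budget runs out."""
    candidate = vulnerable_fn
    for _ in range(rounds):
        candidate = generate_patch(candidate, report)
        fixed, report = passes_checks(candidate)
        if fixed:
            return candidate
    return None
```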
arXiv Detail & Related papers (2025-01-07T00:21:42Z) - EDA-Aware RTL Generation with Large Language Models [0.7831852829409273]
Large Language Models (LLMs) have become increasingly popular for generating RTL code. However, producing error-free RTL code in a zero-shot setting remains highly challenging even for state-of-the-art LLMs. We introduce AIvril2, a self-verifying, LLM-agnostic agentic framework aimed at enhancing RTL code generation through iterative corrections of both syntax and functional errors.
arXiv Detail & Related papers (2024-11-21T00:37:51Z) - ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs [77.79172008184415]
ToolScan is a new benchmark to identify error patterns in LLM output on tool-use tasks. We show that even the most prominent LLMs exhibit these error patterns in their outputs. Researchers can use these insights from ToolScan to guide their error mitigation strategies.
arXiv Detail & Related papers (2024-11-20T18:56:22Z) - MEIC: Re-thinking RTL Debug Automation using LLMs [18.964523115622928]
This work introduces a novel framework, Make Each Iteration Count (MEIC).
MEIC is suitable for identifying and correcting both syntax and function errors.
To evaluate our framework, we provide an open-source dataset comprising 178 common RTL programming errors.
arXiv Detail & Related papers (2024-05-10T22:32:39Z) - DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models [3.1690235522182104]
Large language models (LLMs) are increasingly used to solve various programming tasks.
We show that the task is difficult as it requires the model to learn long-range code relationships.
We propose a technique to address these challenges with a new approach for querying and fine-tuning LLMs.
arXiv Detail & Related papers (2024-02-19T18:35:40Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)