PhantomRun: Auto Repair of Compilation Errors in Embedded Open Source Software
- URL: http://arxiv.org/abs/2602.20284v1
- Date: Mon, 23 Feb 2026 19:13:22 GMT
- Title: PhantomRun: Auto Repair of Compilation Errors in Embedded Open Source Software
- Authors: Han Fu, Andreas Ermedahl, Sigrid Eldh, Kristian Wiklund, Philipp Haller, Cyrille Artho,
- Abstract summary: We study four major open-source embedded system projects, spanning over 4000 build failures from the projects' CI runs. We find that hardware dependencies account for the majority of compilation failures, followed by syntax errors and build-script issues. We present PhantomRun, an automated framework that leverages large language models (LLMs) to generate and validate fixes for CI compilation failures.
- Score: 2.64399132991614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continuous Integration (CI) pipelines for embedded software sometimes fail during compilation, consuming significant developer time for debugging. We study four major open-source embedded system projects, spanning over 4000 build failures from the projects' CI runs. We find that hardware dependencies account for the majority of compilation failures, followed by syntax errors and build-script issues. Most repairs need relatively small changes, making automated repair potentially suitable as long as the diverse setups and lack of test data can be handled. In this paper, we present PhantomRun, an automated framework that leverages large language models (LLMs) to generate and validate fixes for CI compilation failures. The framework addresses the challenge of diverse build infrastructures and toolchains across embedded system projects by providing an adaptation layer for GitHub Actions, GitLab CI, and four different build systems. PhantomRun utilizes build logs, source code, historical fixes, and compiler error messages to synthesize fixes using LLMs. Our evaluations show that PhantomRun successfully repairs up to 45% of CI compilation failures across the targeted projects, demonstrating the viability of LLM-based repair for embedded-system CI pipelines.
Related papers
- Auto-repair without test cases: How LLMs fix compilation errors in large industrial embedded code [2.64399132991614]
We employ an automated repair approach for compilation errors driven by large language models (LLMs). Our study encompasses the collection of more than 40,000 commits from the product's source code.
arXiv Detail & Related papers (2025-10-15T14:13:13Z) - Where LLM Agents Fail and How They can Learn From Failures [62.196870049524364]
Large Language Model (LLM) agents have shown promise in solving complex, multi-step tasks. They amplify vulnerability to cascading failures, where a single root-cause error propagates through subsequent decisions. Current systems lack a framework that can comprehensively understand agent errors in a modular and systemic way. We introduce the AgentErrorTaxonomy, a modular classification of failure modes spanning memory, reflection, planning, action, and system-level operations.
arXiv Detail & Related papers (2025-09-29T18:20:27Z) - SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software development by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z) - CXXCrafter: An LLM-Based Agent for Automated C/C++ Open Source Software Building [14.687126587793028]
Building C/C++ projects often proves to be difficult in practice, hindering the progress of downstream applications. We develop an automated build system called CXXCrafter to address these challenges, such as dependency resolution. Our evaluation on open-source software demonstrates that CXXCrafter achieves a success rate of 78% in project building.
arXiv Detail & Related papers (2025-05-27T11:54:56Z) - Attestable Builds: Compiling Verifiable Binaries on Untrusted Systems using Trusted Execution Environments [2.4650753804485417]
We present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot.
arXiv Detail & Related papers (2025-05-05T10:00:04Z) - CrashFixer: A crash resolution agent for the Linux kernel [58.152358195983155]
This work builds upon kGym, which shares a benchmark for system-level Linux kernel bugs and a platform to run experiments on the Linux kernel. This paper introduces CrashFixer, the first LLM-based software repair agent that is applicable to Linux kernel bugs.
arXiv Detail & Related papers (2025-04-29T04:18:51Z) - KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z) - In industrial embedded software, are some compilation errors easier to localize and fix than others? [1.627308316856397]
We collected over 40,000 builds from four projects in the product source code and categorized compilation errors into 14 error types.
We show that the five most common types comprise 89% of all compilation errors.
Our research also provides insights into the human effort required to fix the most common industrial compilation errors.
arXiv Detail & Related papers (2024-04-23T08:20:18Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - The Devil Is in the Command Line: Associating the Compiler Flags With the Binary and Build Metadata [0.0]
Defects caused by an undesired combination of compiler flags are common in nontrivial software projects.
A queryable database of how the compiler compiled and linked the software system will help to detect defects earlier.
arXiv Detail & Related papers (2023-12-20T22:27:32Z) - Dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models [53.04357141450459]
dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks.
We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
arXiv Detail & Related papers (2023-08-23T02:36:19Z)
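A dcc-style wrapper can be approximated by capturing the compiler's diagnostic and pairing it with the source lines around the failure before prompting a model. The sketch below shows only the context-gathering and prompt-assembly steps; dcc's real prompt and integration are not given in the abstract, so every name here is an illustrative assumption:

```python
# Hypothetical helpers for a context-aware compiler-error explainer.
def source_context(path: str, line: int, radius: int = 2) -> str:
    """Return numbered source lines around the failing line (1-indexed)."""
    with open(path) as f:
        lines = f.readlines()
    lo, hi = max(0, line - 1 - radius), min(len(lines), line + radius)
    return "".join(f"{i + 1:>4} {text}" for i, text in enumerate(lines[lo:hi], start=lo))

def build_prompt(error: str, context: str) -> str:
    """Assemble an explanation-only prompt. The instruction to withhold
    corrected code mirrors the abstract's finding that models often
    disregarded exactly this constraint."""
    return (
        "Explain this C compiler error to a first-year student. "
        "Do not provide the corrected code.\n"
        f"Error: {error}\nSource context:\n{context}"
    )
```

Feeding the model a small window of numbered source lines, rather than the whole file, keeps the prompt short while still letting the explanation refer to the specific statement that triggered the diagnostic.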
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.