Detecting Build Dependency Errors in Incremental Builds
- URL: http://arxiv.org/abs/2404.13295v2
- Date: Fri, 26 Apr 2024 02:03:41 GMT
- Title: Detecting Build Dependency Errors in Incremental Builds
- Authors: Jun Lyu, Shanshan Li, He Zhang, Yang Zhang, Guoping Rong, Manuel Rigger,
- Abstract summary: We propose EChecker to detect build dependency errors in the context of incremental builds.
EChecker automatically updates actual build dependencies by inferring them from C/C++ pre-processor directives and Makefile changes from new commits.
EChecker increases the build dependency error detection efficiency by an average of 85.14 times.
- Score: 13.823208277774572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incremental and parallel builds performed by build tools such as Make are the heart of modern C/C++ software projects. Their correct and efficient execution depends on build scripts. However, build scripts are prone to errors. The most prevalent errors are missing dependencies (MDs) and redundant dependencies (RDs). The state-of-the-art methods for detecting these errors rely on clean builds (i.e., full builds of a subset of software configurations in a clean environment), which is costly and takes up to multiple hours for large-scale projects. To address these challenges, we propose a novel approach called EChecker to detect build dependency errors in the context of incremental builds. The core idea of EChecker is to automatically update actual build dependencies by inferring them from C/C++ pre-processor directives and Makefile changes from new commits, which avoids clean builds when possible. EChecker achieves higher efficiency than the methods that rely on clean builds while maintaining effectiveness. We selected 12 representative projects, with their sizes ranging from small to large, with 240 commits (20 commits for each project), based on which we evaluated the effectiveness and efficiency of EChecker. We compared the evaluation results with a state-of-the-art build dependency error detection tool. The evaluation shows that the F-1 score of EChecker improved by 0.18 over the state-of-the-art method. EChecker increases the build dependency error detection efficiency by an average of 85.14 times (with the median at 16.30 times). The results demonstrate that EChecker can support practitioners in detecting build dependency errors efficiently.
Related papers
- PhantomRun: Auto Repair of Compilation Errors in Embedded Open Source Software [2.64399132991614]
We study four major open-source embedded system projects, spanning over 4000 build failures from the project's CI runs.<n>We find that hardware dependencies account for the majority of compilation failures, followed by syntax errors and build-script issues.<n>We present PhantomRun, an automated framework that leverages large language models (LLMs) to generate and validate fixes for CI compilation failures.
arXiv Detail & Related papers (2026-02-23T19:13:22Z) - Outrunning LLM Cutoffs: A Live Kernel Crash Resolution Benchmark for All [57.23434868678603]
Live-kBench is an evaluation framework for self-evolving benchmarks that scrapes and evaluates agents on freshly discovered kernel bugs.<n> kEnv is an agent-agnostic crash-resolution environment for kernel compilation, execution, and feedback.<n>Using kEnv, we benchmark three state-of-the-art agents, showing that they resolve 74% of crashes on the first attempt.
arXiv Detail & Related papers (2026-02-02T19:06:15Z) - Understanding and Detecting Flaky Builds in GitHub Actions [6.3850400710838615]
We present a large-scale empirical study of flaky builds in GitHub Actions based on rerun data from 1,960 Java projects.<n>We identify 15 distinct categories of flaky failures, among which flaky tests, network issues, and dependency resolution issues are the most prevalent.<n>We propose a machine learning-based approach for detecting flaky failures at the job level.
arXiv Detail & Related papers (2026-02-02T16:39:56Z) - The Cost of Downgrading Build Systems: A Case Study of Kubernetes [9.25132407333504]
We study a project that downgraded from an artifact-based build tool (Bazel) to a language-specific solution (Go Build)<n>We find that Bazel builds are faster than Go Build, completing full builds in 23.06-38.66 up to 75.19.<n>Downgrading from Bazel can increase CI resource costs by up to 76.
arXiv Detail & Related papers (2025-10-22T21:40:23Z) - Where LLM Agents Fail and How They can Learn From Failures [62.196870049524364]
Large Language Model (LLM) agents have shown promise in solving complex, multi-step tasks.<n>They amplify vulnerability to cascading failures, where a single root-cause error propagates through subsequent decisions.<n>Current systems lack a framework that can comprehensively understand agent error in a modular and systemic way.<n>We introduce the AgentErrorTaxonomy, a modular classification of failure modes spanning memory, reflection, planning, action, and system-level operations.
arXiv Detail & Related papers (2025-09-29T18:20:27Z) - Alignment with Fill-In-the-Middle for Enhancing Code Generation [56.791415642365415]
We propose a novel approach that splits code snippets into smaller, granular blocks, creating more diverse DPO pairs from the same test cases.<n>Our approach demonstrates significant improvements in code generation tasks, as validated by experiments on benchmark datasets such as HumanEval (+), MBPP (+), APPS, LiveCodeBench, and BigCodeBench.
arXiv Detail & Related papers (2025-08-27T03:15:53Z) - Improving Compiler Bug Isolation by Leveraging Large Language Models [14.679589768900621]
We propose an innovative compiler bug isolation approach named AutoCBI.<n>We evaluate AutoCBI against state-of-the-art approaches (DiWi, RecBi and FuseFL) on 120 real-world bugs from the widely-used GCC and LLVM compilers.<n>Specifically, AutoCBI isolates 66.67%/69.23%, 300%/340%, and 100%/57.14% more bugs than RecBi, DiWi, and FuseFL, respectively, in the Top-1 ranked results for GCC/LLVM.
arXiv Detail & Related papers (2025-06-21T09:09:30Z) - Attestable builds: compiling verifiable binaries on untrusted systems using trusted execution environments [3.207381224848367]
attestable builds provide strong source-to-binary correspondence in software artifacts.<n>We tackle the challenge of opaque build pipelines that disconnect the trust between source code and the final binary artifact.
arXiv Detail & Related papers (2025-05-05T10:00:04Z) - CrashFixer: A crash resolution agent for the Linux kernel [58.152358195983155]
This work builds upon kGym, which shares a benchmark for system-level Linux kernel bugs and a platform to run experiments on the Linux kernel.
This paper introduces CrashFixer, the first LLM-based software repair agent that is applicable to Linux kernel bugs.
arXiv Detail & Related papers (2025-04-29T04:18:51Z) - RAG-Based Fuzzing of Cross-Architecture Compilers [0.8302146576157498]
OneAPI is an open standard that supports cross-architecture software development with minimal effort from developers.
OneAPI brings DPC++ and C++ compilers which need to be thoroughly tested to verify their correctness, reliability, and security.
This paper proposes a large-language model (LLM)-based compiler fuzzing tool that integrates the concept of retrieval-augmented generation (RAG)
arXiv Detail & Related papers (2025-04-11T20:46:52Z) - CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z) - Finding Missed Code Size Optimizations in Compilers using LLMs [1.90019787465083]
We develop a novel testing approach which combines large language models with a series of differential testing strategies.
Our approach requires fewer than 150 lines of code to implement.
To date we have reported 24 confirmed bugs in production compilers.
arXiv Detail & Related papers (2024-12-31T21:47:46Z) - Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.
Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.
Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z) - Local Software Buildability across Java Versions (Registered Report) [0.0]
We will try to automatically build every project in containers with Java versions 6 to 23 installed.
Success or failure will be determined by exit codes, and standard output and error streams will be saved.
arXiv Detail & Related papers (2024-08-21T11:51:00Z) - KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z) - Does Using Bazel Help Speed Up Continuous Integration Builds? [9.098224117917336]
New artifact-based build technologies like Bazel have built-in support for advanced performance optimizations.
We collected 383 Bazel projects from GitHub, studied their parallel and incremental build usage of Bazel in 4 popular CI services, and compared the results with Maven projects.
Our results show that 31.23% of Bazel projects adopt a CI service but do not use it in the CI service, while for those who do use Bazel in CI, 27.76% of them use other tools to facilitate Bazel's execution.
arXiv Detail & Related papers (2024-05-01T18:16:38Z) - Automatic Build Repair for Test Cases using Incompatible Java Versions [7.4881561767138365]
We introduce an approach to repair test cases of Java projects by performing dependency minimization.
Unlike existing state-of-the-art techniques, our approach performs at source-level, which allows compile-time errors to be fixed.
arXiv Detail & Related papers (2024-04-27T07:55:52Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs)
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - Repeated Builds During Code Review: An Empirical Study of the OpenStack
Community [11.289146650622662]
We conduct an empirical study of 66,932 code reviews from the community.
We observe that (i) 55% of code reviews invoke the recheck command after a failing build is reported; (ii) invoking the recheck command only changes the outcome of a failing build in 42% of the cases; and (iii) invoking the recheck command increases review waiting time by an average of 2,200%.
arXiv Detail & Related papers (2023-08-19T17:45:03Z) - On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z) - A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
static analysis tools such as linters, which can detect errors without running the program, haven't been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self- Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub.
We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch.
We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
arXiv Detail & Related papers (2021-04-16T05:27:04Z) - D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using
Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.