Related papers: Detecting Build Dependency Errors in Incremental Builds

Detecting Build Dependency Errors in Incremental Builds

URL: http://arxiv.org/abs/2404.13295v2
Date: Fri, 26 Apr 2024 02:03:41 GMT
Title: Detecting Build Dependency Errors in Incremental Builds
Authors: Jun Lyu, Shanshan Li, He Zhang, Yang Zhang, Guoping Rong, Manuel Rigger,
Abstract summary: We propose EChecker to detect build dependency errors in the context of incremental builds. EChecker automatically updates actual build dependencies by inferring them from C/C++ pre-processor directives and Makefile changes from new commits. EChecker increases the build dependency error detection efficiency by an average of 85.14 times.
Score: 13.823208277774572
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Incremental and parallel builds performed by build tools such as Make are the heart of modern C/C++ software projects. Their correct and efficient execution depends on build scripts. However, build scripts are prone to errors. The most prevalent errors are missing dependencies (MDs) and redundant dependencies (RDs). The state-of-the-art methods for detecting these errors rely on clean builds (i.e., full builds of a subset of software configurations in a clean environment), which is costly and takes up to multiple hours for large-scale projects. To address these challenges, we propose a novel approach called EChecker to detect build dependency errors in the context of incremental builds. The core idea of EChecker is to automatically update actual build dependencies by inferring them from C/C++ pre-processor directives and Makefile changes from new commits, which avoids clean builds when possible. EChecker achieves higher efficiency than the methods that rely on clean builds while maintaining effectiveness. We selected 12 representative projects, with their sizes ranging from small to large, with 240 commits (20 commits for each project), based on which we evaluated the effectiveness and efficiency of EChecker. We compared the evaluation results with a state-of-the-art build dependency error detection tool. The evaluation shows that the F-1 score of EChecker improved by 0.18 over the state-of-the-art method. EChecker increases the build dependency error detection efficiency by an average of 85.14 times (with the median at 16.30 times). The results demonstrate that EChecker can support practitioners in detecting build dependency errors efficiently.

Related papers

CrashFixer: A crash resolution agent for the Linux kernel [58.152358195983155]
This work builds upon kGym, which shares a benchmark for system-level Linux kernel bugs and a platform to run experiments on the Linux kernel. This paper introduces CrashFixer, the first LLM-based software repair agent that is applicable to Linux kernel bugs.
arXiv Detail & Related papers (2025-04-29T04:18:51Z)
RAG-Based Fuzzing of Cross-Architecture Compilers [0.8302146576157498]
OneAPI is an open standard that supports cross-architecture software development with minimal effort from developers. OneAPI brings DPC++ and C++ compilers which need to be thoroughly tested to verify their correctness, reliability, and security. This paper proposes a large-language model (LLM)-based compiler fuzzing tool that integrates the concept of retrieval-augmented generation (RAG)
arXiv Detail & Related papers (2025-04-11T20:46:52Z)
CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases. The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
Finding Missed Code Size Optimizations in Compilers using LLMs [1.90019787465083]
We develop a novel testing approach which combines large language models with a series of differential testing strategies. Our approach requires fewer than 150 lines of code to implement. To date we have reported 24 confirmed bugs in production compilers.
arXiv Detail & Related papers (2024-12-31T21:47:46Z)
Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
Local Software Buildability across Java Versions (Registered Report) [0.0]
We will try to automatically build every project in containers with Java versions 6 to 23 installed. Success or failure will be determined by exit codes, and standard output and error streams will be saved.
arXiv Detail & Related papers (2024-08-21T11:51:00Z)
KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks. In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel. To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z)
Does Using Bazel Help Speed Up Continuous Integration Builds? [9.098224117917336]
New artifact-based build technologies like Bazel have built-in support for advanced performance optimizations. We collected 383 Bazel projects from GitHub, studied their parallel and incremental build usage of Bazel in 4 popular CI services, and compared the results with Maven projects. Our results show that 31.23% of Bazel projects adopt a CI service but do not use it in the CI service, while for those who do use Bazel in CI, 27.76% of them use other tools to facilitate Bazel's execution.
arXiv Detail & Related papers (2024-05-01T18:16:38Z)
Automatic Build Repair for Test Cases using Incompatible Java Versions [7.4881561767138365]
We introduce an approach to repair test cases of Java projects by performing dependency minimization. Unlike existing state-of-the-art techniques, our approach performs at source-level, which allows compile-time errors to be fixed.
arXiv Detail & Related papers (2024-04-27T07:55:52Z)
DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs) It covers four major bug categories and 18 minor types in C++, Java, and Python. We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
Repeated Builds During Code Review: An Empirical Study of the OpenStack Community [11.289146650622662]
We conduct an empirical study of 66,932 code reviews from the community. We observe that (i) 55% of code reviews invoke the recheck command after a failing build is reported; (ii) invoking the recheck command only changes the outcome of a failing build in 42% of the cases; and (iii) invoking the recheck command increases review waiting time by an average of 2,200%.
arXiv Detail & Related papers (2023-08-19T17:45:03Z)
On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository. We retrieve over 53k potential vulnerable clones from Maven Central. We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems. static analysis tools such as linters, which can detect errors without running the program, haven't been well explored for evaluating code generation models. We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation. We propose Self- Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
Generating Bug-Fixes Using Pretrained Transformers [11.012132897417592]
We introduce a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub. We show that pretraining on source code programs improves the number of patches found by 33% as compared to supervised training from scratch. We refine the standard accuracy evaluation metric into non-deletion and deletion-only fixes, and show that our best model generates 75% more non-deletion fixes than the previous state of the art.
arXiv Detail & Related papers (2021-04-16T05:27:04Z)
D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis [55.15995704119158]
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools. We use D2A to generate a large labeled dataset to train models for vulnerability identification.
arXiv Detail & Related papers (2021-02-16T07:46:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.