Unlocking Reproducibility: Automating re-Build Process for Open-Source Software
- URL: http://arxiv.org/abs/2509.08204v1
- Date: Wed, 10 Sep 2025 00:23:08 GMT
- Title: Unlocking Reproducibility: Automating re-Build Process for Open-Source Software
- Authors: Behnaz Hassanshahi, Trong Nhan Mai, Benjamin Selwyn Smith, Nicholas Allen,
- Abstract summary: Software ecosystems like Maven Central play a crucial role in modern software supply chains. Approximately 84% of the top 1200 commonly used artifacts are not built using a transparent CI/CD pipeline. We introduce an extension to Macaron, an industry-grade open-source supply chain security framework, to automate the rebuilding of Maven artifacts from source.
- Score: 0.06124773188525717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software ecosystems like Maven Central play a crucial role in modern software supply chains by providing repositories for libraries and build plugins. However, the separation between binaries and their corresponding source code in Maven Central presents a significant challenge, particularly when it comes to linking binaries back to their original build environment. This lack of transparency poses security risks, as approximately 84% of the top 1200 commonly used artifacts are not built using a transparent CI/CD pipeline. Consequently, users must place a significant amount of trust not only in the source code but also in the environment in which these artifacts are built. Rebuilding software artifacts from source provides a robust solution to improve supply chain security. This approach allows for a deeper review of code, verification of binary-source equivalence, and control over dependencies. However, challenges arise due to variations in build environments, such as JDK versions and build commands, which can lead to build failures. Additionally, ensuring that all dependencies are rebuilt from source across large and complex dependency graphs further complicates the process. In this paper, we introduce an extension to Macaron, an industry-grade open-source supply chain security framework, to automate the rebuilding of Maven artifacts from source. Our approach improves upon existing tools by offering better performance in source code detection and by automating the extraction of build specifications from GitHub Actions workflows. We also present a comprehensive root cause analysis of build failures in Java projects and propose a scalable solution to automate the rebuilding of artifacts, ultimately enhancing security and transparency in the open-source supply chain.
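The abstract states that the extension automates the extraction of build specifications from GitHub Actions workflows but does not say how. As a rough illustration only, the Python sketch here scans a workflow file line by line for the `java-version` input of the `actions/setup-java` action (a JDK version) and for `run:` steps that invoke `mvn` or `./mvnw` (a build command). `BuildSpec`, `extract_build_spec`, `indent_of`, and the regular expressions are hypothetical names introduced for this example, not Macaron's API; a production tool would parse the YAML properly, resolve matrix expressions such as `${{ matrix.java }}`, and handle other build tools such as Gradle.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BuildSpec:
    """Hypothetical rebuild recipe; not Macaron's actual data model."""
    jdk_versions: list[str] = field(default_factory=list)
    build_commands: list[str] = field(default_factory=list)


# Literal `java-version` inputs (as used by actions/setup-java); expressions
# such as ${{ matrix.java }} do not match and are skipped.
JDK_RE = re.compile(r"^\s*java-version:\s*['\"]?([\w.\-]+)['\"]?\s*$")
# Single-line step, e.g. `- run: mvn -B package`.
RUN_INLINE_RE = re.compile(r"^\s*(?:-\s+)?run:\s*([^|>\s].*?)\s*$")
# Block-scalar step, e.g. `run: |`, whose commands follow on deeper lines.
RUN_BLOCK_RE = re.compile(r"^(\s*)(?:-\s+)?run:\s*[|>][-+]?\s*$")
# Heuristic: a command counts as a Maven build if it invokes mvn or mvnw.
MAVEN_RE = re.compile(r"(^|\s|/)(mvn|mvnw)(\s|$)")


def indent_of(line: str) -> int:
    """Number of leading whitespace characters on a line."""
    return len(line) - len(line.lstrip())


def extract_build_spec(workflow_text: str) -> BuildSpec:
    """Line-based scan of a workflow file; a real tool would parse the YAML."""
    spec = BuildSpec()
    lines = workflow_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if m := JDK_RE.match(line):
            spec.jdk_versions.append(m.group(1))
        elif m := RUN_BLOCK_RE.match(line):
            key_indent = len(m.group(1))
            i += 1
            # Consume blank lines and lines indented deeper than the `run:` step.
            while i < len(lines) and (not lines[i].strip()
                                      or indent_of(lines[i]) > key_indent):
                cmd = lines[i].strip()
                if cmd and MAVEN_RE.search(cmd):
                    spec.build_commands.append(cmd)
                i += 1
            continue  # i already points at the first line after the block
        elif m := RUN_INLINE_RE.match(line):
            if MAVEN_RE.search(m.group(1)):
                spec.build_commands.append(m.group(1))
        i += 1
    return spec


# Usage on a small, made-up workflow file.
workflow = """\
name: CI
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
      - name: Build
        run: ./mvnw -B -DskipTests package
      - run: |
          echo "running tests"
          mvn -B verify
      - run: echo done
"""

spec = extract_build_spec(workflow)
print(spec.jdk_versions)    # ['17']
print(spec.build_commands)  # ['./mvnw -B -DskipTests package', 'mvn -B verify']
```

Using only the standard library keeps the sketch dependency-free; the trade-off is that commands hidden in shell scripts or reusable actions are missed, so the result is best treated as a candidate build specification to try rather than a guaranteed one.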
Related papers
- On the Variability of Source Code in Maven Package Rebuilds [0.7297857358723842]
We study non-equivalent sources for alternative builds of 28 popular packages with 85 releases. We find that the main cause is build extensions that generate code at build time, which are difficult to reproduce.
arXiv Detail & Related papers (2026-02-22T23:31:42Z)
- ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development [72.4729759618632]
We introduce ABC-Bench, a benchmark to evaluate agentic backend coding within a realistic, executable workflow. We curated 224 practical tasks spanning 8 languages and 19 frameworks from open-source repositories. Our evaluation reveals that even state-of-the-art models struggle to deliver reliable performance on these holistic tasks.
arXiv Detail & Related papers (2026-01-16T08:23:52Z)
- Context-Guided Decompilation: A Step Towards Re-executability [50.71992919223209]
Binary decompilation plays an important role in software security analysis, reverse engineering and malware understanding. Recent advances in large language models (LLMs) have enabled neural decompilation, but the generated code is typically only semantically plausible. We propose ICL4Decomp, a hybrid decompilation framework that leverages in-context learning (ICL) to guide LLMs toward generating re-executable source code.
arXiv Detail & Related papers (2025-11-03T17:21:39Z)
- Code2MCP: A Multi-Agent Framework for Automated Transformation of Code Repositories into Model Context Protocol Services [49.5217775646447]
This paper introduces Code2MCP, a highly automated framework designed to transform any GitHub repository into a functional MCP service. A key innovation of our framework is an LLM-driven, closed-loop "Run--Review--Fix" cycle, which enables the system to autonomously debug and repair the code it generates.
arXiv Detail & Related papers (2025-09-07T06:13:25Z)
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code [48.10068691540979]
A.S.E (AI Code Generation Security Evaluation) is a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks. Our evaluation of leading large language models (LLMs) on A.S.E reveals several key findings.
arXiv Detail & Related papers (2025-08-25T15:11:11Z)
- Attestable builds: compiling verifiable binaries on untrusted systems using trusted execution environments [3.207381224848367]
Attestable builds provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code and the final binary artifact.
arXiv Detail & Related papers (2025-05-05T10:00:04Z)
- Canonicalization for Unreproducible Builds in Java [11.367562045401554]
We introduce a conceptual framework for reproducible builds, analyze a large dataset from Reproducible Central, and develop a novel taxonomy of six root causes of unreproducibility. We present Chains-Rebuild, a tool that raises success from 9.48% to 26.89% on 12,283 unreproducible artifacts.
arXiv Detail & Related papers (2025-04-30T14:17:54Z)
- Wolves in the Repository: A Software Engineering Analysis of the XZ Utils Supply Chain Attack [0.8517406772939294]
The digital economy runs on Open Source Software (OSS), with an estimated 90% of modern applications containing open-source components. This paper examines a sophisticated attack on the XZ Utils project (CVE-2024-3094), where attackers exploited not just code, but the entire open-source development process. Our analysis reveals a new breed of supply chain attack that manipulates software engineering practices themselves.
arXiv Detail & Related papers (2025-04-24T12:06:11Z)
- Bomfather: An eBPF-based Kernel-level Monitoring Framework for Accurate Identification of Unknown, Unused, and Dynamically Loaded Dependencies in Modern Software Supply Chains [0.0]
Inaccuracies in dependency-tracking methods undermine the security and integrity of modern software supply chains. This paper introduces a kernel-level framework leveraging extended Berkeley Packet Filter (eBPF) to capture software build dependencies transparently in real time.
arXiv Detail & Related papers (2025-03-03T22:32:59Z)
- ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
- Automatic Bill of Materials [5.14387789987357]
ABOM embeds a hash of each distinct input source code file into the binary emitted by a compiler.
If leveraged across the ecosystem, ABOMs provide a zero-touch, backwards-compatible, drop-in solution for fast supply chain attack detection.
arXiv Detail & Related papers (2023-10-15T05:48:11Z)
- Analyzing Maintenance Activities of Software Libraries [55.2480439325792]
Industrial applications heavily integrate open-source software libraries nowadays. The author proposes an automatic monitoring approach for industrial applications that identifies open-source dependencies showing negative signs regarding their current or future maintenance activities.
arXiv Detail & Related papers (2023-06-09T16:51:25Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)