On the Variability of Source Code in Maven Package Rebuilds
- URL: http://arxiv.org/abs/2602.19383v1
- Date: Sun, 22 Feb 2026 23:31:42 GMT
- Title: On the Variability of Source Code in Maven Package Rebuilds
- Authors: Jens Dietrich, Behnaz Hassanshahi
- Abstract summary: We study non-equivalent sources for alternative builds of 28 popular packages with 85 releases. We find that the main cause is build extensions that generate code at build time, which are difficult to reproduce.
- Score: 0.7297857358723842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rebuilding packages from open source is a common practice to improve the security of software supply chains, and is now done at an industrial scale. The basic principle is to acquire the source code used to build a package published in a repository such as Maven Central (for Java), rebuild the package independently with hardened security, and publish it in some alternative repository. In this paper we test the assumption that the same source code is being used by those alternative builds. To study this, we compare the sources released with packages on Maven Central, with the sources associated with independently built packages from Google's Assured Open Source and Oracle's Build-from-Source projects. We study non-equivalent sources for alternative builds of 28 popular packages with 85 releases. We investigate the causes of non-equivalence, and find that the main cause is build extensions that generate code at build time, which are difficult to reproduce. We suggest strategies to address this issue.
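The source comparison described in the abstract can be illustrated with a minimal sketch. It assumes that both the Maven Central release and the alternative build provide a sources JAR, and it treats two sources as equivalent only when every `.java` entry is byte-identical. That is a deliberately strict stand-in; the paper's own notion of equivalence may differ. The function names `source_digests`, `compare_sources`, and `make_jar` are hypothetical and not taken from the paper.

```python
import hashlib
import io
import zipfile


def source_digests(jar_bytes: bytes) -> dict[str, str]:
    """Map each .java entry in a sources JAR to the SHA-256 of its content."""
    digests = {}
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for name in jar.namelist():
            if name.endswith(".java"):
                digests[name] = hashlib.sha256(jar.read(name)).hexdigest()
    return digests


def compare_sources(reference: bytes, rebuilt: bytes) -> dict[str, list[str]]:
    """Report source files that are missing, extra, or changed in the rebuild.

    Assumption: byte-level identity is used as the equivalence check.
    """
    ref = source_digests(reference)
    alt = source_digests(rebuilt)
    return {
        "only_in_reference": sorted(ref.keys() - alt.keys()),
        "only_in_rebuilt": sorted(alt.keys() - ref.keys()),
        "different": sorted(
            name for name in ref.keys() & alt.keys() if ref[name] != alt[name]
        ),
    }


def make_jar(files: dict[str, str]) -> bytes:
    """Build an in-memory JAR (a ZIP archive) for demonstration purposes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        for name, text in files.items():
            jar.writestr(name, text)
    return buffer.getvalue()
```

In this sketch, a file that appears only in the rebuilt JAR would be a candidate for code generated at build time, the main cause of non-equivalence reported in the abstract.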
Related papers
- Maven-Lockfile: High Integrity Rebuild of Past Java Releases [8.004632448033531]
Maven is one of the most important package managers in the Java ecosystem. We present Maven-Lockfile to generate and update lockfiles with support for rebuilding projects from past versions. Our evaluation shows that Maven-Lockfile can reproduce builds from historical commits and is able to detect tampered artifacts.
arXiv Detail & Related papers (2025-10-01T10:14:32Z) - Unlocking Reproducibility: Automating re-Build Process for Open-Source Software [0.06124773188525717]
Software ecosystems like Maven Central play a crucial role in modern software supply chains. Approximately 84% of the top 1200 commonly used artifacts are not built using a transparent CI/CD pipeline. We introduce an extension to Maven, an industry-grade open-source supply chain security framework, to automate the rebuilding of Maven artifacts from source.
arXiv Detail & Related papers (2025-09-10T00:23:08Z) - Causes and Canonicalization of Unreproducible Builds in Java [11.155099138622148]
We introduce a conceptual framework for reproducible builds, analyze a large dataset from Reproducible Central, and develop a novel taxonomy of six root causes of unreproducibility. We present Chains-Rebuild, a tool that achieves successful canonicalization for 26.60% on 12,803 unreproducible artifacts.
arXiv Detail & Related papers (2025-04-30T14:17:54Z) - Local Software Buildability across Java Versions (Registered Report) [0.0]
We will try to automatically build every project in containers with Java versions 6 to 23 installed.
Success or failure will be determined by exit codes, and standard output and error streams will be saved.
arXiv Detail & Related papers (2024-08-21T11:51:00Z) - Maven-Hijack: Software Supply Chain Attack Exploiting Packaging Order [9.51794475707891]
We present Maven-Hijack, a novel attack that exploits the order in which Maven packages dependencies. By injecting a malicious class with the same fully qualified name as a legitimate one into a dependency that is packaged earlier, an attacker can silently override core application behavior. We evaluate three mitigation strategies: sealed JARs, Java Modules, and the Maven Enforcer plugin.
arXiv Detail & Related papers (2024-07-26T14:17:47Z) - Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits. To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z) - depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers [92.13613958373628]
depyf is a tool designed to demystify the inner workings of the PyTorch compiler.
depyf decompiles bytecode generated by PyTorch back into equivalent source code.
arXiv Detail & Related papers (2024-03-14T16:17:14Z) - On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z) - Code Librarian: A Software Package Recommendation System [65.05559087332347]
We present a recommendation engine called Librarian for open source libraries.
A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation, and 4) it can be used efficiently in the context of the provided code.
arXiv Detail & Related papers (2022-10-11T12:30:05Z) - Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library which aims at improving the usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
arXiv Detail & Related papers (2022-04-29T01:54:54Z) - Towards Utility-based Prioritization of Requirements in Open Source Environments [51.65930505153647]
We show how utility-based prioritization approaches can be used to support contributors in conventional and open source Requirements Engineering scenarios.
As an example, we show how dependencies can be taken into account in utility-based prioritization processes.
arXiv Detail & Related papers (2021-02-17T09:05:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.