Lila: Decentralized Build Reproducibility Monitoring for the Functional Package Management Model
- URL: http://arxiv.org/abs/2601.20662v1
- Date: Wed, 28 Jan 2026 14:44:23 GMT
- Title: Lila: Decentralized Build Reproducibility Monitoring for the Functional Package Management Model
- Authors: Julien Malka, Arnout Engelen
- Abstract summary: Large-scale adoption of reproducible builds faces two significant challenges: achieving high reproducibility rates and establishing monitoring infrastructure. Lila enables decentralized reproducibility assessment tailored to the functional package management model.
- Score: 4.010598744735379
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring the integrity of software build artifacts is an increasingly important concern for modern software engineering, driven by increasingly sophisticated attacks on build systems, distribution channels, and development infrastructures. Reproducible builds – where binaries built independently from the same source code can be verified to be bit-for-bit identical to the distributed artifacts – provide a principled foundation for transparency and trust in software distribution. Despite their potential, the large-scale adoption of reproducible builds faces two significant challenges: achieving high reproducibility rates across vast software collections and establishing reproducibility monitoring infrastructure that can operate at very large scale. While recent studies have shown that high reproducibility rates are achievable at scale – demonstrated by the Nix ecosystem achieving over 90% reproducibility on more than 80,000 packages – the problem of effective reproducibility monitoring remains largely unsolved. In this work, we address the reproducibility monitoring challenge by introducing Lila, a decentralized system for reproducibility assessment tailored to the functional package management model. Lila enables distributed reporting of build results and aggregation into a reproducibility database, benefiting both practitioners and future empirical build reproducibility studies.
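The reporting-and-aggregation idea in the abstract can be illustrated with a minimal sketch. This is not Lila's implementation or API: the `BuildReport` record, its field names, the three status labels, and the sample package names and hashes are all hypothetical, chosen only to show how independently reported output hashes could be grouped per package and compared for bit-for-bit agreement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildReport:
    """One builder's claim about one build (hypothetical record layout)."""
    derivation: str   # assumed identifier of the build recipe
    builder: str      # assumed name of the reporting party
    output_hash: str  # content hash of the artifact that builder obtained


def aggregate(reports):
    """Group reports by derivation, then by hash, collecting the builders."""
    by_derivation = defaultdict(lambda: defaultdict(set))
    for r in reports:
        by_derivation[r.derivation][r.output_hash].add(r.builder)
    return by_derivation


def classify(hashes_to_builders):
    """Label one derivation from the hashes reported for it."""
    builders = set().union(*hashes_to_builders.values())
    if len(builders) < 2:
        return "unverified"      # a single builder cannot confirm itself
    if len(hashes_to_builders) == 1:
        return "reproducible"    # all independent builds agree bit-for-bit
    return "unreproducible"      # at least two reported hashes differ


# Made-up sample data: three packages reported by different builders.
reports = [
    BuildReport("hello-2.12.drv", "cache", "sha256:aaa"),
    BuildReport("hello-2.12.drv", "alice", "sha256:aaa"),
    BuildReport("foo-1.0.drv", "cache", "sha256:bbb"),
    BuildReport("foo-1.0.drv", "bob", "sha256:ccc"),
    BuildReport("bar-0.3.drv", "cache", "sha256:ddd"),
]

status = {drv: classify(hashes) for drv, hashes in aggregate(reports).items()}
print(status)
```

In this sketch a package counts as reproducible only when at least two distinct builders report the same hash; a real monitoring system would also need to authenticate reports and decide how many independent confirmations to require, which the abstract does not specify.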
Related papers
- Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition. It shifts away from linear patching by generating multiple diverse implementation designs. Experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z)
- CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning [57.24524263804788]
Code verifiers play a critical role in post-verification for LLM-based code generation. Existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency. We show that naive RL with only functionality rewards fails to generate effective unit tests for difficult branches and samples.
arXiv Detail & Related papers (2026-01-30T10:33:29Z)
- NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents [79.29376673236142]
Existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems. We present NL2Repo-Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents.
arXiv Detail & Related papers (2025-12-14T15:12:13Z)
- A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks [48.83701310501069]
We introduce NN-RAG, a retrieval-augmented generation system that converts large, heterogeneous PyTorch codebases into a searchable library of validated neural modules. Applied to 19 major repositories, the pipeline extracted 1,289 candidate blocks, validated 941 (73.0%), and demonstrated that over 80% are structurally unique.
arXiv Detail & Related papers (2025-12-03T23:28:30Z)
- Tractable Asymmetric Verification for Large Language Models via Deterministic Replicability [0.6117371161379209]
The landscape of Large Language Models (LLMs) shifts rapidly towards dynamic, multi-agent systems. This paper proposes a verification framework that achieves tractable asymmetric effort. We show that targeted verification can be over 12 times faster than full regeneration.
arXiv Detail & Related papers (2025-09-14T03:30:06Z)
- Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers [103.4410890572479]
We introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification. LoongBench is a curated seed dataset containing 8,729 human-vetted examples across 12 domains. LoongEnv is a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples.
arXiv Detail & Related papers (2025-09-03T06:42:40Z)
- From Articles to Code: On-Demand Generation of Core Algorithms from Scientific Publications [0.0]
We show that rich method descriptions in scientific publications can serve as standalone specifications for modern large language models. We benchmark state-of-the-art models by tasking them with implementing a diverse set of core algorithms drawn from original publications.
arXiv Detail & Related papers (2025-07-30T01:52:01Z)
- How Far Are We from Generating Missing Modalities with Foundation Models? [49.425856207329524]
We propose an agentic framework tailored for missing modality reconstruction. Our method reduces FID for missing image reconstruction by at least 14% and MER for missing text reconstruction by at least 10% compared to baselines.
arXiv Detail & Related papers (2025-06-04T03:22:44Z)
- Causes and Canonicalization of Unreproducible Builds in Java [11.155099138622148]
We introduce a conceptual framework for reproducible builds, analyze a large dataset from Reproducible Central, and develop a novel taxonomy of six root causes of unreproducibility. We present Chains-Rebuild, a tool that achieves successful canonicalization for 26.60% on 12,803 unreproducible artifacts.
arXiv Detail & Related papers (2025-04-30T14:17:54Z)
- Does Functional Package Management Enable Reproducible Builds at Scale? Yes [4.492444446637857]
Reproducible Builds (R-B) guarantee that rebuilding a software package from source leads to bitwise identical artifacts. We perform the first large-scale study of bitwise reproducibility in the context of the Nix functional package manager. We obtain very high bitwise reproducibility rates, between 69 and 91% with an upward trend, and even higher rebuildability rates, over 99%.
arXiv Detail & Related papers (2025-01-27T10:11:27Z)
- OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection [54.775409528658486]
OriGen is a fully open-source framework that incorporates self-reflection capabilities and a novel dataset augmentation methodology.
Our approach employs a code-to-code augmentation technique to enhance the quality of open-source RTL code datasets.
arXiv Detail & Related papers (2024-07-23T07:22:25Z)
- MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy [68.7563053122698]
We propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet).
This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment.
We present a new label-free multi-modal consistency (MC) score that utilizes the output from all modalities to measure the overall system output reliability and uncertainty.
arXiv Detail & Related papers (2022-10-19T19:15:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.