Reproducibility of Build Environments through Space and Time
- URL: http://arxiv.org/abs/2402.00424v1
- Date: Thu, 1 Feb 2024 08:45:28 GMT
- Title: Reproducibility of Build Environments through Space and Time
- Authors: Julien Malka (IP Paris, LTCI, ACES), Stefano Zacchiroli (IP Paris,
LTCI, ACES), Théo Zimmermann (ACES, INFRES, IP Paris)
- Abstract summary: We argue that functional package managers provide the tooling to make build environments reproducible in space and time.
We show that we are able to reproduce build environments of about 7 million Nix packages, and to rebuild 99.94% of the 14 thousand packages from a 6-year-old Nixpkgs revision.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern software engineering builds upon the composability of software
components, which rely on ever more direct and transitive dependencies to
provide their functionality. This principle of reusability, however, makes it
harder to reproduce projects' build environments, even though reproducibility
of build environments is essential for collaboration, maintenance, and component
lifetime. In this work, we argue that functional package managers provide the
tooling to make build environments reproducible in space and time, and we
produce a preliminary evaluation to justify this claim. Using historical data,
we show that we are able to reproduce build environments of about 7 million Nix
packages, and to rebuild 99.94% of the 14 thousand packages from a 6-year-old
Nixpkgs revision.
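As an illustration of the mechanism the abstract describes, a Nix flake can pin a build environment to an exact Nixpkgs snapshot. This is a minimal sketch; the branch name, package, and platform below are assumed examples and are not taken from the paper's evaluation setup:

```nix
{
  # Pin the package set to a specific Nixpkgs snapshot. On first build,
  # Nix records the exact commit and content hash in flake.lock, so later
  # rebuilds on any machine resolve the same revision.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";

  outputs = { self, nixpkgs }: {
    # Build GNU hello from the pinned snapshot (x86_64-linux as an example).
    packages.x86_64-linux.default =
      nixpkgs.legacyPackages.x86_64-linux.hello;
  };
}
```

Running `nix build` against such a flake resolves the same locked Nixpkgs revision regardless of when or where it is invoked, which corresponds to the "space and time" reproducibility property the paper evaluates at scale on historical revisions.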
Related papers
- Canonicalization for Unreproducible Builds in Java [11.367562045401554]
We introduce a conceptual framework for reproducible builds, analyze a large dataset from Reproducible Central, and develop a novel taxonomy of six root causes of unreproducibility.
We present Chains-Rebuild, a tool that raises the successful reproduction rate from 9.48% to 26.89% on 12,283 unreproducible artifacts.
arXiv Detail & Related papers (2025-04-30T14:17:54Z)
- Towards Source Mapping for Zero-Knowledge Smart Contracts: Design and Preliminary Evaluation [9.952399779710044]
We present a source mapping framework that establishes traceability between Solidity source code, LLVM IR, and zkEVM bytecode within the zkSolc compilation pipeline.
We evaluate the framework on a dataset of 50 benchmark contracts and 500 real-world zkSync contracts, observing a mapping accuracy of approximately 97.2% for standard Solidity constructs.
arXiv Detail & Related papers (2025-04-06T01:42:07Z) - Insights into Dependency Maintenance Trends in the Maven Ecosystem [0.14999444543328289]
We present a quantitative analysis of the Neo4j dataset using the Goblin framework.
Our analysis reveals that releases with fewer dependencies have a higher number of missed releases.
Our study shows that the dependencies in the latest releases have positive freshness scores, indicating better software management efficacy.
arXiv Detail & Related papers (2025-03-28T22:20:24Z) - EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in the software engineering domain.
Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.
To address this gap, we introduce a comprehensive environment setup benchmark EnvBench.
arXiv Detail & Related papers (2025-03-18T17:19:12Z) - Does Functional Package Management Enable Reproducible Builds at Scale? Yes [4.492444446637857]
Reproducible Builds (R-B) guarantee that rebuilding a software package from source leads to bitwise identical artifacts.
We perform the first large-scale study of bitwise reproducibility in the context of the Nix functional package manager.
We obtain very high bitwise reproducibility rates, between 69 and 91% with an upward trend, and even higher rebuildability rates, over 99%.
arXiv Detail & Related papers (2025-01-27T10:11:27Z) - ExecRepoBench: Multi-level Executable Code Completion Evaluation [45.963424627710765]
We introduce a novel framework for enhancing code completion in software development through the creation of a repository-level benchmark ExecRepoBench.
We present a multi-level grammar-based completion methodology conditioned on the abstract syntax tree to mask code fragments at various logical units.
Then, we fine-tune the open-source LLM with 7B parameters on Repo-Instruct to produce a strong code completion baseline model Qwen2.5-Coder-Instruct-C.
arXiv Detail & Related papers (2024-12-16T17:14:35Z) - Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.
Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.
Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - Designing and Implementing a Generator Framework for a SIMD Abstraction Library [53.84310825081338]
We present TSLGen, a novel end-to-end framework for generating a SIMD abstraction library.
We show that our framework is comparable to existing libraries, and we achieve the same performance results.
arXiv Detail & Related papers (2024-07-26T13:25:38Z) - Does Using Bazel Help Speed Up Continuous Integration Builds? [9.098224117917336]
New artifact-based build technologies like Bazel have built-in support for advanced performance optimizations.
We collected 383 Bazel projects from GitHub, studied their parallel and incremental build usage of Bazel in 4 popular CI services, and compared the results with Maven projects.
Our results show that 31.23% of Bazel projects adopt a CI service but do not use Bazel in it, while of those who do use Bazel in CI, 27.76% use other tools to facilitate Bazel's execution.
arXiv Detail & Related papers (2024-05-01T18:16:38Z) - Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository [4.767858874370881]
We introduce RepoClassBench, a benchmark designed to rigorously evaluate LLMs in generating class-level code within real-world repositories.
RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories.
We introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context.
arXiv Detail & Related papers (2024-04-22T03:52:54Z) - DevEval: Evaluating Code Generation in Practical Software Projects [52.16841274646796]
We propose a new benchmark named DevEval, aligned with Developers' experiences in practical projects.
DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects.
We assess five popular LLMs on DevEval and reveal their actual abilities in code generation.
arXiv Detail & Related papers (2024-01-12T06:51:30Z)
- CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization [62.0397906276669]
CLIN is the first language-based agent to continually improve over multiple trials.
It can improve its zero-shot performance by 4 points (13 for new tasks) and can improve further through continual memory updates.
This suggests a new architecture for agents built on frozen models that can still continually and rapidly improve over time.
arXiv Detail & Related papers (2023-10-16T07:17:27Z)
- WebArena: A Realistic Web Environment for Building Autonomous Agents [92.3291458543633]
We build an environment for language-guided agents that is highly realistic and reproducible.
We focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains.
We release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
arXiv Detail & Related papers (2023-07-25T22:59:32Z)
- Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu [7.76541950830141]
An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures.
For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused on individual files.
We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files.
arXiv Detail & Related papers (2023-07-10T10:12:21Z)
- Managed Geo-Distributed Feature Store: Architecture and System Design [1.1809647985607934]
Companies are using machine learning to solve real-world problems and are developing hundreds to thousands of features in the process.
Without feature stores, different teams across various business groups would maintain the above process independently.
This paper aims to capture the core architectural components that make up a managed feature store and to share the design learning in building such a system.
arXiv Detail & Related papers (2023-05-31T17:51:30Z)
- Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs [0.2538209532048866]
This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge)
The CK concept is to decompose research projects into reusable components that encapsulate research artifacts.
The long-term goal is to accelerate innovation by connecting researchers and practitioners to share and reuse all their knowledge.
arXiv Detail & Related papers (2020-11-02T17:42:59Z)
- Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of this information and is not responsible for any consequences arising from its use.