Source Code Archiving to the Rescue of Reproducible Deployment
- URL: http://arxiv.org/abs/2405.15516v1
- Date: Fri, 24 May 2024 13:00:28 GMT
- Title: Source Code Archiving to the Rescue of Reproducible Deployment
- Authors: Ludovic Courtès, Timothy Sample, Simon Tournier, Stefano Zacchiroli,
- Abstract summary: We describe our work connecting Guix with Software Heritage, the universal source code archive, making Guix the first free software distribution and tool backed by a stable archive.
Our contribution is twofold: we explain the rationale and present the design and implementation we came up with; second, we report on the archival coverage for package source code with data collected over five years and discuss remaining challenges.
- Score: 2.53740603524637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available.We describe our work connecting Guix with Software Heritage, the universal source code archive, making Guix the first free software distribution and tool backed by a stable archive. Our contribution is twofold: we explain the rationale and present the design and implementation we came up with; second, we report on the archival coverage for package source code with data collected over five years and discuss remaining challenges.
Related papers
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE)
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - TRIAD: Automated Traceability Recovery based on Biterm-enhanced
Deduction of Transitive Links among Artifacts [53.92293118080274]
Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle.
Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR)
arXiv Detail & Related papers (2023-12-28T06:44:24Z) - The Software Heritage Open Science Ecosystem [0.0]
Software Heritage is the largest public archive of software source code and associated development history.
It has archived more than 16 billion unique source code files coming from more than 250 million collaborative development projects.
It supports empirical research on software by materializing in a single Merkle direct acyclic graph the development history of public code.
It ensures availability and guarantees integrity of the source code of software artifacts used in any field that relies on software to conduct experiments.
arXiv Detail & Related papers (2023-10-16T11:32:03Z) - FASER: Binary Code Similarity Search through the use of Intermediate
Representations [0.8594140167290099]
Cross-Architecture Binary Code Similarity Search has been explored in numerous studies.
We propose Function as a String Encoded Representation (FASER) to create a model capable of cross architecture function search.
arXiv Detail & Related papers (2023-10-05T15:36:35Z) - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval
and Generation [96.75695811963242]
RepoCoder is a framework to streamline the repository-level code completion process.
It incorporates a similarity-based retriever and a pre-trained code language model.
It consistently outperforms the vanilla retrieval-augmented code completion approach.
arXiv Detail & Related papers (2023-03-22T13:54:46Z) - Code Recommendation for Open Source Software Developers [32.181023933552694]
CODER is a novel graph-based code recommendation framework for open source software developers.
Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
arXiv Detail & Related papers (2022-10-15T16:40:36Z) - Defining the role of open source software in research reproducibility [0.0]
I make a new proposal for the role of open source software.
I look for explanation of its success from the perspectives of connectivism.
I contend that engenders trust, which we routinely build in community via conversations.
arXiv Detail & Related papers (2022-04-26T19:52:47Z) - Underproduction: An Approach for Measuring Risk in Open Source Software [9.701036831490766]
'Underproduction' occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced.
We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset.
arXiv Detail & Related papers (2021-02-27T23:18:21Z) - Nine Best Practices for Research Software Registries and Repositories: A
Concise Guide [63.52960372153386]
We present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories.
These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Implementation Working Group during the years 2011 and 2012.
arXiv Detail & Related papers (2020-12-24T05:37:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.