SoK: Towards Reproducibility for Software Packages in Scripting Language Ecosystems
- URL: http://arxiv.org/abs/2503.21705v1
- Date: Thu, 27 Mar 2025 17:10:38 GMT
- Title: SoK: Towards Reproducibility for Software Packages in Scripting Language Ecosystems
- Authors: Timo Pohl, Pavel Novák, Marc Ohm, Michael Meier,
- Abstract summary: This SoK provides an overview of existing research, aiming to highlight future directions.<n>We work out key aspects in current research, systematize identified challenges for software, and map them between the ecosystems.<n>We find that the literature is sparse, focusing on few individual problems and ecosystems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The disconnect between distributed software artifacts and their supposed source code enables attackers to leverage the build process for inserting malicious functionality. Past research in this field focuses on compiled language ecosystems, mostly analysing Linux distribution packages. However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This SoK provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems. We find that the literature is sparse, focusing on few individual problems and ecosystems. This allows us to effectively identify next steps to improve reproducibility in this field.
Related papers
- Tracking Down Software Cluster Bombs: A Current State Analysis of the Free/Libre and Open Source Software (FLOSS) Ecosystem [0.43981305860983705]
This study provides a summary of the current state of available FLOSS package repositories.<n>It addresses the challenge of identifying problematic areas within a software ecosystem.<n>The results indicate that while there are well-maintained projects within the FLOSS ecosystem, there are also high-impact projects that are susceptible to supply chain attacks.
arXiv Detail & Related papers (2025-02-12T08:57:57Z) - The Dynamics of Innovation in Open Source Software Ecosystems [0.8594140167290099]
New libraries emerge at a remarkably predictable sub-linear rate within ecosystems per post.
The most widely used libraries are used many times more often than the average.
New users are more likely to use new libraries and new combinations.
arXiv Detail & Related papers (2024-11-22T12:31:25Z) - ELCC: the Emergent Language Corpus Collection [1.6574413179773761]
The Emergent Language Corpus Collection (ELCC) is a collection of corpora generated from open source implementations of emergent communication systems.<n>Each corpus is annotated with metadata describing the characteristics of the source system as well as a suite of analyses of the corpus.
arXiv Detail & Related papers (2024-07-04T21:23:18Z) - TRIAD: Automated Traceability Recovery based on Biterm-enhanced
Deduction of Transitive Links among Artifacts [53.92293118080274]
Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle.
Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR)
arXiv Detail & Related papers (2023-12-28T06:44:24Z) - Reverse-Engineering Decoding Strategies Given Blackbox Access to a
Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z) - An Introduction to Software Ecosystems [7.574742446357262]
This chapter defines and presents different kinds of software ecosystems.
The focus is on the development, tooling and analytics aspects of software ecosystems.
The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems.
arXiv Detail & Related papers (2023-07-28T17:58:59Z) - Promises and Perils of Mining Software Package Ecosystem Data [10.787686237395816]
Third-party packages have led to the emergence of large software package ecosystems with a maze of inter-dependencies.
Understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities.
In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers.
arXiv Detail & Related papers (2023-05-29T03:09:48Z) - A New Generation of Perspective API: Efficient Multilingual
Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z) - Addressing Issues of Cross-Linguality in Open-Retrieval Question
Answering Systems For Emergent Domains [67.99403521976058]
We demonstrate a cross-lingual open-retrieval question answering system for the emergent domain of COVID-19.
Our system adopts a corpus of scientific articles to ensure that retrieved documents are reliable.
We show that a deep semantic retriever greatly benefits from training on our English-to-all data and significantly outperforms a BM25 baseline in the cross-lingual setting.
arXiv Detail & Related papers (2022-01-26T19:27:32Z) - Cetacean Translation Initiative: a roadmap to deciphering the
communication of sperm whales [97.41394631426678]
Recent research showed the promise of machine learning tools for analyzing acoustic communication in nonhuman species.
We outline the key elements required for the collection and processing of massive bioacoustic data of sperm whales.
The technological capabilities developed are likely to yield cross-applications and advancements in broader communities investigating non-human communication and animal behavioral research.
arXiv Detail & Related papers (2021-04-17T18:39:22Z) - Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.