Why Authors and Maintainers Link (or Don't Link) Their PyPI Libraries to Code Repositories and Donation Platforms
- URL: http://arxiv.org/abs/2601.15139v1
- Date: Wed, 21 Jan 2026 16:13:57 GMT
- Title: Why Authors and Maintainers Link (or Don't Link) Their PyPI Libraries to Code Repositories and Donation Platforms
- Authors: Alexandros Tsakpinis, Nicolas Raube, Alexander Pretschner
- Abstract summary: Metadata of libraries on the Python Package Index (PyPI) plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries. This paper presents a large-scale empirical study combining two targeted surveys sent to 50,000 PyPI authors and maintainers. We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms.
- Score: 83.16077040470975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metadata of libraries on the Python Package Index (PyPI), including links to source code repositories and donation platforms, plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries. Yet, many packages lack such metadata, and little is known about the underlying reasons. This paper presents a large-scale empirical study combining two targeted surveys sent to 50,000 PyPI authors and maintainers. We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms. While repository URLs are often linked to foster collaboration, increase transparency, and enable issue tracking, some maintainers omit them due to oversight, laziness, or the perceived irrelevance to their project. Donation platform links are reported to support open source work or receive financial contributions, but are hindered by skepticism, technical friction, and organizational constraints. Cross-cutting challenges, such as outdated links, lack of awareness, and unclear guidance, affect both types of metadata. We further assess the robustness of our topic modeling pipeline across 30 runs (84% lexical and 89% semantic similarity) and validate topic quality with 23 expert raters (Randolph's kappa = 0.55). The study contributes empirical insights into PyPI's metadata practices and provides recommendations for improving them, while also demonstrating the effectiveness of our topic modeling approach for analyzing short-text survey responses.
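The abstract reports agreement among 23 raters as Randolph's kappa. As a rough illustration of how that statistic is computed, the sketch below implements the standard free-marginal multirater formula: observed agreement is the share of agreeing rater pairs per item, and chance agreement is fixed at 1/k for k categories. The function name, the label values, and the toy ratings are hypothetical; the paper's actual rating scheme and data are not given here.

```python
from collections import Counter
from typing import Hashable, Sequence


def randolph_kappa(ratings: Sequence[Sequence[Hashable]], n_categories: int) -> float:
    """Free-marginal multirater kappa (Randolph's kappa).

    ratings[i] holds the labels all raters assigned to item i. Every item
    must be rated by the same number of raters (at least two).
    """
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if not ratings:
        raise ValueError("need at least one rated item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")

    agreeing_pairs = 0
    for item in ratings:
        if len(item) != n_raters:
            raise ValueError("every item needs the same number of raters")
        # c * (c - 1) counts ordered pairs of raters who chose the same label.
        agreeing_pairs += sum(c * (c - 1) for c in Counter(item).values())

    p_observed = agreeing_pairs / (n_items * n_raters * (n_raters - 1))
    p_expected = 1 / n_categories  # free-marginal chance agreement
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: two items, three raters, two possible labels.
toy_ratings = [
    ["good", "good", "good"],
    ["good", "bad", "good"],
]
toy_kappa = randolph_kappa(toy_ratings, n_categories=2)
```

Unlike Fleiss' kappa, this variant does not estimate chance agreement from the observed label distribution, so it stays meaningful when raters are free to put most items into one category.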
Related papers
- AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research [85.51475655916026]
AgentCPM-Report is a lightweight yet high-performing local solution composed of a framework that mirrors the human writing process. Our framework uses a Writing As Reasoning Policy (WARP), which enables models to dynamically revise outlines. Experiments on DeepResearch Bench, DeepConsult, and DeepResearch Gym demonstrate that AgentCPM-Report outperforms leading closed-source systems.
arXiv Detail & Related papers (2026-02-06T09:45:04Z)
- Analyzing the Availability of E-Mail Addresses for PyPI Libraries [89.21869606965578]
81.6% of libraries include at least one valid e-mail address, with PyPI serving as the primary source. We identify over 698,000 invalid entries, primarily due to missing fields.
arXiv Detail & Related papers (2026-01-20T14:54:58Z)
- What About Our Bug? A Study on the Responsiveness of NPM Package Maintainers [2.131643283600185]
We investigate the responsiveness of 30,340 bug reports across 500 of the most depended-upon npm packages. Our findings show that maintainers are generally responsive, with a median project-level responsiveness of 70%.
arXiv Detail & Related papers (2025-11-07T05:11:47Z)
- Empirical Evaluation of AI-Assisted Software Package Selection: A Knowledge Graph Approach [4.100870096741918]
This study formulates software package selection as a Multi-Criteria Decision-Making (MCDM) problem. Data pipelines continuously collect and integrate software metadata, usage trends, vulnerability information, and developer sentiment. The system uses large language models to interpret user intent and query the model to identify contextually appropriate packages.
arXiv Detail & Related papers (2025-08-06T13:55:43Z)
- Analyzing the Usage of Donation Platforms for PyPI Libraries [91.97201077607862]
This study analyzes the adoption of donation platforms in the PyPI ecosystem. GitHub Sponsors is the dominant platform, though many PyPI-listed links are outdated.
arXiv Detail & Related papers (2025-03-11T10:27:31Z)
- See to Believe: Using Visualization To Motivate Updating Third-party Dependencies [1.7914660044009358]
Security vulnerabilities introduced by applications using third-party dependencies are on the increase.
Developers are wary of library updates, even to fix vulnerabilities, citing either a lack of awareness or a migration effort that outweighs the benefit of updating.
In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update.
arXiv Detail & Related papers (2024-05-15T03:57:27Z)
- pyvene: A Library for Understanding and Improving PyTorch Models via Interventions [79.72930339711478]
pyvene is an open-source library that supports customizable interventions on a range of different PyTorch modules.
We show how pyvene provides a unified framework for performing interventions on neural models and sharing the intervened-upon models with others.
arXiv Detail & Related papers (2024-03-12T16:46:54Z)
- NLPeer: A Unified Resource for the Computational Study of Peer Review [58.71736531356398]
We introduce NLPeer -- the first ethically sourced multidomain corpus of more than 5k papers and 11k review reports from five different venues.
We augment previous peer review datasets to include parsed and structured paper representations, rich metadata and versioning information.
Our work paves the path towards systematic, multi-faceted, evidence-based study of peer review in NLP and beyond.
arXiv Detail & Related papers (2022-11-12T12:29:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.