Predicting Maintenance Cessation of Open Source Software Repositories with An Integrated Feature Framework
- URL: http://arxiv.org/abs/2507.21678v1
- Date: Tue, 29 Jul 2025 10:45:24 GMT
- Title: Predicting Maintenance Cessation of Open Source Software Repositories with An Integrated Feature Framework
- Authors: Yiming Xu, Runzhi He, Hengzhi Ye, Minghui Zhou, Huaimin Wang
- Abstract summary: Maintenance risks of open source software (OSS) projects pose significant threats to the quality, security, and resilience of modern software supply chains. We introduce "maintenance cessation", grounded in explicit archival status and rigorous semantic analysis of project documentation. We propose an integrated, multi-perspective feature framework for predicting maintenance cessation, systematically combining user-centric features, maintainer-centric features and project evolution features.
- Score: 14.346295005927347
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The maintenance risks of open source software (OSS) projects pose significant threats to the quality, security, and resilience of modern software supply chains. While prior research has proposed diverse approaches for predicting OSS maintenance risk -- leveraging signals ranging from surface features (e.g., stars, commits) to social network analyses and behavioral patterns -- existing methods often suffer from ambiguous operational definitions, limited interpretability, and datasets of insufficient scale or generalizability. In this work, we introduce "maintenance cessation", grounded in both explicit archival status and rigorous semantic analysis of project documentation. Building on this foundation, we curate a large-scale, longitudinal dataset of 115,466 GitHub repositories -- encompassing 57,733 confirmed cessation events -- complemented by comprehensive, timeline-based behavioral features. We propose an integrated, multi-perspective feature framework for predicting maintenance cessation, systematically combining user-centric features, maintainer-centric features and project evolution features. AFT survival analysis demonstrates a high C-index (0.846), substantially outperforming models relying only on surface features. Feature ablation and SHAP analysis further confirm the effectiveness and interpretability of our approach. Finally, we demonstrate real-world applicability by deploying a GBSA classifier in the openEuler ecosystem for proactive package risk screening. Our work establishes a scalable, interpretable foundation for maintenance-risk prediction, enabling reproducible risk management across large-scale open source ecosystems.
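The abstract reports model quality as a C-index (concordance index). As a rough illustration of what that number measures, the sketch below computes Harrell's C-index in plain Python on a small made-up sample. The function name, the toy data, and the choice to skip pairs with tied times are assumptions made for illustration; none of this is the authors' code or dataset.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times       -- observed time until cessation or censoring
    events      -- True if cessation was observed, False if censored
    risk_scores -- model output; a higher score means cessation is expected sooner
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): ignore pairs with tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject is censored, so the true order is unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0  # model ranks the earlier cessation as riskier
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: months observed, cessation flag, predicted risk.
times = [12, 30, 45, 60, 24]
events = [True, True, False, False, True]
risk_scores = [0.9, 0.6, 0.3, 0.1, 0.4]

c_index = concordance_index(times, events, risk_scores)
print(f"C-index: {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.846 should be read.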
Related papers
- An Accurate and Efficient Vulnerability Propagation Analysis Framework [13.051314477680902]
We propose a novel approach to quantify the scope and evolution of vulnerability impacts in software supply chains. We implement a prototype of our approach in the Java Maven ecosystem and evaluate it on 100 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-02T05:55:45Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Reasoning with LLMs for Zero-Shot Vulnerability Detection [0.9208007322096533]
We present VulnSage, a comprehensive evaluation framework and a curated dataset from diverse, large-scale open-source system software projects. The framework supports multi-granular analysis across function, file, and inter-function levels. It employs four diverse zero-shot prompt strategies: Baseline, Chain-of-context, Think, and Think & verify.
arXiv Detail & Related papers (2025-03-22T23:59:17Z) - SurvHive: a package to consistently access multiple survival-analysis packages [0.0]
SurvHive is a Python-based framework designed to unify survival analysis methods within a coherent interface modeled on scikit-learn. SurvHive integrates classical statistical models with cutting-edge deep learning approaches, including transformer-based architectures and parametric survival models.
arXiv Detail & Related papers (2025-02-04T11:02:40Z) - Trustworthiness in Retrieval-Augmented Generation Systems: A Survey [59.26328612791924]
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs).
We propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy.
arXiv Detail & Related papers (2024-09-16T09:06:44Z) - Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects [5.438725298163702]
We propose a novel metric from the user-repository network, and leverage the metric to fit project deprecation predictors.
We establish a comprehensive dataset containing 103,354 non-fork GitHub OSS projects spanning from 2011 to 2023.
Our study reveals a correlation between the HITS centrality metrics and the repository deprecation risk.
arXiv Detail & Related papers (2024-05-13T07:07:54Z) - Profile of Vulnerability Remediations in Dependencies Using Graph Analysis [40.35284812745255]
This research introduces graph analysis methods and a modified Graph Attention Convolutional Neural Network (GAT) model.
We analyze control flow graphs to profile breaking changes in applications occurring from dependency upgrades intended to remediate vulnerabilities.
Results demonstrate the effectiveness of the enhanced GAT model in offering nuanced insights into the relational dynamics of code vulnerabilities.
arXiv Detail & Related papers (2024-03-08T02:01:47Z) - Distributionally Robust Statistical Verification with Imprecise Neural Networks [3.9456691693452552]
A particularly challenging problem in AI safety is providing guarantees on the behavior of high-dimensional autonomous systems. This paper proposes a novel approach based on uncertainty quantification using concepts from probabilities. We show that our approach can provide useful and scalable guarantees for high-dimensional systems.
arXiv Detail & Related papers (2023-08-28T18:06:24Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z) - Learning Output Embeddings in Structured Prediction [73.99064151691597]
A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension.
A prediction in the original space is computed by solving a pre-image problem.
In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function into the new feature space.
arXiv Detail & Related papers (2020-07-29T09:32:53Z) - An Uncertainty-based Human-in-the-loop System for Industrial Tool Wear Analysis [68.8204255655161]
We show that uncertainty measures based on Monte-Carlo dropout in the context of a human-in-the-loop system increase the system's transparency and performance.
A simulation study demonstrates that the uncertainty-based human-in-the-loop system increases performance for different levels of human involvement.
arXiv Detail & Related papers (2020-07-14T15:47:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.