Related papers: An ML-based Approach to Predicting Software Change Dependencies: Insights from an Empirical Study on OpenStack

URL: http://arxiv.org/abs/2508.05034v1
Date: Thu, 07 Aug 2025 05:16:29 GMT
Title: An ML-based Approach to Predicting Software Change Dependencies: Insights from an Empirical Study on OpenStack
Authors: Ali Arabat, Mohammed Sayagh, Jameleddine Hassine
Abstract summary: In modern software systems, dependencies often span multiple components across teams, creating challenges for development and deployment. We propose a semi-automated approach that leverages two ML models. Our proposed models demonstrate strong performance, achieving average AUC scores of 79.33% and 91.89%, and Brier scores of 0.11 and 0.014, respectively.
Score: 0.41232474244672235
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As software systems grow in complexity, accurately identifying and managing dependencies among changes becomes increasingly critical. For instance, a change that leverages a function must depend on the change that introduces it. Establishing such dependencies allows CI/CD pipelines to build and orchestrate changes effectively, preventing build failures and incomplete feature deployments. In modern software systems, dependencies often span multiple components across teams, creating challenges for development and deployment. They serve various purposes, from enabling new features to managing configurations, and can even involve traditionally independent changes like documentation updates. To address these challenges, we conducted a preliminary study on dependency management in OpenStack, a large-scale software system. Our study revealed that a substantial portion of software changes in OpenStack over the past 10 years are interdependent. Surprisingly, 51.08% of these dependencies are identified during the code review phase, after a median delay of 5.06 hours, rather than at the time of change creation. Developers often spend a median of 57.12 hours identifying dependencies, searching among a median of 463 other changes. To help developers proactively identify dependencies, we propose a semi-automated approach that leverages two ML models. The first model predicts the likelihood of dependencies among changes, while the second identifies the exact pairs of dependent changes. Our proposed models demonstrate strong performance, achieving average AUC scores of 79.33% and 91.89%, and Brier scores of 0.11 and 0.014, respectively. Indeed, the second model has a good top-k recall across all types of pairs, while the top-k precision has room for improvement.
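
Illustrative note: the abstract reports AUC, Brier score, and top-k recall/precision for its models. The sketch below shows how these metrics are commonly computed for a binary "depends / does not depend" classifier. It is not the authors' evaluation code; the labels, probabilities, and function names are made up for illustration, and the paper's features, models, and evaluation protocol are not described on this page. Note that an AUC of 79.33% corresponds to 0.7933 on the 0-1 scale used here.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Chance that a random positive is scored above a random negative (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_hits(y_true: Sequence[int], y_prob: Sequence[float], k: int) -> int:
    """Number of true positives among the k highest-scored candidates."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: t[0], reverse=True)[:k]
    return sum(y for _, y in ranked)


def top_k_recall(y_true: Sequence[int], y_prob: Sequence[float], k: int) -> float:
    """Share of all true positives that appear in the top k."""
    return top_k_hits(y_true, y_prob, k) / sum(y_true)


def top_k_precision(y_true: Sequence[int], y_prob: Sequence[float], k: int) -> float:
    """Share of the top k candidates that are true positives."""
    return top_k_hits(y_true, y_prob, k) / k


# Hypothetical toy data: 1 = the candidate pair of changes is truly dependent.
labels = [1, 0, 1, 0, 0, 1]
probs = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4]

print("Brier score:", round(brier_score(labels, probs), 3))
print("AUC:", round(roc_auc(labels, probs), 3))
print("top-3 recall:", round(top_k_recall(labels, probs, 3), 3))
print("top-3 precision:", round(top_k_precision(labels, probs, 3), 3))
```

The pairwise AUC above is quadratic in the number of samples, which is fine for a toy example; library implementations such as scikit-learn's use a sort-based computation instead.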

Related papers

Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Enhancing Software Maintenance: A Learning to Rank Approach for Co-changed Method Identification [0.7285835869818668]
We propose a learning-to-rank approach that combines source code features and change history to predict and rank co-changed methods at the pull-request level. Experiments on 150 open-source Java projects, totaling 41.5 million lines of code and 634,216 pull requests, show that the Random Forest model outperforms other models by 2.5 to 12.8 percent in NDCG@5.
arXiv Detail & Related papers (2024-11-28T12:23:02Z)
DeMuVGN: Effective Software Defect Prediction Model by Learning Multi-view Software Dependency via Graph Neural Networks [37.928355252723996]
DeMuVGN is a defect prediction model that learns multi-view software dependency via graph neural networks. We introduce a Multi-view Software Dependency Graph that integrates data, call, and developer dependencies. In a case study of eight open-source projects across 20 versions, DeMuVGN demonstrates significant improvements.
arXiv Detail & Related papers (2024-10-25T13:24:04Z)
See to Believe: Using Visualization To Motivate Updating Third-party Dependencies [1.7914660044009358]
Security vulnerabilities introduced by applications using third-party dependencies are on the increase. Developers are wary of library updates, even to fix vulnerabilities, citing either that they are unaware of the update or that the migration effort outweighs the benefit of updating. In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update.
arXiv Detail & Related papers (2024-05-15T03:57:27Z)
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results. We develop a method that avoids introducing external resources, relying instead on perturbations to the input. Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu [7.76541950830141]
An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures. For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused on individual files. We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files.
arXiv Detail & Related papers (2023-07-10T10:12:21Z)
Dependency Update Strategies and Package Characteristics [5.119787101452765]
This study explores the association between package characteristics and the dependency update strategy selected by its dependents. We study over 112,000 npm packages and use 19 characteristics to build a prediction model that identifies the common dependency update strategy for each package.
arXiv Detail & Related papers (2023-05-25T02:58:21Z)
DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles [75.43355868143209]
We present DiffStack, a differentiable and modular stack for prediction, planning, and control. Our results on the nuScenes dataset indicate that end-to-end training with DiffStack yields substantial improvements in open-loop and closed-loop planning metrics.
arXiv Detail & Related papers (2022-12-13T09:05:21Z)
Deep learning model solves change point detection for multiple change types [69.77452691994712]
Change point detection aims to catch an abrupt disorder in a data distribution. We propose an approach that works in the multiple-distributions scenario.
arXiv Detail & Related papers (2022-04-15T09:44:21Z)
Superiority of Simplicity: A Lightweight Model for Network Device Workload Prediction [58.98112070128482]
We propose a lightweight solution for series prediction based on historic observations. It consists of a heterogeneous ensemble method composed of two models: a neural network and a mean predictor. It achieves an overall $R^2$ score of 0.10 on the available FedCSIS 2020 challenge dataset.
arXiv Detail & Related papers (2020-07-07T15:44:16Z)
Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks [91.65637773358347]
We propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module. Our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets.
arXiv Detail & Related papers (2020-05-24T04:02:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.