Enhancing Software Maintenance: A Learning to Rank Approach for Co-changed Method Identification
- URL: http://arxiv.org/abs/2411.19099v1
- Date: Thu, 28 Nov 2024 12:23:02 GMT
- Title: Enhancing Software Maintenance: A Learning to Rank Approach for Co-changed Method Identification
- Authors: Yiping Jia, Safwat Hassan, Ying Zou
- Abstract summary: We propose a learning-to-rank approach that combines source code features and change history to predict and rank co-changed methods at the pull-request level. Experiments on 150 open-source Java projects, totaling 41.5 million lines of code and 634,216 pull requests, show that the Random Forest model outperforms other models by 2.5 to 12.8 percent in NDCG@5.
- Score: 0.7285835869818668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing complexity of large-scale software systems, identifying all necessary modifications for a specific change is challenging. Co-changed methods, which are methods frequently modified together, are crucial for understanding software dependencies. However, existing methods often produce large candidate sets with many false positives. Focusing on pull requests instead of individual commits provides a more comprehensive view of related changes, capturing essential co-change relationships. To address these challenges, we propose a learning-to-rank approach that combines source code features and change history to predict and rank co-changed methods at the pull-request level. Experiments on 150 open-source Java projects, totaling 41.5 million lines of code and 634,216 pull requests, show that the Random Forest model outperforms other models by 2.5 to 12.8 percent in NDCG@5. It also surpasses baselines such as file proximity, code clones, FCP2Vec, and StarCoder 2 by 4.7 to 537.5 percent. Models trained on longer historical data (90 to 180 days) perform consistently, while accuracy declines after 60 days, highlighting the need for bi-monthly retraining. This approach provides an effective tool for managing co-changed methods, enabling development teams to handle dependencies and maintain software quality.
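The ranking metric reported above, NDCG@5, scores how well the model places truly co-changed methods near the top of its ranked list. A minimal sketch follows; the function names and the binary relevance labels are illustrative and not taken from the paper's implementation.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: relevance of each of the top-k items,
    discounted by log2 of its (1-based) rank position."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=5):
    """NDCG@k: DCG of the predicted ordering divided by the DCG of the
    ideal (relevance-sorted) ordering; 0.0 when no item is relevant."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Binary relevance in the model's ranked order:
# 1 = the method actually co-changed in the pull request, 0 = it did not.
predicted = [1, 0, 1, 0, 0]
print(round(ndcg_at_k(predicted, 5), 4))  # → 0.9197
```

A perfect ranking (all co-changed methods ranked first) yields 1.0, so the reported 2.5 to 12.8 percent NDCG@5 gap between models is measured on this 0-to-1 scale.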
Related papers
- An ML-based Approach to Predicting Software Change Dependencies: Insights from an Empirical Study on OpenStack [0.41232474244672235]
In modern software systems, dependencies often span multiple components across teams, creating challenges for development and deployment. We propose a semi-automated approach that leverages two ML models. Our proposed models demonstrate strong performance, achieving average AUC scores of 79.33% and 91.89%, and Brier scores of 0.11 and 0.014, respectively.
arXiv Detail & Related papers (2025-08-07T05:16:29Z) - LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning [20.147009997147798]
We propose ColaUntangle, a new collaborative consultation framework for commit untangling. ColaUntangle integrates Large Language Model (LLM)-driven agents in a multi-agent architecture. We construct multi-version Program Dependency Graphs (delta-PDG), enabling agents to reason over code relationships with both symbolic and semantic depth.
arXiv Detail & Related papers (2025-07-22T09:42:13Z) - CodeMorph: Mitigating Data Leakage in Large Language Model Assessment [6.27974411661361]
Concerns about benchmark leakage in large language models for code have raised issues of data contamination and inflated evaluation metrics. We propose CodeMorph, an approach designed to support multiple programming languages while preserving cross-file dependencies to mitigate data leakage.
arXiv Detail & Related papers (2025-06-21T08:04:12Z) - Do Automatic Comment Generation Techniques Fall Short? Exploring the Influence of Method Dependencies on Code Understanding [1.971759811837406]
Method-level comments are critical for improving code comprehension and supporting software maintenance.
This study investigates the prevalence and impact of dependent methods in software projects and introduces a dependency-aware approach for method-level comment generation.
We propose HelpCOM, a novel dependency-aware technique that incorporates helper method information to improve comment clarity, comprehensiveness, and relevance.
arXiv Detail & Related papers (2025-04-28T03:49:06Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z) - DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models [3.1690235522182104]
Large language models (LLMs) are increasingly used to solve various programming tasks.
We show that the task is difficult as it requires the model to learn long-range code relationships.
We propose a technique to address these challenges with a new approach for querying and fine-tuning LLMs.
arXiv Detail & Related papers (2024-02-19T18:35:40Z) - Quantifying Process Quality: The Role of Effective Organizational Learning in Software Evolution [0.0]
Real-world software applications must constantly evolve to remain relevant.
Traditional methods of software quality control involve software quality models and continuous code inspection tools.
However, there is a strong correlation and causation between the quality of the development process and the resulting software product.
arXiv Detail & Related papers (2023-05-29T12:57:14Z) - CCT5: A Code-Change-Oriented Pre-Trained Model [14.225942520238936]
We propose to pre-train a model specially designed for code changes to better support developers in software maintenance.
We first collect a large-scale dataset containing 1.5M+ pairwise data of code changes and commit messages.
We fine-tune the pre-trained model, CCT5, on three widely-studied tasks incurred by code changes and two tasks specific to the code review process.
arXiv Detail & Related papers (2023-05-18T07:55:37Z) - NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z) - Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of theirs, and the solution gap of our model towards the expert solutions decreases from 6.8% to 1.3% on average.
arXiv Detail & Related papers (2022-10-31T09:46:26Z) - Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching; a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z) - iTAML: An Incremental Task-Agnostic Meta-learning Approach [123.10294801296926]
Humans can continuously learn new knowledge as their experience grows.
Previous learning in deep neural networks can quickly fade out when they are trained on a new task.
We introduce a novel meta-learning approach that seeks to maintain an equilibrium between all encountered tasks.
arXiv Detail & Related papers (2020-03-25T21:42:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.