Leveraging Data Mining Algorithms to Recommend Source Code Changes
- URL: http://arxiv.org/abs/2305.00323v1
- Date: Sat, 29 Apr 2023 18:38:23 GMT
- Title: Leveraging Data Mining Algorithms to Recommend Source Code Changes
- Authors: AmirHossein Naghshzan, Saeed Khalilazar, Pierre Poilane, Olga Baysal,
Latifa Guerrouj, Foutse Khomh
- Abstract summary: This paper proposes an automatic method for recommending source code changes using four data mining algorithms.
We compare the algorithms in terms of performance (Precision, Recall and F-measure) and execution time.
Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects.
- Score: 7.959841510571622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Context: Recent research has used data mining to develop techniques that can
guide developers through source code changes. To the best of our knowledge,
very few studies have investigated data mining techniques and/or compared
their results with other algorithms or a baseline. Objectives: This paper
proposes an automatic method for recommending source code changes using four
data mining algorithms. We not only use these algorithms to recommend source
code changes, but we also conduct an empirical evaluation. Methods: Our
investigation includes seven open-source projects from which we extracted
source change history at the file level. We used four widely adopted data mining
algorithms, i.e., Apriori, FP-Growth, Eclat, and Relim, and compared them
in terms of performance (Precision, Recall, and F-measure) and execution time.
Results: Our findings provide empirical evidence that, while some Frequent
Pattern Mining algorithms, such as Apriori, may outperform other algorithms in
some cases, the results are not consistent across all the software projects,
most likely due to the nature and characteristics of the studied projects, in
particular their change history. Conclusion: Apriori seems
appropriate for large-scale projects, whereas Eclat appears to be suitable for
small-scale projects. Moreover, FP-Growth seems an efficient approach in terms
of execution time.
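To make the approach concrete, here is a minimal sketch of file-level co-change recommendation via Frequent Pattern Mining, in the spirit of Apriori's first pass over pairwise itemsets. The commit transactions, file names, and MIN_SUPPORT threshold below are illustrative assumptions, not the authors' dataset or implementation:

```python
from itertools import combinations
from collections import Counter

# Hypothetical file-level change history: each transaction is the set of
# files modified together in one commit (placeholder data for illustration).
commits = [
    {"parser.c", "lexer.c", "ast.h"},
    {"parser.c", "ast.h"},
    {"lexer.c", "parser.c"},
    {"main.c", "config.c"},
    {"parser.c", "ast.h", "codegen.c"},
]

MIN_SUPPORT = 2  # assumed threshold: a pair must co-change in >= 2 commits

# Apriori-style candidate counting, restricted to size-2 itemsets:
# count how often each pair of files changes together across commits.
pair_counts = Counter(
    pair for files in commits for pair in combinations(sorted(files), 2)
)
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= MIN_SUPPORT}

def recommend(changed_file):
    """Recommend files that frequently co-changed with `changed_file`,
    ranked by co-change support."""
    hits = []
    for (a, b), count in frequent_pairs.items():
        if changed_file in (a, b):
            hits.append((b if changed_file == a else a, count))
    return sorted(hits, key=lambda x: -x[1])

print(recommend("parser.c"))  # -> [('ast.h', 3), ('lexer.c', 2)]
```

Recommendations produced this way can then be validated against held-out commits and scored with Precision, Recall, and F-measure (F = 2PR / (P + R)), mirroring the paper's evaluation; a full Apriori, FP-Growth, Eclat, or Relim implementation would also mine larger itemsets and prune candidates by support at each level.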
Related papers
- RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation [54.707460684650584]
Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention.
Current research addresses the limits of their built-in knowledge by equipping LLMs with external knowledge, a technique known as Retrieval-Augmented Generation (RAG).
RAGLAB is a modular and research-oriented open-source library that reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms.
arXiv Detail & Related papers (2024-08-21T07:20:48Z)
- Performance Evaluation and Comparison of a New Regression Algorithm [4.125187280299247]
We compare the performance of a newly proposed regression algorithm against four conventional machine learning algorithms.
The reader is free to replicate our results since we have provided the source code in a GitHub repository.
arXiv Detail & Related papers (2023-06-15T13:01:16Z)
- Improving and Benchmarking Offline Reinforcement Learning Algorithms [87.67996706673674]
This work aims to bridge the gaps caused by low-level choices and datasets.
We empirically investigate 20 implementation choices using three representative algorithms.
We find that two variants, CRR+ and CQL+, achieve a new state of the art on D4RL.
arXiv Detail & Related papers (2023-06-01T17:58:46Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Can We Do Better Than Random Start? The Power of Data Outsourcing [9.677679780556103]
Many organizations have access to abundant data but lack the computational power to process the data.
We propose simulation-based algorithms that can utilize a small amount of outsourced data to find good initial points.
arXiv Detail & Related papers (2022-05-17T05:34:36Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- A Pragmatic Look at Deep Imitation Learning [0.3626013617212666]
We re-implement 6 different adversarial imitation learning algorithms.
We evaluate them on a widely-used expert trajectory dataset.
GAIL consistently performs well across a range of sample sizes.
arXiv Detail & Related papers (2021-08-04T06:33:10Z)
- Identifying Co-Adaptation of Algorithmic and Implementational Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of Inference-based Algorithms [15.338931971492288]
We focus on a series of inference-based actor-critic algorithms to decouple their algorithmic innovations and implementation decisions.
We identify substantial performance drops whenever implementation details are mismatched for algorithmic choices.
Results show which implementation details are co-adapted and co-evolved with algorithms.
arXiv Detail & Related papers (2021-03-31T17:55:20Z)
- Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms [19.65665942630067]
Active learning (AL) algorithms may achieve better performance with fewer data because the model guides the data selection process.
There has been little study of what optimal AL looks like, which would help researchers understand where their models fall short.
We present a simulated annealing algorithm to search for this optimal oracle and analyze it for several tasks.
arXiv Detail & Related papers (2020-12-29T22:56:42Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
- Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.