Team-related Features in Code Review Prediction Models
- URL: http://arxiv.org/abs/2312.06244v1
- Date: Mon, 11 Dec 2023 09:30:09 GMT
- Title: Team-related Features in Code Review Prediction Models
- Authors: Eduardo Witter and Ingrid Nunes and Dietmar Jannach
- Abstract summary: We evaluate the prediction power of features related to code ownership, workload, and team relationship.
Our results show that, individually, features related to code ownership have the best prediction power.
We conclude that all proposed features together with lines of code can make the best predictions for both reviewer participation and amount of feedback.
- Score: 10.576931077314887
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern Code Review (MCR) is an informal tool-assisted quality assurance
practice. It relies on the asynchronous communication among the authors of code
changes and reviewers, who are developers that provide feedback. However, among
candidate developers, some are able to provide better feedback than others
given a particular context. The selection of reviewers is thus an important
task, which can benefit from automated support. Many approaches have been
proposed in this direction, using for example data from code review
repositories to recommend reviewers. In this paper, we propose the use of
team-related features to improve the performance of predictions that are
helpful to build code reviewer recommenders, with our target predictions being
the identification of reviewers that would participate in a review and the
provided amount of feedback. We evaluate the prediction power of these
features, which are related to code ownership, workload, and team relationship.
This evaluation was done by carefully addressing challenges imposed by the MCR
domain, such as temporal aspects of the dataset and unbalanced classes.
Moreover, given that it is currently unknown how much past data is needed for
building MCR prediction models with acceptable performance, we explore the
amount of past data used to build prediction models. Our results show that,
individually, features related to code ownership have the best prediction
power. However, based on feature selection, we conclude that all proposed
features together with lines of code can make the best predictions for both
reviewer participation and amount of feedback. Regarding the amount of past
data, the timeframes of 3, 6, 9, and 12 months of data produce similar results.
Therefore, models can be trained considering short timeframes, thus reducing
computational costs with negligible impact on prediction performance
...
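The abstract mentions three practical concerns: respecting the temporal order of the data, handling unbalanced classes, and limiting how many months of past data are used for training. The following is a minimal sketch of how these could be handled, not the authors' actual procedure, which this page does not describe. The record fields (`created`, `participated`) and all function names are hypothetical, and the class-weight formula is the common "balanced" heuristic, assumed here only for illustration.

```python
from collections import Counter
from datetime import date


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day` (day clamped to 28)."""
    total = day.year * 12 + (day.month - 1) - months
    return date(total // 12, total % 12 + 1, min(day.day, 28))


def temporal_split(records, cutoff: date, window_months: int):
    """Train on records in [cutoff - window, cutoff); test on records at or after cutoff.

    Splitting by time (rather than randomly) keeps future reviews out of training.
    """
    start = months_before(cutoff, window_months)
    train = [r for r in records if start <= r["created"] < cutoff]
    test = [r for r in records if r["created"] >= cutoff]
    return train, test


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * cnt) for cls, cnt in counts.items()}


# Hypothetical review records: creation date and whether the candidate participated.
records = [
    {"created": date(2023, 1, 15), "participated": 0},
    {"created": date(2023, 5, 10), "participated": 0},
    {"created": date(2023, 6, 20), "participated": 1},
    {"created": date(2023, 8, 1), "participated": 0},
    {"created": date(2023, 9, 5), "participated": 1},
]

for window in (3, 6, 9, 12):
    train, test = temporal_split(records, date(2023, 9, 1), window)
    weights = balanced_class_weights([r["participated"] for r in train])
    print(window, len(train), len(test), weights)
```

Comparing models trained on each window against the same test period is one way to check whether a shorter timeframe is sufficient for a given project.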
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Enhancing Mean-Reverting Time Series Prediction with Gaussian Processes: Functional and Augmented Data Structures in Financial Forecasting [0.0]
We explore the application of Gaussian Processes (GPs) for predicting mean-reverting time series with an underlying structure.
GPs offer the potential to forecast not just the average prediction but the entire probability distribution over a future trajectory.
This is particularly beneficial in financial contexts, where accurate predictions alone may not suffice if incorrect volatility assessments lead to capital losses.
arXiv Detail & Related papers (2024-02-23T06:09:45Z)
- What Makes a Code Review Useful to OpenDev Developers? An Empirical Investigation [4.061135251278187]
Even a minor improvement in the effectiveness of Code Reviews can yield significant savings for a software development organization.
This study aims to develop a finer-grained understanding of what makes a code review comment useful to OSS developers.
arXiv Detail & Related papers (2023-02-22T22:48:27Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations [7.260832843615661]
We present CORAL, a novel approach to reviewer recommendation.
We use a socio-technical graph built from the rich set of entities.
We show that CORAL is able to model the manual history of reviewer selection remarkably well.
arXiv Detail & Related papers (2022-02-04T20:58:54Z)
- Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z)
- Injecting Knowledge in Data-driven Vehicle Trajectory Predictors [82.91398970736391]
Vehicle trajectory prediction tasks have been commonly tackled from two perspectives: knowledge-driven or data-driven.
In this paper, we propose to learn a "Realistic Residual Block" (RRB) which effectively connects these two perspectives.
Our proposed method outputs realistic predictions by confining the residual range and taking into account its uncertainty.
arXiv Detail & Related papers (2021-03-08T16:03:09Z)
- E-commerce Query-based Generation based on User Review [1.484852576248587]
We propose a novel seq2seq-based text generation model to generate answers to a user's question based on reviews posted by previous users.
Given a user question and/or target sentiment polarity, we extract aspects of interest and generate an answer that summarizes previous relevant user reviews.
arXiv Detail & Related papers (2020-11-11T04:58:31Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.