Evaluation of Version Control Merge Tools
- URL: http://arxiv.org/abs/2410.09934v1
- Date: Sun, 13 Oct 2024 17:35:14 GMT
- Title: Evaluation of Version Control Merge Tools
- Authors: Benedikt Schesch, Ryan Featherman, Kenneth J. Yang, Ben R. Roberts, Michael D. Ernst
- Abstract summary: A version control system, such as Git, requires a way to integrate changes from different developers or branches.
A merge tool either outputs a clean integration of the changes, or it outputs a conflict for manual resolution.
New merge tools have been proposed, but they have not yet been evaluated against one another.
- Score: 3.1969855247377836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A version control system, such as Git, requires a way to integrate changes from different developers or branches. Given a merge scenario, a merge tool either outputs a clean integration of the changes, or it outputs a conflict for manual resolution. A clean integration is correct if it preserves intended program behavior, and is incorrect otherwise (e.g., if it causes a test failure). Manual resolution consumes valuable developer time, and correcting a defect introduced by an incorrect merge is even more costly. New merge tools have been proposed, but they have not yet been evaluated against one another. Prior evaluations do not properly distinguish between correct and incorrect merges, are not evaluated on a realistic set of merge scenarios, and/or do not compare to state-of-the-art tools. We have performed a more realistic evaluation. The results differ significantly from previous claims, setting the record straight and enabling better future research. Our novel experimental methodology combines running test suites, examining merges on deleted branches, and accounting for the cost of incorrect merges. Based on these evaluations, we created a merge tool that outperforms all previous tools under most assumptions. It handles the most common merge scenarios in practice.
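To make the evaluation loop concrete, here is a minimal sketch of how one merge scenario can be replayed and classified; the repository layout, commit identifiers, and test command are hypothetical stand-ins, and the paper's full methodology (deleted-branch mining and cost accounting) is not reproduced.

```python
# Minimal sketch of a merge-scenario evaluator, assuming a local clone, two
# branch tips to merge, and a test command whose exit code signals behavioral
# correctness. All names here are illustrative, not the paper's artifact.
import subprocess

def git(repo, *args):
    return subprocess.run(["git", *args], cwd=repo,
                          capture_output=True, text=True)

def classify_merge(repo, left, right, test_cmd=("make", "test")):
    """Classify one merge scenario as 'conflict', 'correct', or 'incorrect'."""
    git(repo, "checkout", "--detach", left)
    merge = git(repo, "merge", "--no-edit", right)
    if merge.returncode != 0:            # textual conflict: manual resolution
        git(repo, "merge", "--abort")
        return "conflict"
    tests = subprocess.run(list(test_cmd), cwd=repo, capture_output=True)
    return "correct" if tests.returncode == 0 else "incorrect"
```

Aggregating the three outcomes per tool, with incorrect merges weighted by their higher cost, yields the kind of comparison the abstract describes.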
Related papers
- WizardMerge -- Save Us From Merging Without Any Clues [8.21089093466603]
We present WizardMerge, an auxiliary tool that leverages Git's merge results to retrieve code-block dependencies at the text and LLVM-IR levels.
The outcomes demonstrate that WizardMerge reduces the time cost of conflict merging by 23.85%.
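As background for what a Git-based auxiliary tool has to start from, this toy sketch parses Git's standard conflict markers out of a merged file; WizardMerge's dependency analysis at the text and LLVM-IR levels goes well beyond this.

```python
# Toy parser for Git's standard conflict markers; real tooling would also
# track file paths and (with diff3 style) the base version of each hunk.
def conflict_hunks(merged_text):
    hunks, ours, theirs, side = [], [], [], None
    for line in merged_text.splitlines():
        if line.startswith("<<<<<<<"):        # start of the "ours" side
            ours, theirs, side = [], [], "ours"
        elif line.startswith("=======") and side:
            side = "theirs"                   # switch to the "theirs" side
        elif line.startswith(">>>>>>>"):
            hunks.append((ours, theirs))
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
    return hunks
```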
arXiv Detail & Related papers (2024-07-03T05:40:29Z)
- A Novel Refactoring and Semantic Aware Abstract Syntax Tree Differencing Tool and a Benchmark for Evaluating the Accuracy of Diff Tools [2.0625936401496237]
Abstract Syntax Tree (AST) diff tools were developed to overcome the limitations of the line-based diff tools used by the majority of developers.
We propose a novel AST diff tool based on RefactoringMiner that resolves all aforementioned limitations.
Our tool achieved considerably higher precision and recall, especially for commits involving refactorings, with an execution time comparable to that of competing tools.
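To illustrate the granularity gap these tools address: a line-based diff flags any textual change, while an AST-level comparison can recognize a formatting-only edit as a no-op. The sketch below uses Python's standard ast and difflib modules and is not the proposed RefactoringMiner-based tool.

```python
# Contrast a line-based diff with a (very crude) AST-level comparison.
# A reformatting-only edit changes the text but not the parse tree.
import ast, difflib

before = "x = compute(a,b)\n"
after  = "x = compute(\n    a, b,\n)\n"

line_diff = list(difflib.unified_diff(before.splitlines(),
                                      after.splitlines(), lineterm=""))
print(len(line_diff) > 0)              # True: the text changed

same_ast = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))
print(same_ast)                        # True: the structure did not
```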
arXiv Detail & Related papers (2024-03-09T15:32:41Z)
- Token Fusion: Bridging the Gap between Token Pruning and Token Merging [71.84591084401458]
Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs.
Their computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging.
We introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging.
arXiv Detail & Related papers (2023-12-02T04:29:19Z)
- Detecting Semantic Conflicts with Unit Tests [5.273883263686449]
Branching and merging are common practices in software development, increasing developers' productivity.
Modern merge techniques can resolve textual conflicts automatically, but they fail when the conflict arises at the semantic level.
We propose SemAntic Merge, a semantic merge tool based on the automated generation of unit tests.
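The underlying check can be sketched as follows: a test that passes on both parents but fails on their textually clean merge is evidence of a semantic conflict. The automated test-generation step is elided, and run_tests is a hypothetical helper.

```python
# Sketch of the semantic-conflict criterion: behavior present in each parent
# should survive the merge. `run_tests` is a hypothetical helper that builds
# the given revision and returns True if the test suite passes on it.
def has_semantic_conflict(run_tests, left, right, merge):
    # The tests must capture both changes: they should pass on both parents.
    if not (run_tests(left) and run_tests(right)):
        return False
    # A textually clean merge that breaks those tests conflicts semantically.
    return not run_tests(merge)
```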
arXiv Detail & Related papers (2023-10-03T19:36:28Z)
- Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes.
These simultaneous changes need to be merged into the same version of the source code.
Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual intervention by developers to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z)
- Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED [60.39125850987604]
We show that the recommend-revise scheme results in false-negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set for document-level relation extraction (RE) models.
arXiv Detail & Related papers (2022-04-17T11:29:01Z)
- Lerna: Transformer Architectures for Configuring Error Correction Tools for Short- and Long-Read Genome Sequencing [5.911600622951255]
We introduce Lerna for the automated configuration of k-mer-based error-correction (EC) tools.
We show that the best k-mer value can vary for different datasets, even for the same EC tool.
We also show that our attention-based models yield a significant runtime improvement for the entire pipeline.
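The outer loop of such a configurator can be sketched generically: score each candidate k with a cheap proxy (Lerna uses language-model perplexity; proxy_score below is a hypothetical stand-in) and pass the winner to the EC tool.

```python
# Generic sketch of automated k selection for a k-mer-based EC tool.
# `proxy_score` stands in for a perplexity-style evaluator: lower scores
# are assumed to indicate better downstream correction quality.
def pick_k(reads, candidate_ks, proxy_score):
    return min(candidate_ks, key=lambda k: proxy_score(reads, k))

# e.g. best_k = pick_k(reads, range(15, 32, 2), proxy_score)
```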
arXiv Detail & Related papers (2021-12-19T05:59:26Z)
- MergeBERT: Program Merge Conflict Resolution via Neural Transformers [11.460182185916704]
Merge conflicts can stall pull requests and continuous integration pipelines for hours to several days.
We introduce MergeBERT, a novel neural program merge framework based on token-level three-way differencing and a transformer model.
Our model achieves 64-69% precision in merge-resolution synthesis, yielding nearly a 2x performance improvement over existing structured and neural program merge tools.
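Token-level three-way differencing itself can be sketched with the standard library: align left and right revisions against the base and look for base spans that both sides modified; the transformer that then synthesizes a resolution is the paper's contribution and is not shown.

```python
# Crude token-level three-way diff: report base-token spans that were
# modified by both sides, i.e. the candidates for conflict resolution.
import difflib

def changed_base_spans(base_tokens, other_tokens):
    ops = difflib.SequenceMatcher(None, base_tokens, other_tokens).get_opcodes()
    return [(i1, i2) for tag, i1, i2, _, _ in ops if tag != "equal"]

def overlapping_changes(base, left, right):
    ls, rs = changed_base_spans(base, left), changed_base_spans(base, right)
    return [(a, b) for a in ls for b in rs
            if a[0] < b[1] and b[0] < a[1]]   # both touch the same base tokens

base  = "x = a + b ;".split()
left  = "x = a - b ;".split()
right = "x = a * b ;".split()
print(overlapping_changes(base, left, right))   # both sides edit token 3
```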
arXiv Detail & Related papers (2021-08-31T21:37:53Z)
- S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
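A hedged sketch of that architecture, a shared biLSTM encoder feeding a fully connected classifier; the embedding of stack frames and all dimensions are illustrative, not the paper's hyperparameters.

```python
# Sketch of a siamese biLSTM similarity model: both stack traces pass through
# the same encoder; a small classifier scores the pair. Sizes are illustrative.
import torch
import torch.nn as nn

class StackTraceSimilarity(nn.Module):
    def __init__(self, vocab_size, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def encode(self, frames):                    # frames: (batch, seq) ids
        _, (h, _) = self.encoder(self.embed(frames))
        return torch.cat([h[0], h[1]], dim=-1)   # concat both directions

    def forward(self, trace_a, trace_b):
        a, b = self.encode(trace_a), self.encode(trace_b)
        return self.classifier(torch.cat([a, b], dim=-1)).squeeze(-1)
```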
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a technique we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
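In PyTorch terms, prediction-time batch normalization amounts to letting BatchNorm layers normalize with the current test batch's statistics instead of their training-time running averages; a minimal sketch, assuming batches large enough for stable statistics:

```python
# Minimal sketch of prediction-time batch normalization in PyTorch: put the
# model in eval mode, then switch BatchNorm layers back to batch statistics.
# Note: train() also updates the running stats; a fuller version would
# snapshot and restore them afterwards.
import torch
import torch.nn as nn

@torch.no_grad()
def predict_with_batch_stats(model, batch):
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()        # use current-batch mean/var, not running stats
    return model(batch)
```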
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
- Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning [62.34197797857823]
A central problem in automatic reconstruction of shredded documents is the pairwise compatibility evaluation of the shreds.
This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly.
Our method has accuracy comparable to the state-of-the-art with a speed-up of about 22 times for a test instance with 505 shreds.
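The linear scaling can be sketched simply: embed each shred's two edges once (2n network inferences) and score all n^2 pairings with a single matrix product; embed_right_edge and embed_left_edge are hypothetical stand-ins for the learned asymmetric encoders.

```python
# Sketch of linear-inference pairwise compatibility: run the network once per
# shred to embed its edges, then score every pairing with a matrix product.
import numpy as np

def compatibility_matrix(shreds, embed_right_edge, embed_left_edge):
    R = np.stack([embed_right_edge(s) for s in shreds])   # n inferences
    L = np.stack([embed_left_edge(s) for s in shreds])    # n inferences
    return R @ L.T   # [i, j] = score for placing shred j right of shred i
```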
arXiv Detail & Related papers (2020-03-23T03:22:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.