LastMerge: A language-agnostic structured tool for code integration
- URL: http://arxiv.org/abs/2507.19687v1
- Date: Fri, 25 Jul 2025 21:46:10 GMT
- Title: LastMerge: A language-agnostic structured tool for code integration
- Authors: Joao Pedro Duarte, Paulo Borba, Guilherme Cavalcanti,
- Abstract summary: We propose LastMerge, a generic structured merge tool that can be configured through a thin interface. We run an experiment with four structured merge tools: two Java-specific tools, jDime and Spork, and their generic counterparts, respectively LastMerge and Mergiraf. Our results show no evidence that generic structured merge significantly impacts merge accuracy.
- Score: 1.201626478128059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unstructured line-based merge tools are widely used in practice. Structured AST-based merge tools show significantly improved merge accuracy, but are rarely used in practice because they are language specific and costly, consequently not being available for many programming languages. To improve merge accuracy for a wide range of languages, we propose LastMerge, a generic structured merge tool that can be configured through a thin interface that significantly reduces the effort of supporting structured merge. To understand the impact that generic structured merge might have on merge accuracy and performance, we run an experiment with four structured merge tools: two Java specific tools, jDime and Spork, and their generic counterparts, respectively LastMerge and Mergiraf. Using each tool, we replay merge scenarios from a significant dataset, and collect data on runtime, behavioral divergences, and merge accuracy. Our results show no evidence that generic structured merge significantly impacts merge accuracy. Although we observe a difference rate of approximately 10% between the Java specific tools and their generic counterparts, most of the differences stem from implementation details and could be avoided. We find that LastMerge reports 15% fewer false positives than jDime while Mergiraf misses 42% fewer false negatives than Spork. Both generic tools exhibit comparable runtime performance to the state of the art language specific implementations. These results suggest that generic structured merge tools can effectively replace language-specific ones, paving the way for broader adoption of structured merge in industry.
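To make the contrast between line-based and structured merge concrete, here is a minimal sketch of a three-way merge over named declarations. It is not LastMerge's algorithm or interface. It assumes a deliberately simplified model in which each file is a mapping from declaration name to body text, and the names `Decls` and `merge_decls` are hypothetical. When both sides add different declarations, a line-based tool may report a conflict if the additions land at the same position, whereas a merge keyed on declarations keeps both.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for an AST level: declaration name -> body text.
Decls = Dict[str, str]


def merge_decls(base: Decls, left: Decls, right: Decls) -> Tuple[Decls, List[str]]:
    """Three-way merge of unordered declarations; returns (merged, conflicting names)."""
    merged: Decls = {}
    conflicts: List[str] = []
    for name in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(name), left.get(name), right.get(name)
        if l == r:
            chosen = l  # both sides agree (including the case where both deleted it)
        elif l == b:
            chosen = r  # only the right side changed this declaration
        elif r == b:
            chosen = l  # only the left side changed this declaration
        else:
            conflicts.append(name)  # both sides changed it, in different ways
            continue
        if chosen is not None:  # None means the declaration was deleted
            merged[name] = chosen
    return merged, conflicts


base = {"area": "return w * h"}
left = {"area": "return w * h", "perimeter": "return 2 * (w + h)"}
right = {"area": "return w * h", "diagonal": "return (w**2 + h**2) ** 0.5"}
merged, conflicts = merge_decls(base, left, right)
print(sorted(merged), conflicts)
```

In this sketch a conflict is reported only when both sides change the same declaration differently. Real structured merge tools work on full syntax trees and must also handle ordered children, renames, and moves.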
Related papers
- Analyzing and Evaluating the Behavior of Git Diff and Merge [0.0]
I document the main functionalities of Git: how diffs are computed, how they are used to run merges, and how merges enable more complex operations. The default merge strategy (ort) can result in merges requiring exponential time in the number of commits in the history. Sometimes when two sides of a merge add different lines at the same position, the result is not a conflict, but a merge containing both changes one after the other, in arbitrary order.
arXiv Detail & Related papers (2025-07-16T13:01:03Z) - Evaluation of Version Control Merge Tools [3.1969855247377836]
A version control system, such as Git, requires a way to integrate changes from different developers or branches.
A merge tool either outputs a clean integration of the changes, or it outputs a conflict for manual resolution.
New merge tools have been proposed, but they have not yet been evaluated against one another.
arXiv Detail & Related papers (2024-10-13T17:35:14Z) - LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models [73.13933847198395]
We propose a training-free framework for processing long texts, utilizing a divide-and-conquer strategy to achieve comprehensive document understanding.
The proposed LLM$\times$MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output.
arXiv Detail & Related papers (2024-10-12T03:13:44Z) - Semistructured Merge with Language-Specific Syntactic Separators [1.0999592665107416]
We propose a tool that uses language-specific syntactic separators to infer structure without parsing.
Our tool shows significant improvements over unstructured tools widely used in practice.
arXiv Detail & Related papers (2024-07-26T17:40:29Z) - A Novel Refactoring and Semantic Aware Abstract Syntax Tree Differencing Tool and a Benchmark for Evaluating the Accuracy of Diff Tools [2.0625936401496237]
Abstract Syntax Tree (AST) diff tools were developed to overcome the limitations of line-based diff tools, which are used by the majority of developers.
We propose a novel AST diff tool based on RefactoringMiner that resolves all aforementioned limitations.
Our tool achieved a considerably higher precision and recall, especially for commits, with an execution time that is comparable with incompatible tools.
arXiv Detail & Related papers (2024-03-09T15:32:41Z) - Leveraging Code to Improve In-context Learning for Semantic Parsing [48.66031267718704]
In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization.
We improve the effectiveness of ICL for semantic parsing by (1) using general-purpose programming languages such as Python instead of DSLs, and (2) augmenting prompts with a structured domain description.
arXiv Detail & Related papers (2023-11-16T02:50:06Z) - ControlLLM: Augment Language Models with Tools by Searching on Graphs [97.62758830255002]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks.
Our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches the optimal solution path on a pre-built tool graph; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the
arXiv Detail & Related papers (2023-10-26T21:57:21Z) - Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes.
These simultaneous changes need to be merged into the same version of the source code.
Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual intervention by the developer to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z) - MergeBERT: Program Merge Conflict Resolution via Neural Transformers [11.460182185916704]
Merge conflicts can stall pull requests and continuous integration pipelines for hours to several days.
We introduce MergeBERT, a novel neural program merge framework based on token-level three-way differencing and a transformer model.
Our model achieves 64-69% precision of merge resolution synthesis, yielding nearly a 2x performance improvement over existing structured and neural program merge tools.
arXiv Detail & Related papers (2021-08-31T21:37:53Z) - Multilingual Autoregressive Entity Linking [49.35994386221958]
mGENRE is a sequence-to-sequence system for the Multilingual Entity Linking problem.
For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token.
We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks.
arXiv Detail & Related papers (2021-03-23T13:25:55Z) - GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words that have long-range dependencies or that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.