Semistructured Merge with Language-Specific Syntactic Separators
- URL: http://arxiv.org/abs/2407.18888v1
- Date: Fri, 26 Jul 2024 17:40:29 GMT
- Title: Semistructured Merge with Language-Specific Syntactic Separators
- Authors: Guilherme Cavalcanti, Paulo Borba, Leonardo dos Anjos, Jonatas Clementino
- Abstract summary: We propose a tool that uses language-specific syntactic separators to infer structure without parsing.
Our tool shows significant improvements over unstructured tools widely used in practice.
- Score: 1.0999592665107416
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Structured merge tools exploit programming language syntactic structure to enhance merge accuracy by reducing spurious conflicts reported by unstructured tools. By creating and handling full ASTs, structured tools are language-specific and harder to implement. They can also be computationally expensive when merging large files. To reduce these drawbacks, semistructured merge tools work with partial ASTs that use strings to represent lower-level syntactic structures such as method bodies, and rely on unstructured tools to merge them. This, however, results in merge accuracy loss. To improve accuracy without compromising semistructured merge benefits, we propose a tool that leverages language-specific syntactic separators to infer structure without parsing. We still resort to an unstructured tool to merge lower-level structures, but only after preprocessing the code so that text in between separators such as curly braces appears on separate lines. This way we emulate the capabilities of structured merge tools while avoiding their drawbacks. By comparing our tool with a robust implementation of semistructured merge, we find that our tool substantially reduces the number of spurious conflicts. We also observe significant but less substantial reductions in the overall number of reported conflicts, and of files with conflicts. However, similar to structured tools, our tool lets more merge conflicts go undetected. Our tool shows significant improvements over unstructured tools widely used in practice. Finally, we observe that exploiting language-specific syntactic separators introduces unique textual alignment challenges.
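The preprocessing step is simple enough to sketch. Below is a minimal Python illustration, assuming a toy separator set for a Java-like language; the tool's actual separator tables are language-specific and richer, and a real implementation must preserve the original formatting when joining the merged result back together:

    import re

    # Hypothetical separator set for a Java-like language.
    SEPARATORS = r"[{};,]"

    def explode(code: str) -> str:
        """Put text between syntactic separators on separate lines, so a
        line-based merge tool compares fragments finer than whole lines."""
        parts = re.split(f"({SEPARATORS})", code)
        return "\n".join(p.strip() for p in parts if p.strip())

    base  = "int f(int a, int b) { return a + b; }"
    left  = "int f(int a, int b) { log(a); return a + b; }"
    right = "int f(int a, int c) { return a + c; }"

    # After explode(), each parameter and statement sits on its own line.
    # A line-based three-way merge (e.g., `diff3 -m`) over the exploded
    # files can then integrate left's inserted statement and right's
    # parameter rename, which collide when both sit on one original line.
    print(explode(base))

Run over the exploded versions, a plain line-based three-way merge integrates the two edits cleanly because they touch disjoint lines; over the raw one-line files it reports a spurious conflict.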
Related papers
- Concise and Precise Context Compression for Tool-Using Language Models [60.606281074373136]
We propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using language models.
Results on API-Bank and APIBench show that our approach reaches performance comparable to the upper-bound baseline at compression ratios of up to 16x.
arXiv Detail & Related papers (2024-07-02T08:17:00Z)
- A Novel Refactoring and Semantic Aware Abstract Syntax Tree Differencing Tool and a Benchmark for Evaluating the Accuracy of Diff Tools [2.0625936401496237]
Abstract Syntax Tree (AST) diff tools were developed to overcome the limitations of line-based diff tools, which are used by the majority of developers.
We propose a novel AST diff tool based on RefactoringMiner that resolves all aforementioned limitations.
Our tool achieved a considerably higher precision and recall, especially for commits, with an execution time that is comparable with competing tools.
arXiv Detail & Related papers (2024-03-09T15:32:41Z)
- Contrastive Instruction Tuning [61.97704869248903]
We propose Contrastive Instruction Tuning (CoIN) to maximize the similarity between semantically equivalent instruction-instance pairs.
Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy.
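The summary does not give CoIN's exact objective; as a rough illustration, here is a generic InfoNCE-style contrastive loss that pulls an instruction variant toward its semantically equivalent pair and away from unrelated instances (pure NumPy, hypothetical inputs, not the paper's exact formulation):

    import numpy as np

    def contrastive_loss(anchor, positive, negatives, tau=0.1):
        """InfoNCE-style loss: treat the semantically equivalent pair as
        the correct "class" among negatives. A sketch, not CoIN's loss."""
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        logits = np.array([cos(anchor, positive)]
                          + [cos(anchor, n) for n in negatives]) / tau
        return -logits[0] + np.log(np.exp(logits).sum())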
arXiv Detail & Related papers (2024-02-17T00:09:32Z)
- Promptly Predicting Structures: The Return of Inference [31.442123334313035]
We present a framework for constructing zero- and few-shot linguistic structure predictors.
Our results show that enforcing consistency not only yields structurally valid outputs, but also improves performance.
arXiv Detail & Related papers (2024-01-12T20:08:39Z)
- Leveraging Code to Improve In-context Learning for Semantic Parsing [48.66031267718704]
In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization.
We improve the effectiveness of ICL for semantic parsing by (1) using general-purpose programming languages such as Python instead of DSLs, and (2) augmenting prompts with a structured domain description.
arXiv Detail & Related papers (2023-11-16T02:50:06Z)
- ControlLLM: Augment Language Models with Tools by Searching on Graphs [97.62758830255002]
We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving real-world tasks.
Our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches the optimal solution path on a pre-built tool graph; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools.
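The summary leaves the graph search abstract; as a toy sketch, Thoughts-on-Graph can be pictured as path search over tools connected by matching input and output resource types (the tool names here are hypothetical, and the real paradigm expands and scores paths with an LLM rather than plain breadth-first search):

    from collections import deque

    # Hypothetical tool graph: tool name -> (input type, output type).
    TOOLS = {
        "speech_to_text": ("audio", "text"),
        "translate":      ("text", "text_fr"),
        "text_to_image":  ("text_fr", "image"),
    }

    def solution_path(src, dst):
        """Breadth-first search for a tool chain turning src into dst."""
        queue, seen = deque([(src, [])]), {src}
        while queue:
            rtype, path = queue.popleft()
            if rtype == dst:
                return path
            for tool, (i, o) in TOOLS.items():
                if i == rtype and o not in seen:
                    seen.add(o)
                    queue.append((o, path + [tool]))
        return []

    print(solution_path("audio", "image"))
    # ['speech_to_text', 'translate', 'text_to_image']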
arXiv Detail & Related papers (2023-10-26T21:57:21Z)
- Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models [3.7908886926768344]
Current works on topic segmentation often focus on segmentation of structured texts.
We propose the Focal Loss function as a robust alternative to the Cross-Entropy and re-weighted Cross-Entropy loss functions when segmenting unstructured and semi-structured chats.
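For reference, this is the standard binary focal loss (Lin et al., 2017) the entry refers to; it down-weights easy examples so training focuses on rare topic boundaries (a generic sketch; the paper's exact weighting may differ):

    import math

    def focal_loss(p, y, gamma=2.0, alpha=0.25):
        """Binary focal loss; with gamma = 0 it reduces to alpha-weighted
        cross-entropy. p is the predicted probability of class 1."""
        p_t = p if y == 1 else 1.0 - p
        a_t = alpha if y == 1 else 1.0 - alpha
        return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)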
arXiv Detail & Related papers (2023-10-26T03:37:51Z)
- Adapting Language Models to Compress Contexts [71.98287002918941]
Transformer-based language models (LMs) are powerful and widely applicable tools, but their usefulness is constrained by a finite context window.
We propose to adapt pre-trained LMs into AutoCompressors, which are capable of compressing long contexts into compact summary vectors.
We fine-tune OPT and Llama-2 models on sequences of up to 30,720 tokens and show that AutoCompressors can utilize long contexts to improve perplexity.
arXiv Detail & Related papers (2023-05-24T06:42:44Z)
- MergeBERT: Program Merge Conflict Resolution via Neural Transformers [11.460182185916704]
Merge conflicts can stall pull requests and continuous integration pipelines for hours to several days.
We introduce MergeBERT, a novel neural program merge framework based on token-level three-way differencing and a transformer model.
Our model achieves 64-69% precision of merge resolution synthesis, yielding nearly a 2x performance improvement over existing structured and neural program merge tools.
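The token-level three-way differencing that MergeBERT builds on can be sketched with the Python standard library; this only locates base regions edited by both sides, whereas MergeBERT goes further and classifies a resolution with a transformer (a sketch under those assumptions, not the authors' implementation):

    import difflib

    def changed_regions(base, other):
        """Base-token index ranges that differ between base and other."""
        sm = difflib.SequenceMatcher(a=base, b=other)
        return [(i1, i2) for tag, i1, i2, _, _ in sm.get_opcodes()
                if tag != "equal"]

    def token_conflicts(base, left, right):
        """Report base regions edited by both sides of a three-way merge."""
        b = base.split()
        L = changed_regions(b, left.split())
        R = changed_regions(b, right.split())
        return [(l, r) for l in L for r in R if l[0] < r[1] and r[0] < l[1]]

    print(token_conflicts("a = 1 ;", "a = 2 ;", "a = 3 ;"))
    # [((2, 3), (2, 3))] -- both sides edit the same token: a conflict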
arXiv Detail & Related papers (2021-08-31T21:37:53Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
- Retrofitting Structure-aware Transformer Language Model for End Tasks [34.74181162627023]
We consider retrofitting a structure-aware Transformer language model for facilitating end tasks.
A middle-layer structural learning strategy is leveraged for structure integration.
Experimental results show that the retrofitted structure-aware Transformer language model achieves improved perplexity.
arXiv Detail & Related papers (2020-09-16T01:07:07Z)