Patching as Translation: the Data and the Metaphor
- URL: http://arxiv.org/abs/2008.10707v2
- Date: Tue, 1 Sep 2020 02:33:19 GMT
- Title: Patching as Translation: the Data and the Metaphor
- Authors: Yangruibo Ding, Baishakhi Ray, Premkumar Devanbu, Vincent J.
Hellendoorn
- Abstract summary: We examine the conceit that "software patching is like language translation" and show empirically where it breaks down.
We show how a more principled approach to model design, based on our empirical findings and general knowledge of software development, can lead to better solutions.
We implement such models ourselves as "proof-of-concept" tools and empirically confirm that they behave in a fundamentally different, more effective way than the studied translation-based architectures.
- Score: 18.22949296398319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning models from other fields, like Computational Linguistics,
have been transplanted to Software Engineering tasks, often quite successfully.
Yet a transplanted model's initial success at a given task does not necessarily
mean it is well-suited for the task. In this work, we examine a common example
of this phenomenon: the conceit that "software patching is like language
translation". We demonstrate empirically that there are subtle, but critical
distinctions between sequence-to-sequence models and translation models: while
program repair benefits greatly from the former's general modeling architecture,
it actually suffers from design decisions built into the latter, both in terms
of translation accuracy and diversity. Given these findings, we demonstrate how
a more principled approach to model design, based on our empirical findings and
general knowledge of software development, can lead to better solutions. Our
findings also lend strong support to the recent trend towards synthesizing
edits of code conditional on the buggy context, to repair bugs. We implement
such models ourselves as "proof-of-concept" tools and empirically confirm that
they behave in a fundamentally different, more effective way than the studied
translation-based architectures. Overall, our results demonstrate the merit of
studying the intricacies of machine learned models in software engineering: not
only can this help elucidate potential issues that may be overshadowed by
increases in accuracy; it can also help innovate on these models to raise the
state-of-the-art further. We will publicly release our replication data and
materials at https://github.com/ARiSE-Lab/Patch-as-translation.
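To make the contrast concrete, here is a minimal, hypothetical Python sketch (not the authors' released tooling) of the two output spaces the abstract compares: a translation-style model must regenerate every token of the fixed code, whereas an edit-synthesis model conditions on the buggy context and emits only the change. The token sequences and the Edit class are invented for illustration, and the sketch covers only same-length token substitutions, not insertions or deletions.

```python
# Hypothetical sketch (not the paper's code): contrasts the output space of a
# translation-style repair model with that of an edit-synthesis model.
from dataclasses import dataclass

BUGGY = ["if", "(", "x", "<", "0", ")", "return", "x", ";"]
FIXED = ["if", "(", "x", "<=", "0", ")", "return", "x", ";"]

# Translation framing: the model must emit every token of FIXED, so most of
# its capacity goes to copying tokens that never changed.
def translation_target(buggy, fixed):
    return fixed  # 9 tokens to generate, 8 of them verbatim copies

# Edit framing: conditioned on the buggy context, the model emits only the
# change itself, a far smaller and more bug-focused output space.
@dataclass
class Edit:
    index: int  # position in the buggy token sequence
    old: str    # token being replaced
    new: str    # replacement token

def edit_target(buggy, fixed):
    # Assumes equal-length sequences (substitutions only).
    return [Edit(i, b, f) for i, (b, f) in enumerate(zip(buggy, fixed)) if b != f]

def apply_edits(buggy, edits):
    patched = list(buggy)
    for e in edits:
        assert patched[e.index] == e.old  # edit must match the buggy context
        patched[e.index] = e.new
    return patched

edits = edit_target(BUGGY, FIXED)
print(edits)                               # [Edit(index=3, old='<', new='<=')]
print(apply_edits(BUGGY, edits) == FIXED)  # True
```

The edit target here is one token rather than nine, so a model trained in the edit framing spends its capacity on the bug itself instead of on copying unchanged context, which is consistent with the behavioral difference the paper reports.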
Related papers
- Pitfalls and Outlooks in Using COMET [22.016569792620295]
The COMET metric has blazed a trail in the machine translation community, given its strong correlation with human translation quality.
We investigate three aspects of the COMET metric: technical (obsolete software versions and compute precision), data (empty content, language mismatch, and translationese at test time), and usage and reporting.
We release the sacreCOMET package that can generate a signature for the software and model configuration as well as an appropriate citation.
arXiv Detail & Related papers (2024-08-27T19:03:11Z)
- Collaborative decoding of critical tokens for boosting factuality of large language models [57.504894664689]
Finetuned and aligned models show improved instruction following and safer generation.
The common practice of using sampling during generation also increases the chance of hallucination.
We introduce a collaborative decoding framework to harness the high factuality within pretrained models through the concept of critical tokens.
arXiv Detail & Related papers (2024-02-28T01:53:37Z)
- Beyond Self-learned Attention: Mitigating Attention Bias in Transformer-based Models Using Attention Guidance [9.486558126032639]
We introduce SyntaGuid, a novel approach to guide Transformer-based models towards critical source code tokens.
We show that SyntaGuid can improve overall performance by up to 3.25% and fix up to 28.3% of wrong predictions.
arXiv Detail & Related papers (2024-02-26T18:03:50Z)
- Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking [53.66999416757543]
We study how fine-tuning affects the internal mechanisms implemented in language models.
Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
arXiv Detail & Related papers (2024-02-22T18:59:24Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)
- Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks [3.42658286826597]
We analyze different prompt-based fine-tuning techniques to improve results on both language and multimodal causal transformer models.
Our results show that with simple model-agnostic prompt-based fine-tuning, comparable results can be reached using only 35%-40% of the fine-tuning training dataset.
arXiv Detail & Related papers (2022-04-25T18:56:55Z)
- Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese.
We train these models on large amounts of data, achieving significantly improved performance over the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z)
- On the comparability of Pre-trained Language Models [0.0]
Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP.
More elaborate architectures make better use of contextual information.
Larger corpora are used as resources for pre-training large language models in a self-supervised fashion.
Advances in parallel and cloud computing have made it possible to train these models, with growing capacities, in the same or even shorter time than previously established models.
arXiv Detail & Related papers (2020-01-03T10:53:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.