Patching as Translation: the Data and the Metaphor
- URL: http://arxiv.org/abs/2008.10707v2
- Date: Tue, 1 Sep 2020 02:33:19 GMT
- Title: Patching as Translation: the Data and the Metaphor
- Authors: Yangruibo Ding, Baishakhi Ray, Premkumar Devanbu, Vincent J.
Hellendoorn
- Abstract summary: We examine the conceit that "software patching is like language translation" and show empirically where it breaks down.
We show how a more principled approach to model design, based on our empirical findings and general knowledge of software development, can lead to better solutions.
We implement such models ourselves as "proof-of-concept" tools and empirically confirm that they behave in a fundamentally different, more effective way than the studied translation-based architectures.
- Score: 18.22949296398319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning models from other fields, like Computational Linguistics,
have been transplanted to Software Engineering tasks, often quite successfully.
Yet a transplanted model's initial success at a given task does not necessarily
mean it is well-suited for the task. In this work, we examine a common example
of this phenomenon: the conceit that "software patching is like language
translation". We demonstrate empirically that there are subtle, but critical
distinctions between sequence-to-sequence models and translation models: while
program repair benefits greatly from the former's general modeling architecture,
it actually suffers from design decisions built into the latter, both in terms
of translation accuracy and diversity. Given these findings, we demonstrate how
a more principled approach to model design, based on our empirical findings and
general knowledge of software development, can lead to better solutions. Our
findings also lend strong support to the recent trend towards synthesizing
edits of code conditional on the buggy context, to repair bugs. We implement
such models ourselves as "proof-of-concept" tools and empirically confirm that
they behave in a fundamentally different, more effective way than the studied
translation-based architectures. Overall, our results demonstrate the merit of
studying the intricacies of machine learned models in software engineering: not
only can this help elucidate potential issues that may be overshadowed by
increases in accuracy; it can also help innovate on these models to raise the
state-of-the-art further. We will publicly release our replication data and
materials at https://github.com/ARiSE-Lab/Patch-as-translation.
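To make the contrast concrete, here is a minimal, hypothetical Python sketch (not the authors' released tooling) of the two output spaces the abstract compares: a translation-style model must regenerate every token of the fixed code, whereas an edit-synthesis model conditions on the buggy context and emits only the change. The token sequences and the Edit class are invented for illustration, and the sketch covers only same-length token substitutions, not insertions or deletions.

```python
# Hypothetical sketch (not the paper's code): contrasts the output space of a
# translation-style repair model with that of an edit-synthesis model.
from dataclasses import dataclass

BUGGY = ["if", "(", "x", "<", "0", ")", "return", "x", ";"]
FIXED = ["if", "(", "x", "<=", "0", ")", "return", "x", ";"]

# Translation framing: the model must emit every token of FIXED, so most of
# its capacity goes to copying tokens that never changed.
def translation_target(buggy, fixed):
    return fixed  # 9 tokens to generate, 8 of them verbatim copies

# Edit framing: conditioned on the buggy context, the model emits only the
# change itself, a far smaller and more bug-focused output space.
@dataclass
class Edit:
    index: int  # position in the buggy token sequence
    old: str    # token being replaced
    new: str    # replacement token

def edit_target(buggy, fixed):
    # Assumes equal-length sequences (substitutions only).
    return [Edit(i, b, f) for i, (b, f) in enumerate(zip(buggy, fixed)) if b != f]

def apply_edits(buggy, edits):
    patched = list(buggy)
    for e in edits:
        assert patched[e.index] == e.old  # edit must match the buggy context
        patched[e.index] = e.new
    return patched

edits = edit_target(BUGGY, FIXED)
print(edits)                               # [Edit(index=3, old='<', new='<=')]
print(apply_edits(BUGGY, edits) == FIXED)  # True
```

The edit target here is one token rather than nine, so a model trained in the edit framing spends its capacity on the bug itself instead of on copying unchanged context, which is consistent with the behavioral difference the paper reports.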
Related papers
- Pitfalls and Outlooks in Using COMET [22.016569792620295]
The COMET metric has blazed a trail in the machine translation community, given its strong correlation with human translation quality.
We investigate three aspects of the COMET metric: technical (obsolete software versions and compute precision), data (empty content, language mismatch, and translationese at test time), and usage and reporting.
We release the sacreCOMET package that can generate a signature for the software and model configuration as well as an appropriate citation.
arXiv Detail & Related papers (2024-08-27T19:03:11Z)
- Collaborative decoding of critical tokens for boosting factuality of large language models [57.504894664689]
Finetuned and aligned models show improved instruction following and safer generation.
The common practice of using sampling during generation also increases the chance of hallucination.
We introduce a collaborative decoding framework to harness the high factuality within pretrained models through the concept of critical tokens.
arXiv Detail & Related papers (2024-02-28T01:53:37Z)
- Beyond Self-learned Attention: Mitigating Attention Bias in Transformer-based Models Using Attention Guidance [9.486558126032639]
We introduce SyntaGuid, a novel approach to guide Transformer-based models towards critical source code tokens.
We show that SyntaGuid can improve overall performance by up to 3.25% and fix up to 28.3% of wrong predictions.
arXiv Detail & Related papers (2024-02-26T18:03:50Z)
- Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking [53.66999416757543]
We study how fine-tuning affects the internal mechanisms implemented in language models.
Fine-tuning enhances, rather than alters, the mechanistic operation of the model.
arXiv Detail & Related papers (2024-02-22T18:59:24Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)
- Super-Prompting: Utilizing Model-Independent Contextual Data to Reduce Data Annotation Required in Visual Commonsense Tasks [3.42658286826597]
We analyze different prompt-based fine-tuning techniques to improve results on both language and multimodal causal transformer models.
Our results show that with simple model-agnostic prompt-based fine-tuning, comparable results can be reached using only 35%-40% of the fine-tuning training dataset.
arXiv Detail & Related papers (2022-04-25T18:56:55Z)
- Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese.
We train these models on large amounts of data, achieving significantly improved performance over the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z)
- On the comparability of Pre-trained Language Models [0.0]
Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP.
More elaborate architectures make better use of contextual information.
Larger corpora are used as resources for pre-training large language models in a self-supervised fashion.
Advances in parallel and cloud computing have made it possible to train these models, with growing capacities, in the same or even shorter time than previously established models.
arXiv Detail & Related papers (2020-01-03T10:53:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.