Evaluating Pre-trained Language Models for Repairing API Misuses
- URL: http://arxiv.org/abs/2310.16390v1
- Date: Wed, 25 Oct 2023 06:10:22 GMT
- Title: Evaluating Pre-trained Language Models for Repairing API Misuses
- Authors: Ting Zhang and Ivana Clairine Irsan and Ferdian Thung and David Lo and
Asankhaya Sharma and Lingxiao Jiang
- Abstract summary: API misuses often lead to software bugs, crashes, and vulnerabilities.
In a recent study, test-suite-based automatic program repair (APR) tools were found to be ineffective in repairing API misuses.
We conduct a comprehensive empirical study on 11 learning-aided APR tools, which include 9 of the state-of-the-art general-purpose PLMs and two APR tools.
Our results show that PLMs perform better than the studied APR tools in repairing API misuses.
- Score: 15.17607624946389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: API misuses often lead to software bugs, crashes, and vulnerabilities. While
several API misuse detectors have been proposed, there are no automatic repair
tools specifically designed for this purpose. In a recent study,
test-suite-based automatic program repair (APR) tools were found to be
ineffective in repairing API misuses. Still, since the study focused on
non-learning-aided APR tools, it remains unknown whether learning-aided APR
tools are capable of fixing API misuses. In recent years, pre-trained language
models (PLMs) have succeeded greatly in many natural language processing tasks.
There is a rising interest in applying PLMs to APR. However, there has not been
any study that investigates the effectiveness of PLMs in repairing API misuse.
To fill this gap, we conduct a comprehensive empirical study on 11
learning-aided APR tools, which include 9 of the state-of-the-art
general-purpose PLMs and two APR tools. We evaluate these models with an
API-misuse repair dataset, consisting of two variants. Our results show that
PLMs perform better than the studied APR tools in repairing API misuses. Among
the 9 pre-trained models tested, CodeT5 is the best performer in the exact
match. We also offer insights and potential exploration directions for future
research.
Related papers
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z) - Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis [12.7034916462208]
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers.
This paper introduces an innovative APR approach called GIANTREPAIR.
Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs.
arXiv Detail & Related papers (2024-06-03T05:05:12Z) - Automated Program Repair: Emerging trends pose and expose problems for benchmarks [7.437224586066947]
Large language models (LLMs) are used to generate software patches.
Evaluations and comparisons must take care to ensure that results are valid and likely to generalize.
This is especially true for LLMs, whose large and often poorly-disclosed training datasets may include problems on which they are evaluated.
arXiv Detail & Related papers (2024-05-08T23:09:43Z) - Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models [9.454475517867817]
We propose a patch-naturalness measurement, entropy-delta, to improve the efficiency of template-based repair techniques.
Our proposed method can rank correct patches more effectively than state-of-the-art machine learning tools.
arXiv Detail & Related papers (2024-04-23T17:12:45Z) - A Novel Approach for Automatic Program Repair using Round-Trip
Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z) - The Devil is in the Errors: Leveraging Large Language Models for
Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z) - ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world
APIs [104.37772295581088]
Open-source large language models (LLMs), e.g., LLaMA, remain significantly limited in tool-use capabilities.
We introduce ToolLLM, a general tool-usetuning encompassing data construction, model training, and evaluation.
We first present ToolBench, an instruction-tuning framework for tool use, which is constructed automatically using ChatGPT.
arXiv Detail & Related papers (2023-07-31T15:56:53Z) - API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [84.45284695156771]
API-Bank is a groundbreaking benchmark for tool-augmented Large Language Models.
We develop a run evaluation system consisting of 73 API tools.
We construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains.
arXiv Detail & Related papers (2023-04-14T14:05:32Z) - Revisiting the Plastic Surgery Hypothesis via Large Language Models [9.904030364454563]
We propose FitRepair, which combines the direct usage of Large Language Models with two domain-specific fine-tuning strategies and one prompting strategy for more powerful APR.
Our experiments on the widely studied Defects4j 1.2 and 2.0 datasets show that FitRepair fixes 89 and 44 bugs.
arXiv Detail & Related papers (2023-03-18T20:33:46Z) - Improving Automated Program Repair with Domain Adaptation [0.0]
Automated Program Repair (APR) is defined as the process of fixing a bug/defect in the source code, by an automated tool.
APR tools have recently experienced promising results by leveraging state-of-the-art Neural Language Processing (NLP) techniques.
arXiv Detail & Related papers (2022-12-21T23:52:09Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.