Domain Knowledge Matters: Improving Prompts with Fix Templates for
Repairing Python Type Errors
- URL: http://arxiv.org/abs/2306.01394v1
- Date: Fri, 2 Jun 2023 09:42:16 GMT
- Title: Domain Knowledge Matters: Improving Prompts with Fix Templates for
Repairing Python Type Errors
- Authors: Yun Peng, Shuzheng Gao, Cuiyun Gao, Yintong Huo, Michael R. Lyu
- Abstract summary: Rule-based approaches exist for automatically repairing Python type errors.
These approaches can generate accurate patches, but they require domain experts to design patch synthesis rules.
In this paper, we present TypeFix, a novel prompt-based approach that incorporates fix templates for repairing Python type errors.
- Score: 41.87781274165405
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although Python's dynamic type system makes it easier for
developers to write programs, it also introduces type errors at run time.
Rule-based approaches exist for automatically repairing Python type errors.
These approaches can generate accurate patches, but they require domain
experts to design patch synthesis rules and suffer from low template coverage
of real-world type errors. Learning-based approaches reduce the manual effort
of designing patch synthesis rules. Among them, the prompt-based approach,
which leverages the knowledge base of code pre-trained models via pre-defined
prompts, obtains state-of-the-art performance in general program repair
tasks. However, such prompts are manually defined and contain no clues
specific to repairing Python type errors, resulting in limited effectiveness.
How to automatically improve prompts with domain knowledge for type error
repair is a challenging yet under-explored question. In this paper, we
present TypeFix, a novel prompt-based approach that incorporates fix
templates for repairing Python type errors. TypeFix first mines generalized
fix templates via a novel hierarchical clustering algorithm. The identified
fix templates capture the common edit patterns and contexts of existing type
error fixes. TypeFix then generates code prompts for code pre-trained models
by employing the generalized fix templates as domain knowledge, with the
masks adaptively located for each type error rather than being
pre-determined. Experiments on two benchmarks, BugsInPy and TypeBugs, show
that TypeFix successfully repairs 26 and 55 type errors, outperforming the
best baseline approach by 9 and 14 errors, respectively. Moreover, the
proposed fix template mining approach covers 75% of developers' patches in
both benchmarks, exceeding the coverage of the best rule-based approach,
PyTER, by more than 30%.
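To make the prompt-generation step concrete, here is a minimal sketch of the general idea, assuming a simple placeholder-based template format; the function names and template syntax are illustrative assumptions, not TypeFix's actual implementation:

```python
# Hypothetical sketch: instantiating a mined fix template as a masked prompt.
# The template format and names are assumptions, not TypeFix's real API.

def build_prompt(buggy_expr: str, template: str, mask_token: str = "<mask>") -> str:
    """Render a fix template around the buggy expression.

    {buggy} marks the code that raised the type error;
    {mask} marks the adaptively placed hole for the model to fill.
    """
    return template.format(buggy=buggy_expr, mask=mask_token)

# A template mined from fixes that wrap a value in a type conversion,
# e.g. repairing `"total: " + count` (str + int) via `str(count)`.
wrap_template = "{mask}({buggy})"

print(build_prompt("count", wrap_template))  # -> <mask>(count)
# A code pre-trained model then fills <mask>, ideally predicting `str`.
```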
Related papers
- LecPrompt: A Prompt-based Approach for Logical Error Correction with CodeBERT [28.711745671275477]
LecPrompt is a prompt-based approach to localize and repair logical errors.
It harnesses the capabilities of CodeBERT, a transformer-based large language model trained on code.
For Python, LecPrompt achieves a noteworthy 74.58% top-1 token-level repair accuracy; for Java, it achieves 69.23%.
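The underlying fill-mask mechanic is easy to demonstrate; the snippet below is a minimal sketch (not LecPrompt's actual pipeline), assuming the HuggingFace transformers library and the publicly released microsoft/codebert-base-mlm checkpoint:

```python
# Minimal sketch of token-level masked repair, not LecPrompt itself:
# mask the suspicious token and let CodeBERT rank replacement candidates.
from transformers import pipeline

fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

# Buggy line with the suspicious token masked out.
masked_line = "for i in <mask>(10): total += i"
for candidate in fill(masked_line, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```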
arXiv Detail & Related papers (2024-10-10T01:56:04Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
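Schematically, the round trip looks like the sketch below, where `translate` is a placeholder for whatever code-capable model is used; the function and its signature are assumptions, not the paper's API:

```python
# Schematic sketch of Round-Trip Translation (RTT) repair.
# `translate` is a stand-in for an LLM call; it is not a real API.

def translate(text: str, source: str, target: str) -> str:
    raise NotImplementedError("plug in a code-capable language model here")

def round_trip_repair(buggy_code: str) -> str:
    # The round trip tends to regress code toward common, well-formed
    # patterns, which can smooth away small bugs along the way.
    pivot = translate(buggy_code, source="python", target="english")
    return translate(pivot, source="english", target="python")
```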
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- PyTy: Repairing Static Type Errors in Python [19.74043303068795]
This paper presents PyTy, an automated program repair approach targeting static type errors in Python.
We create a dataset of 2,766 error-fix pairs from 176 GitHub repositories, named PyTyDefects.
Our evaluation shows that PyTy offers fixes for ten frequent categories of type errors, successfully addressing 85.4% of 281 real-world errors.
arXiv Detail & Related papers (2024-01-12T15:08:56Z) - Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning [90.13978453378768]
We introduce a comprehensive typology of factual errors in generated chart captions.
A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning models.
Our analysis reveals that even state-of-the-art models, including GPT-4V, frequently produce captions laced with factual inaccuracies.
arXiv Detail & Related papers (2023-12-15T19:16:21Z) - GAMMA: Revisiting Template-based Automated Program Repair via Mask
Prediction [14.741742268621403]
Inappropriate donor code may cause plausible but incorrect patch generation even with correct fix patterns.
In this paper, we propose GAMMA, which directly leverages large pre-trained language models for donor code generation.
Results demonstrate that GAMMA correctly repairs 82 bugs on Defects4J-v1.2, a 20.59% (14 bugs) and 26.15% (17 bugs) improvement over the previous state-of-the-art template-based approach TBar and the learning-based approach Recoder, respectively.
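As a hedged sketch of that idea, the loop below instantiates a fix pattern with model-proposed donor code and keeps the first candidate that passes the tests; `rank_donor_candidates` and `run_tests` are hypothetical stand-ins, not GAMMA's API:

```python
# Illustrative sketch: fill a fix pattern's hole with model-ranked donor
# code, then keep the first candidate patch that passes the tests.
from typing import Callable, Iterable, Optional

def apply_pattern(pattern: str, donor: str) -> str:
    """Instantiate a fix pattern by substituting its masked slot."""
    return pattern.replace("<mask>", donor)

def repair(pattern: str,
           rank_donor_candidates: Callable[[str], Iterable[str]],
           run_tests: Callable[[str], bool]) -> Optional[str]:
    for donor in rank_donor_candidates(pattern):
        patch = apply_pattern(pattern, donor)
        if run_tests(patch):
            return patch  # first plausible patch that also passes tests
    return None
```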
arXiv Detail & Related papers (2023-09-17T15:49:40Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose RAP-Gen, a novel Retrieval-Augmented Patch Generation framework that explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages: the TFix benchmark in JavaScript, and the Code Refinement and Defects4J benchmarks in Java.
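A toy sketch of the retrieval step follows; the token-overlap scoring is a deliberate simplification of RAP-Gen's retriever, and the prompt layout is an assumption for illustration:

```python
# Toy sketch of retrieval-augmented patch generation: fetch the most similar
# past bug-fix pair and prepend it to the repair prompt. The token-overlap
# retriever below is a simplification, not RAP-Gen's actual retriever.

def similarity(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def build_repair_prompt(buggy: str, history: list[tuple[str, str]]) -> str:
    past_bug, past_fix = max(history, key=lambda pair: similarity(buggy, pair[0]))
    return (f"# similar bug:\n{past_bug}\n# its fix:\n{past_fix}\n"
            f"# buggy code:\n{buggy}\n# fixed code:\n")
```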
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
- FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair [0.5749787074942512]
Flaky tests are problematic because they non-deterministically pass or fail for the same software version under test.
In this paper, we focus on predicting the type of fix that is required to remove flakiness and then repair the test code on that basis.
One key idea is to guide the repair process with additional knowledge about the test's flakiness in the form of its predicted fix category.
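As a hedged sketch of that guidance idea, the snippet below predicts a fix category and surfaces it in the repair prompt; the categories, heuristic classifier, and prompt wording are illustrative assumptions, not the paper's taxonomy or models:

```python
# Illustrative sketch: predict a fix category for a flaky test, then surface
# it in the repair prompt. Categories and the heuristic are assumptions.

FIX_CATEGORIES = ["add synchronization/wait", "reset shared state"]

def predict_fix_category(test_code: str) -> str:
    # Stand-in for the paper's learned classifier.
    if "sleep" in test_code or "thread" in test_code:
        return FIX_CATEGORIES[0]
    return FIX_CATEGORIES[1]

def build_repair_prompt(test_code: str) -> str:
    category = predict_fix_category(test_code)
    return (f"# This test is flaky. Predicted fix category: {category}\n"
            f"{test_code}\n# Repaired, deterministic version:\n")
```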
arXiv Detail & Related papers (2023-06-21T19:34:16Z)
- Repairing Bugs in Python Assignments Using Large Language Models [9.973714032271708]
We propose to use a large language model trained on code to build an APR system for programming assignments.
Our system, MMAPR, can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shots, and program chunking.
We evaluate MMAPR on 286 real student programs and compare it to a baseline that combines a state-of-the-art Python syntax repair engine, BIFI, with a state-of-the-art Python semantic repair engine for student assignments, Refactory.
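One of those ingredients, test-case-based few-shot selection, can be sketched as below; the data layout and scoring are assumptions for illustration, not MMAPR's implementation:

```python
# Hypothetical sketch of test-case-based few-shot selection: prefer example
# (buggy, fixed) pairs whose failing tests overlap the current submission's.

def select_few_shots(failing_tests: set[str],
                     pool: list[dict], k: int = 2) -> list[dict]:
    # Each pool entry: {"buggy": str, "fixed": str, "failing_tests": set[str]}
    return sorted(pool,
                  key=lambda ex: len(ex["failing_tests"] & failing_tests),
                  reverse=True)[:k]
```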
arXiv Detail & Related papers (2022-09-29T15:41:17Z)
- A Universal Error Measure for Input Predictions Applied to Online Graph Problems [57.58926849872494]
We introduce a novel measure for quantifying the error in input predictions.
The measure captures errors due to absent predicted requests as well as unpredicted actual requests.
arXiv Detail & Related papers (2022-05-25T15:24:03Z)
- Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
One key idea is to use a critic to check the fixer's output on real bad inputs and add good (fixed) outputs to the training data; the other is to train a breaker that corrupts good code into realistic bad examples.
Based on these ideas, we iteratively update the breaker and the fixer, using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
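The data flow of one BIFI round can be sketched as follows; the AST-based critic matches the paper's Python setting, while `fixer` and `breaker` are placeholders for the learned models:

```python
# Sketch of one Break-It-Fix-It round. Only the data flow is meant to be
# faithful; `fixer` and `breaker` stand in for the trained seq2seq models.
import ast

def critic(code: str) -> bool:
    """Accept code iff it parses (the paper's critic for the Python setting)."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def bifi_round(bad_examples, fixer, breaker):
    paired = []
    # 1. Run the fixer on real bad inputs; keep outputs the critic accepts
    #    as new (bad, good) training pairs for the fixer.
    for bad in bad_examples:
        fixed = fixer(bad)
        if critic(fixed):
            paired.append((bad, fixed))
    # 2. Run the breaker on the accepted good code; keep outputs the critic
    #    rejects as additional realistic (bad, good) pairs.
    for _, good in list(paired):
        broken = breaker(good)
        if not critic(broken):
            paired.append((broken, good))
    return paired
```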
arXiv Detail & Related papers (2021-06-11T20:31:04Z)