PyTy: Repairing Static Type Errors in Python
- URL: http://arxiv.org/abs/2401.06619v1
- Date: Fri, 12 Jan 2024 15:08:56 GMT
- Title: PyTy: Repairing Static Type Errors in Python
- Authors: Yiu Wai Chow, Luca Di Grazia, Michael Pradel
- Abstract summary: This paper presents PyTy, an automated program repair approach targeted at statically detectable type errors in Python.
We create a dataset of 2,766 error-fix pairs from 176 GitHub repositories, named PyTyDefects.
Our evaluation shows that PyTy offers fixes for ten frequent categories of type errors, successfully addressing 85.4% of 281 real-world errors.
- Score: 19.74043303068795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradual typing enables developers to annotate types of their own choosing,
offering a flexible middle ground between no type annotations and a fully
statically typed language. As more and more code bases get type-annotated,
static type checkers detect an increasingly large number of type errors.
Unfortunately, fixing these errors requires manual effort, hampering the
adoption of gradual typing in practice. This paper presents PyTy, an automated
program repair approach targeted at statically detectable type errors in
Python. The problem of repairing type errors deserves specific attention
because it exposes particular repair patterns, offers a warning message with
hints about where and how to apply a fix, and because gradual type checking
serves as an automatic way to validate fixes. We address this problem through
three contributions: (i) an empirical study that investigates how developers
fix Python type errors, showing a diverse set of fixing strategies with some
recurring patterns; (ii) an approach to automatically extract type error fixes,
which enables us to create a dataset of 2,766 error-fix pairs from 176 GitHub
repositories, named PyTyDefects; (iii) the first learning-based repair
technique for fixing type errors in Python. Motivated by the relative data
scarcity of the problem, the neural model at the core of PyTy is trained via
cross-lingual transfer learning. Our evaluation shows that PyTy offers fixes
for ten frequent categories of type errors, successfully addressing 85.4% of
281 real-world errors. This effectiveness outperforms state-of-the-art large
language models asked to repair type errors (by 2.1x) and complements a
previous technique aimed at type errors that manifest at runtime. Finally, 20
out of 30 pull requests with PyTy-suggested fixes have been merged by
developers, showing the usefulness of PyTy in practice.
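To illustrate the class of errors PyTy targets, the sketch below shows a hypothetical statically detectable type error and one common fix pattern (widening a return annotation). The example is not taken from the paper or from PyTyDefects; a static checker such as mypy or pyre would flag the first version.

```python
# Hypothetical example of a statically detectable type error and its fix.
# Not taken from PyTyDefects; it only illustrates the class of errors PyTy repairs.
from typing import Optional

# Before: annotated to return str, but dict.get may return None, so a static
# type checker reports an incompatible return type.
def find_user(users: dict[str, str], key: str) -> str:
    return users.get(key)  # error: Optional[str] is not compatible with str

# After: one recurring fix pattern is to widen the annotation to Optional[str].
def find_user_fixed(users: dict[str, str], key: str) -> Optional[str]:
    return users.get(key)
```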
Related papers
- LecPrompt: A Prompt-based Approach for Logical Error Correction with CodeBERT [28.711745671275477]
LecPrompt is a prompt-based approach to localize and repair logical errors.
It harnesses the capabilities of CodeBERT, a transformer-based large language model trained on code.
For Python, LecPrompt achieves a noteworthy 74.58% top-1 token-level repair accuracy.
In Java, LecPrompt delivers a 69.23% top-1 token-level repair accuracy.
arXiv Detail & Related papers (2024-10-10T01:56:04Z) - A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
arXiv Detail & Related papers (2024-01-15T22:36:31Z) - Generative Type Inference for Python [62.01560866916557]
This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis.
TypeGen creates chain-of-thought (CoT) prompts by translating the type inference steps of static analysis into prompts based on type dependency graphs (TDGs).
Experiments show that TypeGen outperforms the best baseline, Type4Py, by 10.0% for argument type prediction and 22.5% for return value type prediction in terms of top-1 Exact Match.
arXiv Detail & Related papers (2023-07-18T11:40:31Z) - Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Errors [41.87781274165405]
There exist rule-based approaches for automatically repairing Python type errors.
These approaches can generate accurate patches, but they require domain experts to design patch synthesis rules.
In this paper, we present TypeFix, a novel prompt-based approach with fix templates incorporated for repairing Python type errors.
arXiv Detail & Related papers (2023-06-02T09:42:16Z) - TypeT5: Seq2seq Type Inference using Static Analysis [51.153089609654174]
We present a new type inference method that treats type prediction as a code infilling task.
Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
arXiv Detail & Related papers (2023-03-16T23:48:00Z) - Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot jointly consider error position and type.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z) - Repairing Bugs in Python Assignments Using Large Language Models [9.973714032271708]
We propose to use a large language model trained on code to build an APR system for programming assignments.
Our system can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shots, and program chunking.
We evaluate MMAPR on 286 real student programs and compare it to a baseline built by combining a state-of-the-art Python syntax repair engine, BIFI, with a state-of-the-art Python semantic repair engine for student assignments, Refactory.
arXiv Detail & Related papers (2022-09-29T15:41:17Z) - Is this Change the Answer to that Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness [8.606215760860362]
We turn the patch correctness assessment into a Question Answering problem.
We consider as inputs the bug reports as well as the natural language description of the generated patches.
Experiments show that Quatrain can achieve an AUC of 0.886 on predicting patch correctness.
arXiv Detail & Related papers (2022-08-08T13:32:58Z) - Identifying non-natural language artifacts in bug reports [1.464410818828473]
We present a machine learning based approach to classify content into natural language and artifacts at line level in Python.
We show how data from GitHub issue trackers can be used for automated training set generation.
Our model scores at 0.95 ROC-AUC and 0.93 F1 against our manually annotated validation set, and classifies 10k lines in 0.72 seconds.
arXiv Detail & Related papers (2021-10-04T11:33:51Z) - Break-It-Fix-It: Unsupervised Learning for Program Repair [90.55497679266442]
We propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas.
We use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data.
Based on these ideas, we iteratively update the breaker and the fixer while using them in conjunction to generate more paired data.
BIFI outperforms existing methods, obtaining 90.5% repair accuracy on GitHub-Python and 71.7% on DeepFix.
arXiv Detail & Related papers (2021-06-11T20:31:04Z)
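To make the BIFI data-generation idea in the last entry concrete, the sketch below shows one fixer-side round under simplifying assumptions: Python's built-in compile() stands in for the critic (syntax checking only), and fixer is a hypothetical placeholder for the learned model; the breaker update is omitted. This is an illustration, not the authors' implementation.

```python
# Illustrative sketch of one Break-It-Fix-It (BIFI) style data-generation round.
# Assumptions: the critic is Python's own compiler (syntax check only) and the
# fixer is a toy placeholder; the real system uses learned fixer/breaker models.

def critic(code: str) -> bool:
    """Return True if the snippet passes the (purely syntactic) critic."""
    try:
        compile(code, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

def fixer(code: str) -> str:
    """Hypothetical stand-in for the learned fixer (bad code -> candidate fix)."""
    return code.replace("print x", "print(x)")  # toy 'repair' rule

def bifi_round(real_bad: list[str]) -> list[tuple[str, str]]:
    """Keep only fixer outputs the critic accepts, yielding new (broken, fixed) pairs."""
    new_pairs = []
    for bad in real_bad:
        candidate = fixer(bad)
        if not critic(bad) and critic(candidate):
            new_pairs.append((bad, candidate))  # added back into the training data
    return new_pairs

if __name__ == "__main__":
    print(bifi_round(["print x", "def f(:\n    pass"]))
```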
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.