Generative Type Inference for Python
- URL: http://arxiv.org/abs/2307.09163v1
- Date: Tue, 18 Jul 2023 11:40:31 GMT
- Title: Generative Type Inference for Python
- Authors: Yun Peng, Chaozheng Wang, Wenxuan Wang, Cuiyun Gao, Michael R. Lyu
- Abstract summary: This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis.
TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts based on the type dependency graphs (TDGs).
Experiments show that TypeGen outperforms the best baseline Type4Py by 10.0% for argument type prediction and 22.5% for return value type prediction in terms of top-1 Exact Match.
- Score: 62.01560866916557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Python is a popular dynamic programming language, evidenced by its ranking
as the second most commonly used language on GitHub. However, its dynamic type
system can lead to type errors, prompting researchers to explore automatic
type inference approaches for Python programs. Rule-based type inference
approaches can ensure the accuracy of predicted variable types, but they
suffer from low coverage. Supervised type inference approaches, while
feature-agnostic, require large, high-quality annotated datasets and are
limited to pre-defined types. Cloze-style approaches operate in a zero-shot
manner by reformulating type inference as a fill-in-the-blank problem, but
their performance is limited.
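As an illustration of the cloze-style formulation just mentioned, here is a minimal sketch that recasts type inference as filling a blank over a fixed candidate vocabulary. The helper names, mask token, and scoring heuristic are assumptions for illustration; a real system would score candidates with a pre-trained masked language model.

```python
# Minimal sketch of the cloze-style formulation: rank a fixed candidate
# vocabulary as fill-ins for a blanked-out annotation. All names here are
# illustrative assumptions, not any specific system's API.

CANDIDATE_TYPES = ["int", "str", "bool", "List[int]", "Optional[str]"]

def make_cloze_prompt(signature: str, param: str) -> str:
    """Insert a mask token where the parameter's annotation should appear."""
    return signature.replace(param, f"{param}: <MASK>", 1)

def score_fill(prompt: str, candidate: str) -> float:
    """Placeholder for the masked LM's likelihood of `candidate` at <MASK>."""
    # Dummy heuristic so the sketch runs end to end; a real system would
    # query a pre-trained masked language model here.
    return -abs(len(candidate) - 3)

def infer_type_cloze(signature: str, param: str) -> str:
    prompt = make_cloze_prompt(signature, param)
    return max(CANDIDATE_TYPES, key=lambda c: score_fill(prompt, c))

print(infer_type_cloze("def mean(xs) -> float: ...", "xs"))
```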
This paper introduces TypeGen, a few-shot generative type inference approach
that incorporates static domain knowledge from static analysis. TypeGen creates
chain-of-thought (COT) prompts by translating the type inference steps of
static analysis into prompts based on the type dependency graphs (TDGs),
enabling language models to learn how static analysis infers types. By
combining COT prompts with code slices and type hints, TypeGen constructs
example prompts from human annotations. TypeGen requires only a few
annotated examples to teach language models to generate similar COT prompts via
in-context learning. Moreover, TypeGen enhances the interpretability of results
through the input-explanation-output strategy. Experiments show that
TypeGen outperforms the best baseline Type4Py by 10.0% for argument type
prediction and 22.5% for return value type prediction in terms of top-1 Exact
Match, using only five examples. Furthermore, TypeGen achieves substantial
improvements of 27% to 84% compared to the zero-shot performance of large
language models with parameter sizes ranging from 1.3B to 175B in terms of
top-1 Exact Match.
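To make the few-shot pipeline concrete, below is a minimal sketch of COT prompting in the spirit of the abstract. The `Example` structure, the prompt layout, and the `query_model` stub are illustrative assumptions; TypeGen's actual TDG construction, code slicing, and prompt format are defined in the paper.

```python
from dataclasses import dataclass

# Sketch of few-shot chain-of-thought prompting for type inference: each
# example pairs a code slice with reasoning steps that mirror how static
# analysis walks the type dependency graph. `query_model` is a placeholder
# for a call to a generative language model.

@dataclass
class Example:
    code_slice: str   # statements the target variable depends on
    cot_steps: str    # static-analysis reasoning rendered as text
    answer: str       # ground-truth type from a human annotation

def build_prompt(examples: list[Example], query_slice: str, target: str) -> str:
    """Assemble a few-shot COT prompt from annotated examples plus the query."""
    parts = []
    for ex in examples:
        parts.append(f"Code:\n{ex.code_slice}\n"
                     f"Reasoning: {ex.cot_steps}\n"
                     f"Type: {ex.answer}\n")
    parts.append(f"Code:\n{query_slice}\n"
                 f"Reasoning for the type of `{target}`:")
    return "\n".join(parts)

def query_model(prompt: str) -> str:
    """Placeholder for a call to a generative language model."""
    return "len() returns int; int * 2 is int. Type: int"

examples = [Example(
    code_slice="x = input()\nxs = x.split(',')",
    cot_steps="input() returns str; str.split returns List[str]; "
              "so xs is List[str].",
    answer="List[str]",
)]
prompt = build_prompt(examples, "n = len(data)\nm = n * 2", "m")
print(query_model(prompt))
```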
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Type Prediction With Program Decomposition and Fill-in-the-Type Training [2.7998963147546143]
We build OpenTau, a search-based approach for type prediction that leverages large language models.
We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file.
arXiv Detail & Related papers (2023-05-25T21:16:09Z)
- TypeT5: Seq2seq Type Inference using Static Analysis [51.153089609654174]
We present a new type inference method that treats type prediction as a code infilling task.
Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
arXiv Detail & Related papers (2023-03-16T23:48:00Z)
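The iterative decoding scheme in the TypeT5 entry above can be sketched as a fixed-point loop in which committed predictions are fed back into the model's context. `predict_type` below is a toy stand-in, not TypeT5's seq2seq model or context encoding.

```python
# Sketch of iterative decoding: type predictions already committed are fed
# back into the input context so later predictions can condition on them.

def predict_type(element: str, context: dict[str, str]) -> str:
    """Placeholder for a seq2seq model conditioned on known types."""
    if element == "total" and context.get("count") == "int":
        return "int"                     # conditions on an earlier prediction
    return "int" if element == "count" else "str"

def iterative_decode(elements: list[str], max_rounds: int = 3) -> dict[str, str]:
    assigned: dict[str, str] = {}
    for _ in range(max_rounds):
        previous = dict(assigned)
        for elem in elements:
            assigned[elem] = predict_type(elem, assigned)
        if assigned == previous:         # fixed point: nothing changed
            break
    return assigned

print(iterative_decode(["total", "count"]))  # {'total': 'int', 'count': 'int'}
```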
- Type4Py: Deep Similarity Learning-Based Type Inference for Python [9.956021565144662]
We present Type4Py, a deep similarity learning-based type inference model for Python.
We design a hierarchical neural network model that learns to discriminate between types of the same kind and dissimilar types in a high-dimensional space.
Considering the Top-1 prediction, Type4Py obtains 19.33% and 13.49% higher precision than Typilus and TypeWriter, respectively.
arXiv Detail & Related papers (2021-01-12T13:32:53Z)
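Similarity-learning approaches like the Type4Py entry above are typically served as nearest-neighbor search over an index of embedded, annotated usages. The sketch below shows that serving pattern with stand-in embeddings; it is not Type4Py's actual hierarchical encoder or training objective.

```python
import numpy as np

# Sketch of similarity-based type prediction: usage contexts are embedded so
# that same-type variables cluster, and inference is nearest-neighbor search.
# The hash-based "encoder" and random index vectors stand in for a trained
# neural network.

rng = np.random.default_rng(0)
EMBED_DIM = 8

index_vecs = rng.normal(size=(4, EMBED_DIM))     # embedded annotated usages
index_types = ["int", "str", "List[int]", "Dict[str, int]"]

def embed(usage_context: str) -> np.ndarray:
    """Placeholder for the learned encoder of a variable's usage context."""
    local = np.random.default_rng(abs(hash(usage_context)) % (2**32))
    return local.normal(size=EMBED_DIM)

def predict(usage_context: str, k: int = 2) -> list[str]:
    q = embed(usage_context)
    dists = np.linalg.norm(index_vecs - q, axis=1)   # distance to each usage
    return [index_types[i] for i in np.argsort(dists)[:k]]

print(predict("s = s.strip(); s.upper()"))           # top-k candidate types
```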
arXiv Detail & Related papers (2021-01-12T13:32:53Z) - Unsupervised Label-aware Event Trigger and Argument Classification [73.86358632937372]
We propose an unsupervised event extraction pipeline, which first identifies events with available tools (e.g., SRL) and then automatically maps them to pre-defined event types.
We leverage pre-trained language models to contextually represent pre-defined types for both event triggers and arguments.
We successfully map 83% of the triggers and 54% of the arguments to the correct types, almost doubling the performance of previous zero-shot approaches.
arXiv Detail & Related papers (2020-12-30T17:47:24Z)
- Advanced Graph-Based Deep Learning for Probabilistic Type Inference [0.8508198765617194]
We introduce a range of graph neural network (GNN) models that operate on a novel type flow graph (TFG) representation.
Our GNN models are trained to predict the type labels in the TFG for a given input program.
We show that our two most accurate GNN configurations achieve top-1 accuracies of 87.76% and 86.89%, respectively.
arXiv Detail & Related papers (2020-09-13T08:13:01Z)
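A toy view of the GNN recipe in the entry above: node states on a graph are updated by message passing, and a readout maps each node to type-label logits. The tiny graph, random weights, and label set below are assumptions; the paper's TFG definition and architectures differ.

```python
import numpy as np

# Toy message-passing sketch: nodes exchange features along edges for a few
# rounds, then a linear readout scores type labels per node. Random weights
# stand in for training.

rng = np.random.default_rng(1)
N_NODES, DIM, N_TYPES = 3, 4, 3
adj = np.array([[0, 1, 0],               # edges of a tiny three-node graph
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h = rng.normal(size=(N_NODES, DIM))      # initial node features
W_msg = rng.normal(size=(DIM, DIM))
W_out = rng.normal(size=(DIM, N_TYPES))
TYPES = ["int", "str", "bool"]

for _ in range(2):                       # two message-passing rounds
    messages = adj @ h @ W_msg           # aggregate neighbor features
    h = np.tanh(h + messages)            # update node states

logits = h @ W_out                       # per-node type-label scores
print([TYPES[i] for i in logits.argmax(axis=1)])
```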
- Learning Sparse Prototypes for Text Generation [120.38555855991562]
Prototype-driven text generation is inefficient at test time because it must store and index the entire training corpus.
We propose a novel generative model that automatically learns a sparse prototype support set that achieves strong language modeling performance.
In experiments, our model outperforms previous prototype-driven language models while achieving up to a 1000x memory reduction.
arXiv Detail & Related papers (2020-06-29T19:41:26Z)
- LambdaNet: Probabilistic Type Inference using Graph Neural Networks [46.66093127573704]
This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network.
Our approach can predict both standard types, like number or string, as well as user-defined types that have not been encountered during training.
arXiv Detail & Related papers (2020-04-29T17:48:40Z)
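Predicting user-defined types that were never seen in training, as the LambdaNet entry above describes, is commonly done with a pointer-style head that scores the type declarations present in the current project by embedding similarity. The sketch below illustrates that general idea with random stand-in vectors; it is an assumption about the mechanism, not LambdaNet's actual model.

```python
import numpy as np

# Sketch of open-vocabulary type prediction: instead of a fixed softmax over
# known labels, a variable's embedding is scored against embeddings of the
# type declarations in the current program, so project-local types can be
# named. Random vectors stand in for a trained GNN's outputs.

rng = np.random.default_rng(2)
DIM = 8

builtin_types = ["number", "string", "boolean"]
project_types = ["ShoppingCart", "UserProfile"]      # declared in the program
candidates = builtin_types + project_types
cand_vecs = rng.normal(size=(len(candidates), DIM))

# A variable whose embedding lies near the "ShoppingCart" declaration.
var_vec = cand_vecs[3] + 0.1 * rng.normal(size=DIM)

scores = cand_vecs @ var_vec                         # dot-product scoring
print(candidates[int(scores.argmax())])              # expected: ShoppingCart
```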
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.