Type Prediction With Program Decomposition and Fill-in-the-Type Training
- URL: http://arxiv.org/abs/2305.17145v1
- Date: Thu, 25 May 2023 21:16:09 GMT
- Title: Type Prediction With Program Decomposition and Fill-in-the-Type Training
- Authors: Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen
- Abstract summary: We build OpenTau, a search-based approach for type prediction that leverages large language models.
We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: TypeScript and Python are two programming languages that support optional
type annotations, which are useful but tedious to introduce and maintain. This
has motivated automated type prediction: given an untyped program, produce a
well-typed output program. Large language models (LLMs) are promising for type
prediction, but there are challenges: fill-in-the-middle performs poorly,
programs may not fit into the context window, generated types may not type
check, and it is difficult to measure how well-typed the output program is. We
address these challenges by building OpenTau, a search-based approach for type
prediction that leverages large language models. We propose a new metric for
type prediction quality, give a tree-based program decomposition that searches
a space of generated types, and present fill-in-the-type fine-tuning for LLMs.
We evaluate our work with a new dataset for TypeScript type prediction, and
show that 47.4% of files type check (14.5% absolute improvement) with an
overall rate of 3.3 type errors per file. All code, data, and models are
available at: https://github.com/GammaTauAI/opentau.
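As an illustration of the task described above, the sketch below shows what a type predictor is asked to do for TypeScript: take untyped code and produce an annotated version that passes the type checker. The function, the Rect interface, and the chosen annotations are hypothetical examples, not output or data from the paper.

```typescript
// The task: given untyped code such as
//   function area(shape, scale) { return shape.width * shape.height * scale; }
// a type predictor produces a well-typed version that passes `tsc`.
// The interface name and annotations below are illustrative choices.

interface Rect {
  width: number;
  height: number;
}

function area(shape: Rect, scale: number): number {
  return shape.width * shape.height * scale;
}
```

A file "type checks" in the paper's sense when the compiler accepts all such filled-in annotations; the reported metric of 3.3 type errors per file counts the residual checker errors in generated outputs.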
Related papers
- Learning Program Behavioral Models from Synthesized Input-Output Pairs [70.9524884086882]
We introduce Modelizer, a framework that learns a model of a program from its input/output behavior using neural machine translation.
Modelizer uses grammars to synthesize inputs and to parse the resulting outputs, allowing it to learn sequence-to-sequence associations between token streams.
Other than input and output grammars, Modelizer only requires the ability to execute the program.
arXiv Detail & Related papers (2024-07-11T15:25:02Z)
- Inferring Pluggable Types with Machine Learning [0.3867363075280544]
This paper investigates how to use machine learning to infer type qualifiers automatically.
We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers.
arXiv Detail & Related papers (2024-06-21T22:32:42Z)
- Generative Type Inference for Python [62.01560866916557]
This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis.
TypeGen creates chain-of-thought (CoT) prompts by translating the type inference steps of static analysis into prompts based on type dependency graphs (TDGs).
Experiments show that TypeGen outperforms the best baseline, Type4Py, by 10.0% in top-1 Exact Match for argument type prediction and by 22.5% for return value type prediction.
arXiv Detail & Related papers (2023-07-18T11:40:31Z)
- TypeT5: Seq2seq Type Inference using Static Analysis [51.153089609654174]
We present a new type inference method that treats type prediction as a code infilling task.
Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
arXiv Detail & Related papers (2023-03-16T23:48:00Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Type4Py: Deep Similarity Learning-Based Type Inference for Python [9.956021565144662]
We present Type4Py, a deep similarity learning-based type inference model for Python.
We design a hierarchical neural network model that learns to discriminate between types of the same kind and dissimilar types in a high-dimensional space.
Considering the top-1 prediction, Type4Py obtains 19.33% and 13.49% higher precision than Typilus and TypeWriter, respectively.
arXiv Detail & Related papers (2021-01-12T13:32:53Z)
- Unsupervised Label-aware Event Trigger and Argument Classification [73.86358632937372]
We propose an unsupervised event extraction pipeline, which first identifies events with available tools (e.g., SRL) and then automatically maps them to pre-defined event types.
We leverage pre-trained language models to contextually represent pre-defined types for both event triggers and arguments.
We successfully map 83% of the triggers and 54% of the arguments to the correct types, almost doubling the performance of previous zero-shot approaches.
arXiv Detail & Related papers (2020-12-30T17:47:24Z)
- Advanced Graph-Based Deep Learning for Probabilistic Type Inference [0.8508198765617194]
We introduce a range of graph neural network (GNN) models that operate on a novel type flow graph (TFG) representation.
Our GNN models are trained to predict the type labels in the TFG for a given input program.
We show that our two best GNN configurations achieve top-1 accuracies of 87.76% and 86.89%, respectively.
arXiv Detail & Related papers (2020-09-13T08:13:01Z)
- LambdaNet: Probabilistic Type Inference using Graph Neural Networks [46.66093127573704]
This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network.
Our approach can predict both standard types, like number or string, as well as user-defined types that have not been encountered during training.
arXiv Detail & Related papers (2020-04-29T17:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.