Do Machine Learning Models Produce TypeScript Types That Type Check?
- URL: http://arxiv.org/abs/2302.12163v2
- Date: Tue, 11 Jul 2023 17:24:57 GMT
- Title: Do Machine Learning Models Produce TypeScript Types That Type Check?
- Authors: Ming-Ho Yee, Arjun Guha
- Abstract summary: We present TypeWeaver, a TypeScript type migration tool that can be used with an arbitrary type prediction model.
We evaluate it with three models from the literature: DeepTyper, a recurrent neural network; LambdaNet, a graph neural network; and InCoder, a general-purpose, multi-language transformer.
With the best type prediction model, we find that only 21% of packages type check, but more encouragingly, 69% of files type check successfully.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Type migration is the process of adding types to untyped code to gain
assurance at compile time. TypeScript and other gradual type systems facilitate
type migration by allowing programmers to start with imprecise types and
gradually strengthen them. However, adding types is a manual effort and several
migrations on large, industry codebases have been reported to have taken
several years. In the research community, there has been significant interest
in using machine learning to automate TypeScript type migration. Existing
machine learning models report a high degree of accuracy in predicting
individual TypeScript type annotations. However, in this paper we argue that
accuracy can be misleading, and we should address a different question: can an
automatic type migration tool produce code that passes the TypeScript type
checker?
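A hypothetical sketch of why the two questions can diverge: each predicted annotation may be "accurate" in isolation, yet a single plausible-but-wrong prediction makes the whole file fail `tsc`. The function below is illustrative, not taken from the paper's dataset.

```typescript
// Two correct predictions (w, h and the return type are all `number`):
function area(w: number, h: number): number {
  return w * h;
}

// Had the model predicted `string` for the result, the file would be
// rejected as a whole:
//   const a: string = area(3, 4);   // error TS2322
// So 2 of 3 accurate annotations can still mean 0% of files type check.
const a: number = area(3, 4);
console.log(a); // 12
```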
We present TypeWeaver, a TypeScript type migration tool that can be used with
an arbitrary type prediction model. We evaluate TypeWeaver with three models
from the literature: DeepTyper, a recurrent neural network; LambdaNet, a graph
neural network; and InCoder, a general-purpose, multi-language transformer that
supports fill-in-the-middle tasks. Our tool automates several steps that are
necessary for using a type prediction model: (1) importing types for a
project's dependencies; (2) migrating JavaScript modules to TypeScript
notation; (3) inserting predicted type annotations into the program to produce
TypeScript; and (4) rejecting non-type predictions when needed.
We evaluate TypeWeaver on a dataset of 513 JavaScript packages, including
packages that have never been typed before. With the best type prediction
model, we find that only 21% of packages type check, but more encouragingly,
69% of files type check successfully.
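Steps (3) and (4) of the pipeline described above can be sketched as a toy function (hypothetical names and a deliberately naive validity check; the real TypeWeaver operates on full JavaScript projects): predictions are filtered for validity, then spliced into a parameter list as annotations.

```typescript
// A prediction maps one parameter name to one predicted type string.
type Prediction = { param: string; type: string };

// Step (4), toy version: reject predictions that are not plausible type
// names. A real tool would parse the prediction as a TypeScript type.
const isValidType = (t: string): boolean =>
  /^[A-Za-z_$][\w$]*(\[\])?$/.test(t) && t !== "function";

// Step (3): splice the surviving annotations into the parameter list.
function annotate(params: string[], preds: Prediction[]): string {
  const byName = new Map(
    preds.filter(p => isValidType(p.type))
         .map(p => [p.param, p.type] as [string, string]));
  return params
    .map(p => (byName.has(p) ? `${p}: ${byName.get(p)}` : p))
    .join(", ");
}

// A model might emit a non-type token such as "function" for `cb`; it is
// rejected by step (4) and the parameter is left unannotated.
console.log(annotate(["x", "cb"], [
  { param: "x", type: "number" },
  { param: "cb", type: "function" },
]));
// x: number, cb
```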
Related papers
- Inferring Pluggable Types with Machine Learning (2024-06-21)
  This paper investigates how to use machine learning to infer type qualifiers automatically.
  We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers.
- Generative Type Inference for Python (2023-07-18)
  This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis.
  TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts based on type dependency graphs (TDGs).
  Experiments show that TypeGen outperforms the best baseline, Type4Py, by 10.0% for argument type prediction and 22.5% for return value type prediction in terms of top-1 Exact Match.
- Type Prediction With Program Decomposition and Fill-in-the-Type Training (2023-05-25)
  We build OpenTau, a search-based approach for type prediction that leverages large language models.
  We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (a 14.5% absolute improvement) with an overall rate of 3.3 type errors per file.
- TypeT5: Seq2seq Type Inference using Static Analysis (2023-03-16)
  We present a new type inference method that treats type prediction as a code infilling task.
  Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
  We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
- Query and Extract: Refining Event Extraction as Type-oriented Binary Decoding (2021-10-14)
  We propose a novel event extraction framework that takes event types and argument roles as natural language queries.
  Our framework benefits from attention mechanisms to better capture the semantic correlation between the event types or argument roles and the input text.
- Type4Py: Deep Similarity Learning-Based Type Inference for Python (2021-01-12)
  We present Type4Py, a deep similarity learning-based type inference model for Python.
  We design a hierarchical neural network model that learns to discriminate between similar and dissimilar types in a high-dimensional space.
  Considering the top-1 prediction, Type4Py obtains 19.33% and 13.49% higher precision than Typilus and TypeWriter, respectively.
- Unsupervised Label-aware Event Trigger and Argument Classification (2020-12-30)
  We propose an unsupervised event extraction pipeline, which first identifies events with available tools (e.g., SRL) and then automatically maps them to pre-defined event types.
  We leverage pre-trained language models to contextually represent pre-defined types for both event triggers and arguments.
  We successfully map 83% of the triggers and 54% of the arguments to the correct types, almost doubling the performance of previous zero-shot approaches.
- Learning Sparse Prototypes for Text Generation (2020-06-29)
  Prototype-driven text generation is inefficient at test time because it must store and index the entire training corpus.
  We propose a novel generative model that automatically learns a sparse prototype support set while achieving strong language modeling performance.
  In experiments, our model outperforms previous prototype-driven language models while achieving up to a 1000x memory reduction.
- LambdaNet: Probabilistic Type Inference using Graph Neural Networks (2020-04-29)
  This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network.
  Our approach can predict both standard types, like number or string, and user-defined types that have not been encountered during training.
- Typilus: Neural Type Hints (2020-04-06)
  We present a graph neural network model that predicts types by probabilistically reasoning over a program's structure, names, and patterns.
  Our model can employ one-shot learning to predict an open vocabulary of types, including rare and user-defined ones.
  We show that Typilus confidently predicts types for 70% of all annotatable symbols.
This list is automatically generated from the titles and abstracts of the papers in this site.