RightTyper: Effective and Efficient Type Annotation for Python
- URL: http://arxiv.org/abs/2507.16051v1
- Date: Mon, 21 Jul 2025 20:37:35 GMT
- Title: RightTyper: Effective and Efficient Type Annotation for Python
- Authors: Juan Altmayer Pizzorno, Emery D. Berger,
- Abstract summary: RightTyper generates precise type annotations based on actual program behavior.<n>RightTyper is fast and space-efficient, imposing just 30% performance overhead on average.
- Score: 0.7673339435080445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Python type annotations bring the benefits of static type checking to the language. However, manually writing annotations can be time-consuming and tedious. The result is that most real-world Python code remains largely untyped. Past approaches to annotating types in Python code fall short in a number of ways. Static approaches struggle with dynamic features and infer overly broad types. AI-based methods are inherently unsound and can miss rare or user-defined types. Dynamic methods can impose extreme runtime overheads, degrading performance by up to 270x, abort execution as they exhaust resources, and even infer incorrect types that lead to runtime errors. Crucially, all prior work assumes implicitly that the code to be annotated is already correct. This assumption is generally unwarranted, especially for large codebases that have been untyped. This paper presents RightTyper, a novel approach for Python that overcomes these disadvantages. RightTyper not only generates precise type annotations based on actual program behavior, improving recall in type checking relative to prior approaches. It also turns type checking into anomaly detection, allowing the type checker to identify corner cases that the programmer can audit for unintended behavior. RightTyper is also fast and space-efficient, imposing just 30% performance overhead on average. RightTyper achieves these characteristics by a principled yet pervasive use of sampling--guided by self-profiling--along with statistical filtering and careful resolution and aggregation of type information.
Related papers
- Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.<n>For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code.<n>Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z) - Toward a Corpus Study of the Dynamic Gradual Type [0.0]
This paper reports on an in-progress corpus study of the dynamic type in Python, targeting 221 GitHub projects that use the mypy type checker.<n>The study reveals eight patterns-of-use for the dynamic type, which have implications for future refinements of the mypy type system and for tool support to encourage precise type annotations.
arXiv Detail & Related papers (2025-03-11T22:18:51Z) - Understanding prompt engineering may not require rethinking
generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
arXiv Detail & Related papers (2023-10-06T00:52:48Z) - Learning Type Inference for Enhanced Dataflow Analysis [6.999203506253375]
We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations.
Our model outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark.
We present JoernTI, an integration of our approach into Joern, an open source static analysis tool.
arXiv Detail & Related papers (2023-10-01T13:52:28Z) - Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z) - Generative Type Inference for Python [62.01560866916557]
This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis.
TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts based on the type dependency graphs (TDGs)
Experiments show that TypeGen outperforms the best baseline Type4Py by 10.0% for argument type prediction and 22.5% in return value type prediction in terms of top-1 Exact Match.
arXiv Detail & Related papers (2023-07-18T11:40:31Z) - TypeT5: Seq2seq Type Inference using Static Analysis [51.153089609654174]
We present a new type inference method that treats type prediction as a code infilling task.
Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
arXiv Detail & Related papers (2023-03-16T23:48:00Z) - Type4Py: Deep Similarity Learning-Based Type Inference for Python [9.956021565144662]
We present Type4Py, a deep similarity learning-based type inference model for Python.
We design a hierarchical neural network model that learns to discriminate between types of the same kind and dissimilar types in a high-dimensional space.
Considering the Top-1 prediction, Type4Py obtains 19.33% and 13.49% higher precision than Typilus and TypeWriter, respectively.
arXiv Detail & Related papers (2021-01-12T13:32:53Z) - LambdaNet: Probabilistic Type Inference using Graph Neural Networks [46.66093127573704]
This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network.
Our approach can predict both standard types, like number or string, as well as user-defined types that have not been encountered during training.
arXiv Detail & Related papers (2020-04-29T17:48:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.