TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference
- URL: http://arxiv.org/abs/2407.02095v3
- Date: Tue, 13 Aug 2024 14:21:47 GMT
- Title: TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference
- Authors: Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, Xin Peng,
- Abstract summary: Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors.
We introduce TIGER, a two-stage generating-then-ranking (GTR) framework to handle Python's diverse type categories.
Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods in various type categories.
- Score: 16.192704604206206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework, designed to effectively handle Python's diverse type categories. TIGER leverages fine-tuned pre-trained code models to train a generative model with a span masking objective and a similarity model with a contrastive training objective. This approach allows TIGER to generate a wide range of type candidates, including complex generics in the generating stage, and accurately rank them with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods in various type categories, notably improving accuracy in inferring user-defined and unseen types by 11.2% and 20.1% respectively in Top-5 Exact Match. Moreover, the experimental results not only demonstrate TIGER's superior performance and efficiency, but also underscore the significance of its generating and ranking stages in enhancing automated type inference.
Related papers
- Co-Evolution of Types and Dependencies: Towards Repository-Level Type Inference for Python Code [14.7995140938911]
Python's dynamic typing mechanism, while promoting flexibility, is a significant source of runtime type errors.<n>We present methodName, a novel approach that achieves repository-level type inference through the co-evolution of types and dependencies.<n>Our key innovations are: (1) an EDG model designed to capture repository-level type dependencies; (2) an iterative type inference approach where types and dependencies co-evolve in each iteration; and (3) a type-checker-in-the-loop strategy that validates and corrects inferences on-the-fly.
arXiv Detail & Related papers (2025-12-25T09:15:06Z) - GLiClass: Generalist Lightweight Model for Sequence Classification Tasks [49.2639069781367]
We propose GLiClass, a novel method that adapts the GLiNER architecture for sequence classification tasks.<n>Our approach achieves strong accuracy and efficiency comparable to embedding-based methods, while maintaining the flexibility needed for zero-shot and few-shot learning scenarios.
arXiv Detail & Related papers (2025-08-11T06:22:25Z) - Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the Few-shot CIL setting, we have improved by at least 14% accuracy over all the current state-of-the-art methods with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z) - AdaTyper: Adaptive Semantic Column Type Detection [4.062265896931587]
We propose AdaTyper to address one of the most critical deployment challenges: adaptation.
AdaTyper uses weak-supervision to adapt a hybrid type predictor towards new semantic types and shifted data distributions at inference time.
We evaluate the adaptation performance of AdaTyper on real-world database tables hand-annotated with semantic column types through crowdsourcing.
arXiv Detail & Related papers (2023-11-23T04:42:27Z) - Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine
Entity Typing [10.08153231108538]
We present CASENT, a seq2seq model designed for ultra-fine entity typing.
Our model takes an entity mention as input and employs constrained beam search to generate multiple types autoregressively.
Our method outperforms the previous state-of-the-art in terms of F1 score and calibration error, while achieving an inference speedup of over 50 times.
arXiv Detail & Related papers (2023-11-01T20:39:12Z) - Generative Type Inference for Python [62.01560866916557]
This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis.
TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts based on the type dependency graphs (TDGs)
Experiments show that TypeGen outperforms the best baseline Type4Py by 10.0% for argument type prediction and 22.5% in return value type prediction in terms of top-1 Exact Match.
arXiv Detail & Related papers (2023-07-18T11:40:31Z) - TypeT5: Seq2seq Type Inference using Static Analysis [51.153089609654174]
We present a new type inference method that treats type prediction as a code infilling task.
Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
arXiv Detail & Related papers (2023-03-16T23:48:00Z) - Automatically Discovering Novel Visual Categories with Self-supervised
Prototype Learning [68.63910949916209]
This paper tackles the problem of novel category discovery (NCD), which aims to discriminate unknown categories in large-scale image collections.
We propose a novel adaptive prototype learning method consisting of two main stages: prototypical representation learning and prototypical self-training.
We conduct extensive experiments on four benchmark datasets and demonstrate the effectiveness and robustness of the proposed method with state-of-the-art performance.
arXiv Detail & Related papers (2022-08-01T16:34:33Z) - Ultra-fine Entity Typing with Indirect Supervision from Natural Language
Inference [28.78215056129358]
This work presents LITE, a new approach that formulates entity typing as a natural language inference (NLI) problem.
Experiments show that, with limited training data, LITE obtains state-of-the-art performance on the UFET task.
arXiv Detail & Related papers (2022-02-12T23:56:26Z) - Dual Prototypical Contrastive Learning for Few-shot Semantic
Segmentation [55.339405417090084]
We propose a dual prototypical contrastive learning approach tailored to the few-shot semantic segmentation (FSS) task.
The main idea is to encourage the prototypes more discriminative by increasing inter-class distance while reducing intra-class distance in prototype feature space.
We demonstrate that the proposed dual contrastive learning approach outperforms state-of-the-art FSS methods on PASCAL-5i and COCO-20i datasets.
arXiv Detail & Related papers (2021-11-09T08:14:50Z) - Type4Py: Deep Similarity Learning-Based Type Inference for Python [9.956021565144662]
We present Type4Py, a deep similarity learning-based type inference model for Python.
We design a hierarchical neural network model that learns to discriminate between types of the same kind and dissimilar types in a high-dimensional space.
Considering the Top-1 prediction, Type4Py obtains 19.33% and 13.49% higher precision than Typilus and TypeWriter, respectively.
arXiv Detail & Related papers (2021-01-12T13:32:53Z) - Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.