Advanced Graph-Based Deep Learning for Probabilistic Type Inference
- URL: http://arxiv.org/abs/2009.05949v2
- Date: Sun, 14 Nov 2021 06:08:07 GMT
- Title: Advanced Graph-Based Deep Learning for Probabilistic Type Inference
- Authors: Fangke Ye, Jisheng Zhao, Vivek Sarkar
- Abstract summary: We introduce a range of graph neural network (GNN) models that operate on a novel type flow graph (TFG) representation.
Our GNN models are trained to predict the type labels in the TFG for a given input program.
We show that our best two GNN configurations for accuracy achieve a top-1 accuracy of 87.76% and 86.89% respectively.
- Score: 0.8508198765617194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamically typed languages such as JavaScript and Python have emerged as the
most popular programming languages in use. Important benefits can accrue from
including type annotations in dynamically typed programs. This approach to
gradual typing is exemplified by the TypeScript programming system which allows
programmers to specify partially typed programs, and then uses static analysis
to infer the remaining types. However, in general, the effectiveness of static
type inference is limited and depends on the complexity of the program's
structure and the initial type annotations. As a result, there is a strong
motivation for new approaches that can advance the state of the art in
statically predicting types in dynamically typed programs, and that do so with
acceptable performance for use in interactive programming environments.
Previous work has demonstrated the promise of probabilistic type inference
using deep learning. In this paper, we advance past work by introducing a range
of graph neural network (GNN) models that operate on a novel type flow graph
(TFG) representation. The TFG represents an input program's elements as graph
nodes connected with syntax edges and data flow edges, and our GNN models are
trained to predict the type labels in the TFG for a given input program. We
study different design choices for our GNN models for the 100 most common types
in our evaluation dataset, and show that our best two GNN configurations for
accuracy achieve a top-1 accuracy of 87.76% and 86.89% respectively,
outperforming the two most closely related deep learning type inference
approaches from past work -- DeepTyper with a top-1 accuracy of 84.62% and
LambdaNet with a top-1 accuracy of 79.45%. Further, the average inference
throughputs of those two configurations are 353.8 and 1,303.9 files/second,
compared to 186.7 files/second for DeepTyper and 1,050.3 files/second for
LambdaNet.
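The TFG idea can be illustrated with a toy message-passing step. Everything below (the node set, the edges, the dimensions, the random weights) is invented for illustration; it is a minimal sketch of a GNN predicting per-node type labels, not the paper's actual TFG construction or model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy program:  x = 1;  y = x + 2
nodes = ["x", "1", "y", "x+2"]          # hypothetical program-element nodes
edges = [(1, 0), (0, 3), (3, 2)]        # data-flow edges: 1 -> x, x -> x+2, x+2 -> y

num_nodes, dim, num_types = len(nodes), 8, 3   # toy type vocabulary of size 3
h = rng.normal(size=(num_nodes, dim))          # initial node embeddings

# Row-normalized adjacency with self-loops for one propagation round.
A = np.eye(num_nodes)
for src, dst in edges:
    A[dst, src] = 1.0
A /= A.sum(axis=1, keepdims=True)

W = rng.normal(size=(dim, dim))
h = np.tanh(A @ h @ W)                  # one message-passing / update step

W_out = rng.normal(size=(dim, num_types))
logits = h @ W_out                      # per-node scores over candidate types
pred = logits.argmax(axis=1)            # predicted type index for each element
print(pred.shape)                       # one prediction per program element
```

A real system would stack several such propagation rounds, use learned (not random) weights, and distinguish syntax edges from data-flow edges with separate transformations.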
Related papers
- Inferring Pluggable Types with Machine Learning [0.3867363075280544]
This paper investigates how to use machine learning to infer type qualifiers automatically.
We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers.
arXiv: 2024-06-21T22:32:42Z
- Learning Type Inference for Enhanced Dataflow Analysis [6.999203506253375]
We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations.
Our model outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark.
We present JoernTI, an integration of our approach into Joern, an open source static analysis tool.
arXiv: 2023-10-01T13:52:28Z
- Generative Type Inference for Python [62.01560866916557]
This paper introduces TypeGen, a few-shot generative type inference approach that incorporates static domain knowledge from static analysis.
TypeGen creates chain-of-thought (COT) prompts by translating the type inference steps of static analysis into prompts, guided by type dependency graphs (TDGs).
Experiments show that TypeGen outperforms the best baseline Type4Py by 10.0% for argument type prediction and 22.5% in return value type prediction in terms of top-1 Exact Match.
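The TDG-to-prompt idea can be sketched with a toy example. The dependency graph, variable names, and prompt wording below are hypothetical, not TypeGen's actual prompt format: the sketch simply orders variables topologically by their type dependencies and renders one reasoning step each.

```python
from graphlib import TopologicalSorter

# Hypothetical type dependency graph (TDG): each variable maps to the
# variables its type depends on.
tdg = {
    "n":      set(),          # n = 1
    "xs":     {"n"},          # xs = [n]
    "result": {"xs"},         # result = sum(xs)
}

steps = []
for var in TopologicalSorter(tdg).static_order():
    deps = sorted(tdg[var])
    if deps:
        steps.append(f"The type of `{var}` depends on {deps}.")
    else:
        steps.append(f"`{var}` has no dependencies; infer its type directly.")

prompt = "Infer types step by step:\n" + "\n".join(steps)
print(prompt)
```

Ordering the reasoning steps by dependency is what makes the chain-of-thought coherent: a variable's type is only discussed after the types it depends on.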
arXiv: 2023-07-18T11:40:31Z
- Type Prediction With Program Decomposition and Fill-in-the-Type Training [2.7998963147546143]
We build OpenTau, a search-based approach for type prediction that leverages large language models.
We evaluate our work with a new dataset for TypeScript type prediction, and show that 47.4% of files type check (14.5% absolute improvement) with an overall rate of 3.3 type errors per file.
arXiv: 2023-05-25T21:16:09Z
- TypeT5: Seq2seq Type Inference using Static Analysis [51.153089609654174]
We present a new type inference method that treats type prediction as a code infilling task.
Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
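The iterative decoding scheme can be sketched with a stub in place of the real seq2seq model. `stub_model` and its toy rules are invented for illustration; the point is only the feedback loop, where earlier predictions become context for later ones.

```python
def stub_model(element, context):
    # Hypothetical stand-in for a seq2seq type predictor:
    # a literal "1" is int; anything referencing a typed variable inherits its type.
    if element.endswith("= 1"):
        return "int"
    for var, typ in context.items():
        if var in element:
            return typ
    return "Any"

def iterative_decode(elements, rounds=2):
    """Re-run prediction for each element, feeding previous predictions
    back into the model's context on every round."""
    predictions = {}
    for _ in range(rounds):
        for name, code in elements.items():
            predictions[name] = stub_model(code, dict(predictions))
    return predictions

elements = {"x": "x = 1", "y": "y = x"}
print(iterative_decode(elements))  # x resolves to int first; y then inherits it
```

Multiple rounds let predictions made late in one pass inform elements that were decoded earlier, which is the benefit the iterative scheme targets.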
arXiv: 2023-03-16T23:48:00Z
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown powerful capacity for modeling structured data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv: 2022-03-03T09:53:53Z
- Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning [63.97983530843762]
Graph Neural Networks (GNNs) have achieved great success in graph representation learning.
GNNs generate identical representations for graph substructures that may in fact be very different.
More powerful GNNs, recently proposed by mimicking higher-order graph isomorphism tests, are inefficient because they cannot exploit the sparsity of the underlying graph structure.
We propose Distance Encoding (DE) as a new class of structure-aware features for graph representation learning.
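The core idea of distance encoding, augmenting node features with structural distances to a target node so that structurally different nodes get different representations, can be sketched with a plain BFS. This toy example is illustrative only, not the paper's full scheme.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance from `source` to every reachable node
    in an unweighted graph given as an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Toy 4-node path graph: 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bfs_distances(adj, source=0))  # {0: 0, 1: 1, 2: 2, 3: 3}
```

These distances could then be appended to each node's feature vector, letting the GNN tell apart nodes that plain message passing would embed identically.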
arXiv: 2020-08-31T23:15:40Z
- Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
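The personalized-PageRank diffusion that PPRGo approximates can be illustrated with simple power iteration. Note that PPRGo itself uses a sparse push-based approximation for scalability; this dense sketch (with an invented toy graph and parameters) only shows the quantity being approximated.

```python
import numpy as np

def personalized_pagerank(A, seed, alpha=0.15, iters=50):
    """Personalized PageRank for one seed node via power iteration:
    pi = alpha * s + (1 - alpha) * pi @ P, restarting at the seed."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    s = np.zeros(n)
    s[seed] = 1.0                          # restart distribution for this node
    pi = s.copy()
    for _ in range(iters):
        pi = alpha * s + (1 - alpha) * pi @ P
    return pi

# Toy 4-node chain graph: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

pi = personalized_pagerank(A, seed=0)
print(pi.round(3))                         # mass concentrates near the seed
```

The resulting scores identify each node's few most influential neighbors, which is what lets PPRGo replace many expensive message-passing rounds with a single sparse aggregation.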
arXiv: 2020-07-03T09:30:07Z
- LambdaNet: Probabilistic Type Inference using Graph Neural Networks [46.66093127573704]
This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network.
Our approach can predict both standard types, like number or string, as well as user-defined types that have not been encountered during training.
arXiv: 2020-04-29T17:48:40Z