Call Me Maybe: Enhancing JavaScript Call Graph Construction using Graph Neural Networks
- URL: http://arxiv.org/abs/2506.18191v1
- Date: Sun, 22 Jun 2025 22:26:44 GMT
- Title: Call Me Maybe: Enhancing JavaScript Call Graph Construction using Graph Neural Networks
- Authors: Masudul Hasan Masud Bhuiyan, Gianluca De Stefano, Giancarlo Pellegrino, Cristian-Alexandru Staicu,
- Abstract summary: Previous work shows that even advanced solutions produce false edges and miss valid ones.<n>Our main idea is to frame the problem as link prediction on full program graphs, using a rich representation with multiple edge types.<n>Our results show that learning-based methods can improve the recall of JavaScript call graph construction.
- Score: 15.40199816880172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Static analysis plays a key role in finding bugs, including security issues. A critical step in static analysis is building accurate call graphs that model function calls in a program. However, due to hard-to-analyze language features, existing call graph construction algorithms for JavaScript are neither sound nor complete. Prior work shows that even advanced solutions produce false edges and miss valid ones. In this work, we assist these tools by identifying missed call edges. Our main idea is to frame the problem as link prediction on full program graphs, using a rich representation with multiple edge types. Our approach, GRAPHIA, leverages recent advances in graph neural networks to model non-local relationships between code elements. Concretely, we propose representing JavaScript programs using a combination of syntactic- and semantic-based edges. GRAPHIA can learn from imperfect labels, including static call edges from existing tools and dynamic edges from tests, either from the same or different projects. Because call graphs are sparse, standard machine learning metrics like ROC are not suitable. Instead, we evaluate GRAPHIA by ranking function definitions for each unresolved call site. We conduct a large-scale evaluation on 50 popular JavaScript libraries with 163K call edges (150K static and 13K dynamic). GRAPHIA builds program graphs with 6.6M structural and 386K semantic edges. It ranks the correct target as the top candidate in over 42% of unresolved cases and within the top 5 in 72% of cases, reducing the manual effort needed for analysis. Our results show that learning-based methods can improve the recall of JavaScript call graph construction. To our knowledge, this is the first work to apply GNN-based link prediction to full multi-file program graphs for interprocedural analysis.
Related papers
- An Automatic Graph Construction Framework based on Large Language Models for Recommendation [49.51799417575638]
We introduce AutoGraph, an automatic graph construction framework based on large language models for recommendation.<n>LLMs infer the user preference and item knowledge, which is encoded as semantic vectors.<n>Latent factors are incorporated as extra nodes to link the user/item nodes, resulting in a graph with in-depth global-view semantics.
arXiv Detail & Related papers (2024-12-24T07:51:29Z) - Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [88.4320775961431]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs.<n>Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy.<n>We propose LLM4Graph datasets, which include crawled documents and auto-generated codes based on 6 widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z) - Static JavaScript Call Graphs: A Comparative Study [2.0512104126857786]
We systematically compare five widely adopted static algorithms for building JavaScript call graphs.
We found that there was a relatively large intersection of the found call edges among the algorithms, which proved to be 100 precise.
ACG had the highest precision followed immediately by TAJS, but ACG found significantly more call edges.
arXiv Detail & Related papers (2024-05-12T08:10:46Z) - Learnable Graph Matching: A Practical Paradigm for Data Association [74.28753343714858]
We propose a general learnable graph matching method to address these issues.
Our method achieves state-of-the-art performance on several MOT datasets.
For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet.
arXiv Detail & Related papers (2023-03-27T17:39:00Z) - Are All Edges Necessary? A Unified Framework for Graph Purification [6.795209119198288]
Not all edges in a graph are necessary for the training of machine learning models.
In this paper, we try to provide a method to drop edges in order to purify the graph data from a new perspective.
arXiv Detail & Related papers (2022-11-09T20:28:25Z) - AutoPruner: Transformer-Based Call Graph Pruning [7.319973664340497]
We present a novel call graph pruning technique, AutoPruner, for eliminating false positives in call graphs via both statistical semantic and structural analysis.
Our empirical evaluation on a benchmark dataset of real-world programs shows that AutoPruner outperforms the state-of-the-art baselines.
arXiv Detail & Related papers (2022-09-07T15:35:28Z) - Node Feature Extraction by Self-Supervised Multi-scale Neighborhood
Prediction [123.20238648121445]
We propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT)
GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information.
We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2021-10-29T19:55:12Z) - Learnable Graph Matching: Incorporating Graph Partitioning with Deep
Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z) - Grale: Designing Networks for Graph Learning [68.23038997141381]
We present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes.
Grale operates by fusing together different measures of(potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes.
We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes, and hundreds of trillions of potential edges to score.
arXiv Detail & Related papers (2020-07-23T13:25:36Z) - Learning Graph Structure With A Finite-State Automaton Layer [31.028101360041227]
We study the problem of learning to derive abstract relations from the intrinsic graph structure.
We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies.
We demonstrate that this layer can find shortcuts in grid-world graphs and reproduce simple static analyses on Python programs.
arXiv Detail & Related papers (2020-07-09T17:01:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.