Contextual Tokenization for Graph Inverted Indices
- URL: http://arxiv.org/abs/2510.22479v2
- Date: Mon, 03 Nov 2025 10:11:55 GMT
- Title: Contextual Tokenization for Graph Inverted Indices
- Authors: Pritish Chakraborty, Indradyumna Roy, Soumen Chakrabarti, Abir De
- Abstract summary: CORGII is an indexer of dense graph representations using discrete tokens mapping to efficient inverted lists. We replace the classical, fixed impact weight of a 'token' on a graph with a data-driven, trainable impact weight. To our knowledge, CORGII is the first such indexer.
- Score: 38.641973640693585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieving graphs from a large corpus that contain a subgraph isomorphic to a given query graph is a core operation in many real-world applications. While recent multi-vector graph representations and scores based on set alignment and containment can provide accurate subgraph isomorphism tests, their use in retrieval remains limited by their need to score corpus graphs exhaustively. We introduce CORGII (Contextual Representation of Graphs for Inverted Indexing), a graph indexing framework in which, starting with a contextual dense graph representation, a differentiable discretization module computes sparse binary codes over a learned latent vocabulary. This text document-like representation allows us to leverage classic, highly optimized inverted indices, while supporting soft (vector) set containment scores. Pushing this paradigm further, we replace the classical, fixed impact weight of a 'token' on a graph (such as TF-IDF or BM25) with a data-driven, trainable impact weight. Finally, we explore token expansion to support multi-probing the index for smoother accuracy-efficiency tradeoffs. To our knowledge, CORGII is the first indexer of dense graph representations using discrete tokens mapping to efficient inverted lists. Extensive experiments show that CORGII provides better trade-offs between accuracy and efficiency, compared to several baselines.
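The retrieval idea in the abstract (discrete tokens, inverted lists, per-token impact weights, and token expansion for multi-probing) can be illustrated with a toy sketch. This is not the CORGII implementation: it assumes each graph has already been mapped to a set of integer token ids, uses hand-set impact weights where the paper learns them, and the names `build_index`, `retrieve`, `impact`, and `expansion` are hypothetical.

```python
from collections import defaultdict


def build_index(corpus_tokens, impact):
    """Build an inverted index: token id -> list of (graph_id, impact weight).

    corpus_tokens: {graph_id: set of token ids} (assumed output of a tokenizer).
    impact: {(token, graph_id): weight}; missing pairs default to 1.0.
    """
    index = defaultdict(list)
    for graph_id, tokens in corpus_tokens.items():
        for tok in tokens:
            index[tok].append((graph_id, impact.get((tok, graph_id), 1.0)))
    return index


def retrieve(index, query_tokens, expansion=None, top_k=2):
    """Score only graphs found in the posting lists of the probed tokens.

    expansion: optional {token: iterable of extra tokens to probe as well},
    a stand-in for multi-probing the index.
    """
    expansion = expansion or {}
    probes = set(query_tokens)
    for tok in query_tokens:
        probes.update(expansion.get(tok, ()))
    scores = defaultdict(float)
    for tok in probes:
        for graph_id, weight in index.get(tok, ()):
            scores[graph_id] += weight
    # Highest score first; ties broken by graph id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy corpus: three graphs described by made-up token ids.
corpus = {"g1": {1, 2, 3}, "g2": {2, 4}, "g3": {5}}
impact = {(2, "g1"): 0.5, (2, "g2"): 2.0}  # hand-set, not learned
index = build_index(corpus, impact)

plain = retrieve(index, {2, 3})
expanded = retrieve(index, {2, 3}, expansion={3: [5]}, top_k=3)
```

Without expansion, graph `g3` is never touched because none of its tokens are probed; adding token 5 as an expansion of token 3 brings it into the candidate set, which is the accuracy-efficiency tradeoff the abstract alludes to.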
Related papers
- SWING: Unlocking Implicit Graph Representations for Graph Random Features [57.956136773668476]
We propose SWING: Space Walks for Implicit Network Graphs, a new class of algorithms for computations involving Graph Random Features on graphs. We provide detailed analysis of SWING and complement it with thorough experiments on different classes of i-graphs.
arXiv Detail & Related papers (2026-02-13T08:12:38Z)
- Heterogeneous Graph Alignment for Joint Reasoning and Interpretability [2.2710270108565207]
We present the Multi-Graph Meta-Transformer (MGMT), a unified, scalable, and interpretable framework for cross-graph learning. MGMT first applies Graph Transformer encoders to each graph, mapping structure and attributes into a shared latent space. It then selects task-relevant supernodes via attention and builds a meta-graph that connects functionally aligned supernodes across graphs using similarity in the latent space.
arXiv Detail & Related papers (2026-01-30T05:40:13Z)
- A Flexible, Equivariant Framework for Subgraph GNNs via Graph Products and Graph Coarsening [18.688057947275112]
Subgraph GNNs enhance the expressivity of message-passing GNNs by representing graphs as sets of subgraphs. Previous approaches attempted to generate smaller subsets of subgraphs through random or learnable sampling. This paper introduces a new Subgraph GNN framework to address these issues.
arXiv Detail & Related papers (2024-06-13T16:29:06Z)
- Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE).
We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations.
In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z)
- Seq-HGNN: Learning Sequential Node Representation on Heterogeneous Graph [57.2953563124339]
We propose a novel heterogeneous graph neural network with sequential node representation, namely Seq-HGNN.
We conduct extensive experiments on four widely used datasets from Heterogeneous Graph Benchmark (HGB) and Open Graph Benchmark (OGB).
arXiv Detail & Related papers (2023-05-18T07:27:18Z)
- NESS: Node Embeddings from Static SubGraphs [0.0]
We present a framework for learning Node Embeddings from Static Subgraphs (NESS) using a graph autoencoder (GAE) in a transductive setting.
NESS is based on two key ideas: i) Partitioning the training graph to multiple static, sparse subgraphs with non-overlapping edges using random edge split during data pre-processing.
We demonstrate that NESS gives a better node representation for link prediction tasks compared to current autoencoding methods that use either the whole graph or subgraphs.
arXiv Detail & Related papers (2023-03-15T22:14:28Z)
- Learning to Count Isomorphisms with Graph Neural Networks [16.455234748896157]
Subgraph isomorphism counting is an important problem on graphs.
In this paper, we propose a novel graph neural network (GNN) called Count-GNN for subgraph isomorphism counting.
arXiv Detail & Related papers (2023-02-07T05:32:11Z)
- FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations [114.94628499698096]
We propose FactGraph, a method that decomposes the document and the summary into structured meaning representations (MRs).
MRs describe core semantic concepts and their relations, aggregating the main content in both document and summary in a canonical form, and reducing data sparsity.
Experiments on different benchmarks for evaluating factuality show that FactGraph outperforms previous approaches by up to 15%.
arXiv Detail & Related papers (2022-04-13T16:45:33Z)
- Interactive Visual Pattern Search on Graph Data via Graph Representation Learning [20.795511688640296]
We propose a visual analytics system GraphQ to support human-in-the-loop, example-based, subgraph pattern search.
To support fast, interactive queries, we use graph neural networks (GNNs) to encode a graph as a fixed-length latent vector representation.
We also propose a novel GNN for node-alignment called NeuroAlign to facilitate easy validation and interpretation of the query results.
arXiv Detail & Related papers (2022-02-18T22:30:28Z)
- Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k).
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z)
- Accurate Learning of Graph Representations with Graph Multiset Pooling [45.72542969364438]
We propose a Graph Multiset Transformer (GMT) that captures the interaction between nodes according to their structural dependencies.
Our experimental results show that GMT significantly outperforms state-of-the-art graph pooling methods on graph classification benchmarks.
arXiv Detail & Related papers (2021-02-23T07:45:58Z)
- Graph Pooling with Node Proximity for Hierarchical Representation Learning [80.62181998314547]
We propose a novel graph pooling strategy that leverages node proximity to improve the hierarchical representation learning of graph data with their multi-hop topology.
Results show that the proposed graph pooling strategy is able to achieve state-of-the-art performance on a collection of public graph classification benchmark datasets.
arXiv Detail & Related papers (2020-06-19T13:09:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.