Related papers: Stars: Tera-Scale Graph Building for Clustering and Graph Learning

Stars: Tera-Scale Graph Building for Clustering and Graph Learning

URL: http://arxiv.org/abs/2212.02635v1
Date: Mon, 5 Dec 2022 22:43:26 GMT
Title: Stars: Tera-Scale Graph Building for Clustering and Graph Learning
Authors: CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren Schudy, Peilin Zhong
Abstract summary: $textitStars$ is a highly scalable method for building extremely sparse graphs via two-hop spanners. We demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods.
Score: 14.265732346610534
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role for many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present $\textit{Stars}$: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major bottleneck for learning based models where comparisons are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods. In practice, we have deployed Stars for multiple data sets allowing for graph building at the $\textit{Tera-Scale}$, i.e., for graphs with tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10~1000-fold improvements in pairwise similarity comparisons compared to different baselines, and 2~10-fold improvement in running time without quality loss.

Related papers

LayoutGKN: Graph Similarity Learning of Floor Plans [11.388505978767627]
Most successful methods to compare graphs rely on costly intermediate cross-graph node-level interactions.<n>We introduce textbfGKN, a more efficient approach that postpones the cross-graph node-level interactions.<n>We show that LayoutGKN computes similarity comparably or better than graph matching networks.
arXiv Detail & Related papers (2025-09-03T21:56:16Z)
Neural Network Graph Similarity Computation Based on Graph Fusion [0.4681661603096334]
This paper revolutionizes the approach by introducing a parallel graph interaction method called graph fusion. We assess the similarity between graph pairs at two distinct levels-graph-level and node-level-introducing two innovative, yet straightforward, similarity computation algorithms. Our model outperforms leading baseline models in graph-to-graph classification and regression tasks.
arXiv Detail & Related papers (2025-02-25T15:28:41Z)
Robust Graph Structure Learning under Heterophily [12.557639223778722]
We propose a novel robust graph structure learning method to achieve a high-quality graph from heterophilic data for downstream tasks. We first apply a high-pass filter to make each node more distinctive from its neighbors by encoding structure information into the node features. Then, we learn a robust graph with an adaptive norm characterizing different levels of noise.
arXiv Detail & Related papers (2024-03-06T12:29:13Z)
Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance [18.522233517515975]
We introduce an extension of Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space.
arXiv Detail & Related papers (2023-09-28T17:05:03Z)
Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions. By finding a mean in this embedding space, we can recover a mean graph that preserves structural information. We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning [65.1042892570989]
We propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning. We employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning. We transform node representations into graph-level representations via pooling operations for graph similarity computation.
arXiv Detail & Related papers (2022-05-30T13:20:26Z)
Edge but not Least: Cross-View Graph Pooling [76.71497833616024]
This paper presents a cross-view graph pooling (Co-Pooling) method to better exploit crucial graph structure information. Through cross-view interaction, edge-view pooling and node-view pooling seamlessly reinforce each other to learn more informative graph-level representations.
arXiv Detail & Related papers (2021-09-24T08:01:23Z)
Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint emphgraph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching. The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k) It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z)
Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering. Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm. Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z)
Grale: Designing Networks for Graph Learning [68.23038997141381]
We present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes. Grale operates by fusing together different measures of(potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes. We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes, and hundreds of trillions of potential edges to score.
arXiv Detail & Related papers (2020-07-23T13:25:36Z)
Graph topology inference benchmarks for machine learning [16.857405938139525]
We introduce several benchmarks specifically designed to reveal the relative merits and limitations of graph inference methods. We also contrast some of the most prominent techniques in the literature.
arXiv Detail & Related papers (2020-07-16T09:40:32Z)
Multilevel Graph Matching Networks for Deep Graph Similarity Learning [79.3213351477689]
We propose a multi-level graph matching network (MGMN) framework for computing the graph similarity between any pair of graph-structured objects. To compensate for the lack of standard benchmark datasets, we have created and collected a set of datasets for both the graph-graph classification and graph-graph regression tasks. Comprehensive experiments demonstrate that MGMN consistently outperforms state-of-the-art baseline models on both the graph-graph classification and graph-graph regression tasks.
arXiv Detail & Related papers (2020-07-08T19:48:19Z)
Wasserstein Embedding for Graph Learning [33.90471037116372]
Wasserstein Embedding for Graph Learning (WEGL) is a framework for embedding entire graphs in a vector space. We leverage new insights on defining similarity between graphs as a function of the similarity between their node embedding distributions. We evaluate our new graph embedding approach on various benchmark graph-property prediction tasks.
arXiv Detail & Related papers (2020-06-16T18:23:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.