Stars: Tera-Scale Graph Building for Clustering and Graph Learning
- URL: http://arxiv.org/abs/2212.02635v1
- Date: Mon, 5 Dec 2022 22:43:26 GMT
- Title: Stars: Tera-Scale Graph Building for Clustering and Graph Learning
- Authors: CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren
Schudy, Peilin Zhong
- Abstract summary: $textitStars$ is a highly scalable method for building extremely sparse graphs via two-hop spanners.
We demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods.
- Score: 14.265732346610534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental procedure in the analysis of massive datasets is the
construction of similarity graphs. Such graphs play a key role for many
downstream tasks, including clustering, classification, graph learning, and
nearest neighbor search. For these tasks, it is critical to build graphs which
are sparse yet still representative of the underlying data. The benefits of
sparsity are twofold: firstly, constructing dense graphs is infeasible in
practice for large datasets, and secondly, the runtime of downstream tasks is
directly influenced by the sparsity of the similarity graph. In this work, we
present $\textit{Stars}$: a highly scalable method for building extremely
sparse graphs via two-hop spanners, which are graphs where similar points are
connected by a path of length at most two. Stars can construct two-hop spanners
with significantly fewer similarity comparisons, which are a major bottleneck
for learning based models where comparisons are expensive to evaluate.
Theoretically, we demonstrate that Stars builds a graph in nearly-linear time,
where approximate nearest neighbors are contained within two-hop neighborhoods.
In practice, we have deployed Stars for multiple data sets allowing for graph
building at the $\textit{Tera-Scale}$, i.e., for graphs with tens of trillions
of edges. We evaluate the performance of Stars for clustering and graph
learning, and demonstrate 10~1000-fold improvements in pairwise similarity
comparisons compared to different baselines, and 2~10-fold improvement in
running time without quality loss.
Related papers
- Robust Graph Structure Learning under Heterophily [12.557639223778722]
We propose a novel robust graph structure learning method to achieve a high-quality graph from heterophilic data for downstream tasks.
We first apply a high-pass filter to make each node more distinctive from its neighbors by encoding structure information into the node features.
Then, we learn a robust graph with an adaptive norm characterizing different levels of noise.
arXiv Detail & Related papers (2024-03-06T12:29:13Z) - Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein
Distance [18.522233517515975]
We introduce an extension of Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features.
We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space.
arXiv Detail & Related papers (2023-09-28T17:05:03Z) - CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph
Similarity Learning [65.1042892570989]
We propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning.
We employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning.
We transform node representations into graph-level representations via pooling operations for graph similarity computation.
arXiv Detail & Related papers (2022-05-30T13:20:26Z) - Edge but not Least: Cross-View Graph Pooling [76.71497833616024]
This paper presents a cross-view graph pooling (Co-Pooling) method to better exploit crucial graph structure information.
Through cross-view interaction, edge-view pooling and node-view pooling seamlessly reinforce each other to learn more informative graph-level representations.
arXiv Detail & Related papers (2021-09-24T08:01:23Z) - Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint emphgraph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k)
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm.
Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z) - Grale: Designing Networks for Graph Learning [68.23038997141381]
We present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes.
Grale operates by fusing together different measures of(potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes.
We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes, and hundreds of trillions of potential edges to score.
arXiv Detail & Related papers (2020-07-23T13:25:36Z) - Graph topology inference benchmarks for machine learning [16.857405938139525]
We introduce several benchmarks specifically designed to reveal the relative merits and limitations of graph inference methods.
We also contrast some of the most prominent techniques in the literature.
arXiv Detail & Related papers (2020-07-16T09:40:32Z) - Multilevel Graph Matching Networks for Deep Graph Similarity Learning [79.3213351477689]
We propose a multi-level graph matching network (MGMN) framework for computing the graph similarity between any pair of graph-structured objects.
To compensate for the lack of standard benchmark datasets, we have created and collected a set of datasets for both the graph-graph classification and graph-graph regression tasks.
Comprehensive experiments demonstrate that MGMN consistently outperforms state-of-the-art baseline models on both the graph-graph classification and graph-graph regression tasks.
arXiv Detail & Related papers (2020-07-08T19:48:19Z) - Wasserstein Embedding for Graph Learning [33.90471037116372]
Wasserstein Embedding for Graph Learning (WEGL) is a framework for embedding entire graphs in a vector space.
We leverage new insights on defining similarity between graphs as a function of the similarity between their node embedding distributions.
We evaluate our new graph embedding approach on various benchmark graph-property prediction tasks.
arXiv Detail & Related papers (2020-06-16T18:23:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.