Large-Scale Network Embedding in Apache Spark
- URL: http://arxiv.org/abs/2106.10620v1
- Date: Sun, 20 Jun 2021 04:42:24 GMT
- Title: Large-Scale Network Embedding in Apache Spark
- Authors: Wenqing Lin
- Abstract summary: We propose an efficient and effective distributed algorithm for network embedding on large graphs using Apache Spark.
We demonstrate in various experiments that our proposed approach is able to handle graphs with billions of edges within a few hours and is at least 4 times faster than the state-of-the-art approaches.
- Score: 1.3769786711365102
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Network embedding has been widely used in social recommendation and network
analysis, such as recommendation systems and anomaly detection with graphs.
However, most of previous approaches cannot handle large graphs efficiently,
due to that (i) computation on graphs is often costly and (ii) the size of
graph or the intermediate results of vectors could be prohibitively large,
rendering it difficult to be processed on a single machine. In this paper, we
propose an efficient and effective distributed algorithm for network embedding
on large graphs using Apache Spark, which recursively partitions a graph into
several small-sized subgraphs to capture the internal and external structural
information of nodes, and then computes the network embedding for each subgraph
in parallel. Finally, by aggregating the outputs on all subgraphs, we obtain
the embeddings of nodes in a linear cost. After that, we demonstrate in various
experiments that our proposed approach is able to handle graphs with billions
of edges within a few hours and is at least 4 times faster than the
state-of-the-art approaches. Besides, it achieves up to $4.25\%$ and $4.27\%$
improvements on link prediction and node classification tasks respectively. In
the end, we deploy the proposed algorithms in two online games of Tencent with
the applications of friend recommendation and item recommendation, which
improve the competitors by up to $91.11\%$ in running time and up to $12.80\%$
in the corresponding evaluation metrics.
Related papers
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z) - NodeFormer: A Scalable Graph Structure Learning Transformer for Node
Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernerlized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z) - Neighbor2Seq: Deep Learning on Massive Graphs by Transforming Neighbors
to Sequences [55.329402218608365]
We propose the Neighbor2Seq to transform the hierarchical neighborhood of each node into a sequence.
We evaluate our method on a massive graph with more than 111 million nodes and 1.6 billion edges.
Results show that our proposed method is scalable to massive graphs and achieves superior performance across massive and medium-scale graphs.
arXiv Detail & Related papers (2022-02-07T16:38:36Z) - RaWaNet: Enriching Graph Neural Network Input via Random Walks on Graphs [0.0]
Graph neural networks (GNNs) have gained increasing popularity and have shown very promising results for data that are represented by graphs.
We propose a random walk data processing of the graphs based on three selected lengths. Namely, (regular) walks of length 1 and 2, and a fractional walk of length $gamma in (0,1)$, in order to capture the different local and global dynamics on the graphs.
We test our method on various molecular datasets by passing the processed node features to the network in order to perform several classification and regression tasks.
arXiv Detail & Related papers (2021-09-15T20:04:01Z) - Evaluating Node Embeddings of Complex Networks [0.0]
Agood embedding should capture the graph topology, node-to-node relationship, and other relevant information about the graph.
The main challenge is that one needs to make sure that embeddings describe the properties of the graphs well.
We do a series of experiments with selected graph embedding algorithms, both on real-world networks as well as artificially generated ones.
arXiv Detail & Related papers (2021-02-16T16:55:29Z) - Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z) - CoSimGNN: Towards Large-scale Graph Similarity Computation [5.17905821006887]
Graph Neural Networks (GNNs) provide a data-driven solution for this task.
Existing GNN-based methods, which either respectively embeds two graphs or deploy cross-graph interactions for whole graph pairs, are still not able to achieve competitive results.
We propose the "embedding-coarsening-matching" framework CoSimGNN, which first embeds and coarsens large graphs with adaptive pooling operation and then deploys fine-grained interactions on the coarsened graphs for final similarity scores.
arXiv Detail & Related papers (2020-05-14T16:33:13Z) - SIGN: Scalable Inception Graph Neural Networks [4.5158585619109495]
We propose a new, efficient and scalable graph deep learning architecture that sidesteps the need for graph sampling.
Our architecture allows using different local graph operators to best suit the task at hand.
We obtain state-of-the-art results on ogbn-papers100M, the largest public graph dataset, with over 110 million nodes and 1.5 billion edges.
arXiv Detail & Related papers (2020-04-23T14:46:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.