The Effects of Randomness on the Stability of Node Embeddings
- URL: http://arxiv.org/abs/2005.10039v1
- Date: Wed, 20 May 2020 13:36:09 GMT
- Title: The Effects of Randomness on the Stability of Node Embeddings
- Authors: Tobias Schumacher, Hinrikus Wolf, Martin Ritzert, Florian Lemmerich,
Jan Bachmann, Florian Frantzen, Max Klabunde, Martin Grohe, Markus Strohmaier
- Abstract summary: We apply five node embedding algorithms to graphs and evaluate their stability under randomness.
We find significant instabilities in the geometry of embedding spaces independent of the centrality of a node.
This suggests that instability effects need to be taken into account when working with node embeddings.
- Score: 5.126380982787905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We systematically evaluate the (in-)stability of state-of-the-art node
embedding algorithms due to randomness, i.e., the random variation of their
outcomes given identical algorithms and graphs. We apply five node embedding
algorithms---HOPE, LINE, node2vec, SDNE, and GraphSAGE---to synthetic and
empirical graphs and assess their stability under randomness with respect to
(i) the geometry of embedding spaces as well as (ii) their performance in
downstream tasks. We find significant instabilities in the geometry of
embedding spaces independent of the centrality of a node. In the evaluation of
downstream tasks, we find that the accuracy of node classification seems to be
unaffected by random seeding while the actual classification of nodes can vary
significantly. This suggests that instability effects need to be taken into
account when working with node embeddings. Our work is relevant for researchers
and engineers interested in the effectiveness, reliability, and reproducibility
of node embedding approaches.
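The geometric instability the abstract describes can only be measured after the two embedding spaces are aligned, since embeddings from different seeds may differ by an arbitrary rotation. Below is a minimal sketch of such a comparison using orthogonal Procrustes alignment; the matrices are synthetic stand-ins for two randomized runs, and the helper names (`procrustes_align`, `instability`) are hypothetical, not from the paper.

```python
import numpy as np

def procrustes_align(X, Y):
    # Orthogonal Q minimizing ||X - Y @ Q||_F (closed form via SVD).
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return Y @ (U @ Vt)

def instability(X, Y):
    # Mean per-node distance between two embeddings after alignment.
    return np.linalg.norm(X - procrustes_align(X, Y), axis=1).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                  # "run 1" embedding (synthetic)
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal transform
Y = X @ Q + 0.01 * rng.normal(size=X.shape)     # "run 2": rotated + small noise

print(instability(X, Y))                     # small: runs agree up to rotation
print(np.linalg.norm(X - Y, axis=1).mean())  # much larger without alignment
```

Without the alignment step, even two runs that agree perfectly up to rotation would look maximally unstable, so a residual distance after alignment is what signals genuine geometric instability.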
Related papers
- SimCalib: Graph Neural Network Calibration based on Similarity between
Nodes [60.92081159963772]
Graph neural networks (GNNs) have exhibited impressive performance in modeling graph data as exemplified in various applications.
We shed light on the relationship between GNN calibration and nodewise similarity via theoretical analysis.
A novel calibration framework, named SimCalib, is accordingly proposed to consider similarity between nodes at global and local levels.
arXiv Detail & Related papers (2023-12-19T04:58:37Z) - Uncertainty in GNN Learning Evaluations: A Comparison Between Measures
for Quantifying Randomness in GNN Community Detection [4.358468367889626]
Real-world benchmarks are perplexing due to the multitude of decisions influencing GNN evaluations.
The $W$ randomness coefficient, based on the Wasserstein distance, is identified as providing the most robust assessment of randomness.
arXiv Detail & Related papers (2023-12-14T15:06:29Z) - A Topological Perspective on Demystifying GNN-Based Link Prediction
Performance [72.06314265776683]
Topological Concentration (TC) is based on the intersection of the local subgraph of each node with the ones of its neighbors.
We show that TC has a higher correlation with LP performance than other node-level topological metrics like degree and subgraph density.
We propose Approximated Topological Concentration (ATC) and theoretically/empirically justify its efficacy in approximating TC and reducing the complexity.
arXiv Detail & Related papers (2023-10-06T22:07:49Z) - NodeFormer: A Scalable Graph Structure Learning Transformer for Node
Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - A Systematic Evaluation of Node Embedding Robustness [77.29026280120277]
We assess the empirical robustness of node embedding models to random and adversarial poisoning attacks.
We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels.
We find that node classification suffers higher performance degradation than network reconstruction.
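The perturbation strategies this entry compares (edge addition, deletion, and rewiring) can be sketched in a few lines; `perturb_edges` below is a hypothetical helper operating on an undirected edge list, not the paper's implementation.

```python
import random

def perturb_edges(edges, n_nodes, frac=0.1, mode="delete", seed=0):
    # Apply one poisoning strategy to an undirected edge list.
    rng = random.Random(seed)
    edge_set = set(map(tuple, edges))
    k = max(1, int(frac * len(edge_set)))
    if mode == "delete":
        for e in rng.sample(sorted(edge_set), k):
            edge_set.discard(e)
    elif mode == "add":
        while k > 0:
            u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
            if u != v and (u, v) not in edge_set and (v, u) not in edge_set:
                edge_set.add((u, v))
                k -= 1
    elif mode == "rewire":
        for u, v in rng.sample(sorted(edge_set), k):
            # Keep endpoint u, replace v with a fresh non-neighbor.
            edge_set.discard((u, v))
            w = rng.randrange(n_nodes)
            while w == u or (u, w) in edge_set or (w, u) in edge_set:
                w = rng.randrange(n_nodes)
            edge_set.add((u, w))
    return sorted(edge_set)

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(perturb_edges(edges, 4, frac=0.2, mode="delete"))  # one edge removed
print(perturb_edges(edges, 10, frac=0.4, mode="add"))    # two edges added
```

A robustness study would re-embed the perturbed graph and compare downstream scores against the clean baseline.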
arXiv Detail & Related papers (2022-09-16T17:20:23Z) - Stability of Aggregation Graph Neural Networks [153.70485149740608]
We study the stability properties of aggregation graph neural networks (Agg-GNNs) considering perturbations of the underlying graph.
We prove that the stability bounds are defined by the properties of the filters in the first layer of the CNN that acts on each node.
We also conclude that in Agg-GNNs the selectivity of the mapping operators is tied to the properties of the filters only in the first layer of the CNN stage.
arXiv Detail & Related papers (2022-07-08T03:54:52Z) - On the Prediction Instability of Graph Neural Networks [2.3605348648054463]
Instability of trained models can affect reproducibility, reliability, and trust in machine learning systems.
We systematically assess the prediction instability of node classification with state-of-the-art Graph Neural Networks (GNNs)
We find that up to one third of the incorrectly classified nodes differ across algorithm runs.
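The "up to one third of incorrectly classified nodes differ across runs" finding suggests a simple disagreement metric over two prediction vectors. The sketch below is a hypothetical formalization in that spirit, not the paper's exact measure.

```python
def disagreement_among_errors(pred_a, pred_b, labels):
    # Among nodes misclassified in at least one run, the fraction whose
    # predicted label differs between the two runs.
    wrong = [i for i, y in enumerate(labels)
             if pred_a[i] != y or pred_b[i] != y]
    if not wrong:
        return 0.0
    return sum(pred_a[i] != pred_b[i] for i in wrong) / len(wrong)

labels = [0, 0, 1, 1]
run_a  = [0, 1, 1, 0]   # misclassifies nodes 1 and 3
run_b  = [0, 0, 1, 0]   # misclassifies node 3 only
print(disagreement_among_errors(run_a, run_b, labels))  # 0.5
```

Note that aggregate accuracy can be identical across runs while this disagreement is large, which is exactly the instability pattern the main paper also reports for node embeddings.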
arXiv Detail & Related papers (2022-05-20T10:32:59Z) - Ergodic Limits, Relaxations, and Geometric Properties of Random Walk
Node Embeddings [11.549910517683085]
Random walk based node embedding algorithms learn vector representations of nodes by optimizing an objective function of node embedding vectors and skip-bigram statistics computed from random walks on the network.
This paper studies properties of random walk based node embeddings in the unsupervised setting of discovering hidden block structure in the network.
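The skip-bigram statistics mentioned above can be computed directly: sample random walks over the graph, then count (center, context) co-occurrences within a sliding window. A minimal sketch on a toy adjacency list, with hypothetical helper names:

```python
import random
from collections import Counter

def random_walks(adj, num_walks, walk_len, seed=0):
    # Uniform random walks: `num_walks` walks of length `walk_len` per node.
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def skip_bigrams(walks, window=2):
    # Count (center, context) pairs within the window -- the statistics
    # that DeepWalk/node2vec-style objectives implicitly factorize.
    counts = Counter()
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    counts[(u, walk[j])] += 1
    return counts

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # triangle graph
walks = random_walks(adj, num_walks=2, walk_len=5, seed=1)
counts = skip_bigrams(walks, window=2)
print(len(walks), sum(counts.values()))
```

Because the walks are themselves random, these counts fluctuate from run to run, which is one concrete source of the embedding randomness studied in the main paper.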
arXiv Detail & Related papers (2021-09-09T19:24:35Z) - Unveiling Anomalous Edges and Nominal Connectivity of Attributed
Networks [53.56901624204265]
The present work deals with uncovering anomalous edges in attributed graphs using two distinct formulations with complementary strengths.
The first relies on decomposing the graph data matrix into low-rank plus sparse components to markedly improve performance.
The second broadens the scope of the first by performing robust recovery of the unperturbed graph, which enhances the anomaly identification performance.
arXiv Detail & Related papers (2021-04-17T20:00:40Z) - Consistency of random-walk based network embedding algorithms [13.214230533788932]
We study the node2vec and DeepWalk algorithms through the perspective of matrix factorization.
Our results indicate a subtle interplay between the sparsity of the observed networks, the window sizes of the random walks, and the convergence rates of the node2vec/DeepWalk embedding.
arXiv Detail & Related papers (2021-01-18T22:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.