Bypassing Skip-Gram Negative Sampling: Dimension Regularization as a More Efficient Alternative for Graph Embeddings
- URL: http://arxiv.org/abs/2405.00172v2
- Date: Mon, 02 Jun 2025 17:43:55 GMT
- Title: Bypassing Skip-Gram Negative Sampling: Dimension Regularization as a More Efficient Alternative for Graph Embeddings
- Authors: David Liu, Arjun Seshadri, Tina Eliassi-Rad, Johan Ugander
- Abstract summary: We show that when repulsion is most needed and the embeddings approach collapse, SGNS node-wise repulsion is, in the aggregate, an approximate re-centering of the node embedding dimensions, and such dimension operations are more scalable than node operations. We propose a flexible algorithm augmentation framework that improves the scalability of any existing algorithm using SGNS.
- Score: 8.858596502294471
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A wide range of graph embedding objectives decompose into two components: one that enforces similarity, attracting the embeddings of nodes that are perceived as similar, and another that enforces dissimilarity, repelling the embeddings of nodes that are perceived as dissimilar. Without repulsion, the embeddings would collapse into trivial solutions. Skip-Gram Negative Sampling (SGNS) is a popular and efficient repulsion approach that prevents collapse by repelling each node from a sample of dissimilar nodes. In this work, we show that when repulsion is most needed and the embeddings approach collapse, SGNS node-wise repulsion is, in the aggregate, an approximate re-centering of the node embedding dimensions. Such dimension operations are more scalable than node operations and produce a simpler geometric interpretation of the repulsion. Our theoretical result establishes dimension regularization as an effective and more efficient alternative to skip-gram node contrast for enforcing dissimilarity among node embeddings. We use this result to propose a flexible algorithm augmentation framework that improves the scalability of any existing algorithm using SGNS. The framework prioritizes node attraction and replaces SGNS with dimension regularization. We instantiate this generic framework for LINE and node2vec and show that the augmented algorithms preserve downstream link-prediction performance while reducing GPU memory usage by up to 33.3% and training time by 23.4%. Moreover, we show that completely removing repulsion (a special case of our augmentation framework) in LINE reduces training time by 70.9% on average, while increasing link prediction performance, especially for graphs that are globally sparse but locally dense. In general, however, repulsion is needed, and dimension regularization provides an efficient alternative to SGNS.
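As a concrete rendering of the abstract's central claim, the following minimal NumPy sketch (not the authors' released code; the uniform negative sampler, embedding sizes, and the squared-column-mean regularizer are all assumptions) compares the aggregate SGNS repulsion gradient with the gradient of a simple mean-centering dimension regularizer on near-collapsed embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, n_neg = 1000, 32, 5

# Near-collapsed embeddings: a shared mean plus small noise, the regime
# where the abstract says repulsion matters most.
Z = 0.5 + 0.01 * rng.standard_normal((n_nodes, dim))

def sgns_repulsion_grad(Z, n_neg):
    """Per-node gradient of the SGNS repulsion term -log sigma(-z_u . z_v)
    against uniformly sampled negatives (a simplification)."""
    grad = np.zeros_like(Z)
    for u in range(len(Z)):
        negs = rng.integers(0, len(Z), size=n_neg)
        scores = Z[negs] @ Z[u]                  # dot products with negatives
        weights = 1.0 / (1.0 + np.exp(-scores))  # sigma(z_u . z_v)
        grad[u] = weights @ Z[negs]              # push z_u away from negatives
    return grad

def recentering_grad(Z):
    """Gradient of a dimension regularizer penalizing nonzero column means,
    n/2 * ||mean(Z, axis=0)||^2: every node is pulled along the mean
    direction, i.e. a re-centering of the embedding dimensions."""
    return np.tile(Z.mean(axis=0), (len(Z), 1))

g_sgns = sgns_repulsion_grad(Z, n_neg)
g_dim = recentering_grad(Z)

# Near collapse the two gradient fields should be strongly aligned.
cos = (g_sgns * g_dim).sum() / (np.linalg.norm(g_sgns) * np.linalg.norm(g_dim))
print(f"cosine similarity between SGNS and re-centering gradients: {cos:.3f}")
```

On this toy input the cosine similarity comes out close to 1, consistent with the re-centering interpretation; away from collapse the two gradient fields can diverge, matching the abstract's caveat that repulsion is needed in general.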
Related papers
- Sparse Decomposition of Graph Neural Networks [20.768412002413843]
We propose an approach to reduce the number of nodes that are included during aggregation.
We achieve this through a sparse decomposition, learning to approximate node representations using a weighted sum of linearly transformed features.
We demonstrate via extensive experiments that our method outperforms other baselines designed for inference speedup.
arXiv Detail & Related papers (2024-10-25T17:52:16Z) - Combining Optimal Transport and Embedding-Based Approaches for More Expressiveness in Unsupervised Graph Alignment [19.145556156889064]
Unsupervised graph alignment finds the one-to-one node correspondence between a pair of attributed graphs by only exploiting graph structure and node features.
We propose a principled approach to combine their advantages motivated by theoretical analysis of model expressiveness.
We are the first to guarantee the one-to-one matching constraint by reducing the problem to maximum weight matching.
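As an illustration of the matching step mentioned above, the sketch below (my construction; the random similarity matrix stands in for the paper's learned node similarities) shows how a maximum-weight matching enforces a one-to-one correspondence:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
sim = rng.random((5, 5))  # sim[i, j]: similarity of node i in G1 to node j in G2

# linear_sum_assignment minimizes cost, so maximize by negating similarities.
rows, cols = linear_sum_assignment(-sim)
matching = dict(zip(rows.tolist(), cols.tolist()))
print(matching)               # each G1 node mapped to exactly one G2 node
print(sim[rows, cols].sum())  # total weight of the chosen matching
```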
arXiv Detail & Related papers (2024-06-19T04:57:35Z) - Subspace Node Pruning [2.3125457626961263]
Node pruning is the art of removing computational units such as neurons, filters, attention heads, or even entire layers to significantly reduce inference time while retaining network performance.
In this work, we propose the projection of unit activations to an orthogonal subspace in which there is no redundant activity and within which we may prune nodes while simultaneously recovering the impact of lost units.
Our proposed method reaches state-of-the-art performance when pruning ImageNet-trained VGG-16 and rivals more complex state-of-the-art methods when pruning ResNet-50 networks.
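A minimal sketch of the orthogonalization idea (my illustration under simplifying assumptions, not the paper's algorithm): QR-decompose the activation matrix so each unit's non-redundant activity is isolated, then rank units for pruning by how little novel activity they contribute.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_units = 200, 8
A = rng.standard_normal((n_samples, n_units))
A[:, 3] = A[:, 0] + 0.01 * rng.standard_normal(n_samples)  # a redundant unit

# QR gives an orthogonal basis for the activations in unit order;
# |R[i, i]| measures the activity unit i adds beyond the previous units.
Q, R = np.linalg.qr(A)
novelty = np.abs(np.diag(R))
prune_order = np.argsort(novelty)       # least novel (most redundant) first
print("prune first:", prune_order[:2])  # the redundant unit ranks at the bottom
```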
arXiv Detail & Related papers (2024-05-26T14:27:26Z) - Generative Semi-supervised Graph Anomaly Detection [42.02691404704764]
This work considers a practical semi-supervised graph anomaly detection (GAD) scenario, where part of the nodes in a graph are known to be normal.
We propose a novel Generative GAD approach (namely GGAD) for the semi-supervised scenario to better exploit the normal nodes.
GGAD is designed to leverage two important priors about the anomaly nodes -- asymmetric local affinity and egocentric closeness.
arXiv Detail & Related papers (2024-02-19T06:55:50Z) - Disentangled Condensation for Large-scale Graphs [29.384060761810172]
Graph condensation has emerged as an intriguing technique to save the expensive training costs of Graph Neural Networks (GNNs). We propose to disentangle the condensation process into a two-stage, GNN-free paradigm, independently condensing nodes and generating edges. This simple yet effective approach runs at least 10 times faster than state-of-the-art methods with comparable accuracy on medium-scale graphs.
arXiv Detail & Related papers (2024-01-18T09:59:00Z) - Degree-based stratification of nodes in Graph Neural Networks [66.17149106033126]
We modify the Graph Neural Network (GNN) architecture so that the weight matrices are learned, separately, for the nodes in each group.
This simple-to-implement modification seems to improve performance across datasets and GNN methods.
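A minimal sketch of the simplest instantiation of this idea (the bucket edges, shapes, and einsum formulation are my assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim = 6, 4, 3
X = rng.standard_normal((n_nodes, in_dim))
degrees = np.array([1, 1, 2, 5, 9, 30])

buckets = np.digitize(degrees, bins=[2, 8])    # 3 strata: low / mid / high degree
W = rng.standard_normal((3, in_dim, out_dim))  # one weight matrix per stratum

# Each node is transformed by the weight matrix of its degree stratum.
H = np.einsum("ni,nio->no", X, W[buckets])
print(H.shape)  # (6, 3)
```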
arXiv Detail & Related papers (2023-12-16T14:09:23Z) - Latent Graph Inference with Limited Supervision [58.54674649232757]
Latent graph inference (LGI) aims to jointly learn the underlying graph structure and node representations from data features.
Existing LGI methods commonly suffer from the issue of supervision starvation, where massive edge weights are learned without semantic supervision and do not contribute to the training loss.
In this paper, we observe that this issue is actually caused by the graph sparsification operation, which severely damages the important connections established between pivotal nodes and labeled ones.
arXiv Detail & Related papers (2023-10-06T15:22:40Z) - T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z) - Node Embedding for Homophilous Graphs with ARGEW: Augmentation of Random Walks by Graph Edge Weights [2.2935273605606494]
ARGEW is a novel augmentation method for random walks that expands the corpus in such a way that nodes with larger edge weights end up with closer embeddings.
On several real-world networks, we demonstrate that ARGEW makes the desired pattern, in which node pairs with larger edge weights have closer embeddings, much clearer than it is without the augmentation.
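A hedged sketch of the corpus-expansion mechanism as described (my paraphrase, not the ARGEW reference implementation): consecutive node pairs in a walk are repeated in proportion to the traversed edge's weight, so heavier edges supply more positive skip-gram pairs.

```python
import numpy as np

walk = [0, 1, 2, 1, 3]
edge_weight = {(0, 1): 1.0, (1, 2): 4.0, (1, 3): 2.0}  # undirected weights

corpus = []
for u, v in zip(walk, walk[1:]):
    w = edge_weight.get((u, v)) or edge_weight.get((v, u))
    corpus.extend([(u, v)] * int(round(w)))  # heavier edge: more repetitions

print(corpus)  # pairs along the weight-4 edge dominate the expanded corpus
```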
arXiv Detail & Related papers (2023-08-11T06:19:23Z) - BOURNE: Bootstrapped Self-supervised Learning Framework for Unified Graph Anomaly Detection [50.26074811655596]
We propose a novel unified graph anomaly detection framework based on bootstrapped self-supervised learning (named BOURNE).
By swapping the context embeddings between nodes and edges, we enable the mutual detection of node and edge anomalies.
BOURNE can eliminate the need for negative sampling, thereby enhancing its efficiency in handling large graphs.
arXiv Detail & Related papers (2023-07-28T00:44:57Z) - NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
arXiv Detail & Related papers (2023-06-14T09:21:15Z) - Mixed Graph Contrastive Network for Semi-Supervised Node Classification [63.924129159538076]
We propose a novel graph contrastive learning method, termed Mixed Graph Contrastive Network (MGCN).
In our method, we improve the discriminative capability of the latent embeddings by an unperturbed augmentation strategy and a correlation reduction mechanism.
By combining the two settings, we extract rich supervision information from both the abundant nodes and the rare yet valuable labeled nodes for discriminative representation learning.
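The sketch below shows one standard form of correlation reduction (MGCN's exact loss may differ; the standardization and the 0.01 off-diagonal weight are assumptions): the cross-correlation matrix of two embedding views is pushed toward the identity so that embedding dimensions stay decorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
Z1 = rng.standard_normal((128, 16))             # view 1 embeddings (nodes x dims)
Z2 = Z1 + 0.1 * rng.standard_normal((128, 16))  # view 2, mildly perturbed

def correlation_reduction_loss(Z1, Z2):
    Z1 = (Z1 - Z1.mean(0)) / Z1.std(0)        # standardize each dimension
    Z2 = (Z2 - Z2.mean(0)) / Z2.std(0)
    C = Z1.T @ Z2 / len(Z1)                   # cross-correlation matrix (dims x dims)
    on_diag = ((np.diag(C) - 1) ** 2).sum()   # keep each dim aligned across views
    off_diag = (C ** 2).sum() - (np.diag(C) ** 2).sum()  # decorrelate distinct dims
    return on_diag + 0.01 * off_diag

print(correlation_reduction_loss(Z1, Z2))
```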
arXiv Detail & Related papers (2022-06-06T14:26:34Z) - Geometric Graph Representation Learning via Maximizing Rate Reduction [73.6044873825311]
Learning node representations benefits various downstream tasks in graph analysis such as community detection and node classification.
We propose Geometric Graph Representation Learning (G2R) to learn node representations in an unsupervised manner.
G2R maps nodes in distinct groups into different subspaces, while each subspace is compact and different subspaces are dispersed.
arXiv Detail & Related papers (2022-02-13T07:46:24Z) - Graph Neural Networks with Feature and Structure Aware Random Walk [7.143879014059894]
We show that in typical heterophilous graphs, the edges may be directed, and whether to treat the edges as is or simply make them undirected greatly affects the performance of the GNN models.
We develop a model that adaptively learns the directionality of the graph, and exploits the underlying long-distance correlations between nodes.
arXiv Detail & Related papers (2021-11-19T08:54:21Z) - Integrating Network Embedding and Community Outlier Detection via Multiclass Graph Description [15.679313861083239]
We propose a novel unsupervised graph embedding approach (called DMGD) which integrates outlier and community detection with node embedding.
We show the theoretical bounds on the number of outliers detected by DMGD.
Our formulation boils down to an interesting minimax game between the outliers, community assignments and the node embedding function.
arXiv Detail & Related papers (2020-07-20T16:21:07Z) - Sequential Graph Convolutional Network for Active Learning [53.99104862192055]
We propose a novel pool-based Active Learning framework constructed on a sequential Graph Convolution Network (GCN).
With a small number of randomly sampled images as seed labelled examples, we learn the parameters of the graph to distinguish labelled vs unlabelled nodes.
We exploit these characteristics of GCN to select the unlabelled examples which are sufficiently different from labelled ones.
arXiv Detail & Related papers (2020-06-18T00:55:10Z) - Towards Deeper Graph Neural Networks with Differentiable Group Normalization [61.20639338417576]
Graph neural networks (GNNs) learn the representation of a node by aggregating its neighbors.
Over-smoothing is one of the key issues which limit the performance of GNNs as the number of layers increases.
We introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN).
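A simplified sketch of group-wise normalization in this spirit (my reading of the abstract, not DGN's exact formulation): nodes are softly assigned to groups and embeddings are normalized within each soft group, which keeps distinct groups from smoothing into one another.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((10, 4))       # node embeddings
logits = rng.standard_normal((10, 3))  # soft assignment to 3 groups
S = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)

out = np.zeros_like(H)
for g in range(S.shape[1]):
    w = S[:, [g]]                               # soft membership weights
    mu = (w * H).sum(0) / w.sum()               # weighted group mean
    var = (w * (H - mu) ** 2).sum(0) / w.sum()  # weighted group variance
    out += w * (H - mu) / np.sqrt(var + 1e-5)   # normalize within the group
print(out.shape)
```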
arXiv Detail & Related papers (2020-06-12T07:18:02Z) - Graph Highway Networks [77.38665506495553]
Graph Convolution Networks (GCN) are widely used in learning graph representations due to their effectiveness and efficiency.
They suffer from the notorious over-smoothing problem, in which the learned representations converge to similar vectors when many layers are stacked.
We propose Graph Highway Networks (GHNet) which utilize gating units to balance the trade-off between homogeneity and heterogeneity in the GCN learning process.
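A minimal sketch of highway-style gating applied to a GCN layer (the gate parameterization follows the original highway network and is an assumption; GHNet's exact form may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
H = rng.standard_normal((n, d))      # layer input (node features)
H_gcn = rng.standard_normal((n, d))  # output of the GCN aggregation step
Wg = rng.standard_normal((d, d))     # gate parameters (learned in practice)

gate = 1 / (1 + np.exp(-(H @ Wg)))      # sigmoid gate, per node and channel
H_next = gate * H_gcn + (1 - gate) * H  # balance new aggregation vs old features
print(H_next.shape)
```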
arXiv Detail & Related papers (2020-04-09T16:26:43Z) - Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling [31.812988573924674]
In graph neural networks (GNNs), pooling operators compute local summaries of input graphs to capture their global properties.
We propose the Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarser graphs while preserving the overall graph topology.
NDP is more efficient compared to state-of-the-art graph pooling operators while reaching, at the same time, competitive performance on a significant variety of graph classification tasks.
arXiv Detail & Related papers (2019-10-24T21:42:12Z)