Related papers: Influence-Based Mini-Batching for Graph Neural Networks

URL: http://arxiv.org/abs/2212.09083v1
Date: Sun, 18 Dec 2022 13:27:01 GMT
Title: Influence-Based Mini-Batching for Graph Neural Networks
Authors: Johannes Gasteiger, Chendi Qian, Stephan Günnemann
Abstract summary: We propose influence-based mini-batching for graph neural networks. IBMB accelerates inference by up to 130x compared to previous methods. This results in up to 18x faster training per epoch and up to 17x faster convergence per runtime compared to previous methods.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Using graph neural networks for large graphs is challenging since there is no clear way of constructing mini-batches. To solve this, previous methods have relied on sampling or graph clustering. While these approaches often lead to good training convergence, they introduce significant overhead due to expensive random data accesses and perform poorly during inference. In this work we instead focus on model behavior during inference. We theoretically model batch construction via maximizing the influence score of nodes on the outputs. This formulation leads to optimal approximation of the output when we do not have knowledge of the trained model. We call the resulting method influence-based mini-batching (IBMB). IBMB accelerates inference by up to 130x compared to previous methods that reach similar accuracy. Remarkably, with adaptive optimization and the right training schedule IBMB can also substantially accelerate training, thanks to precomputed batches and consecutive memory accesses. This results in up to 18x faster training per epoch and up to 17x faster convergence per runtime compared to previous methods.
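As a rough illustration of the batch-construction idea in the abstract (choosing, for a fixed set of output nodes, the auxiliary nodes with the highest influence on them), the following minimal sketch may help. It is not the authors' implementation. It assumes personalized PageRank as a proxy for the influence score, a small dense adjacency matrix, and the hypothetical helper names `ppr_scores` and `build_batch`; none of these details appear in the abstract above.

```python
import numpy as np


def ppr_scores(adj, seeds, alpha=0.25, iters=50):
    """Approximate personalized PageRank from `seeds` by power iteration.

    Assumption of this sketch: PPR stands in for the influence score.
    `adj` is a dense float adjacency matrix of shape (n, n).
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    trans = adj / deg  # row-stochastic transition matrix
    teleport = np.zeros(n)
    teleport[list(seeds)] = 1.0 / len(seeds)
    scores = teleport.copy()
    for _ in range(iters):
        scores = alpha * teleport + (1.0 - alpha) * (trans.T @ scores)
    return scores


def build_batch(adj, output_nodes, num_aux):
    """Return (output nodes, top-`num_aux` auxiliary nodes by score)."""
    scores = ppr_scores(adj, output_nodes)
    scores[list(output_nodes)] = -np.inf  # output nodes are always in the batch
    aux = np.argsort(-scores)[:num_aux]
    return sorted(output_nodes), sorted(int(i) for i in aux)


# Toy example: a path graph 0-1-2-3-4-5 with node 0 as the only output node.
adj = np.zeros((6, 6))
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
print(build_batch(adj, [0], num_aux=2))
```

Because a batch built this way depends only on the graph and the chosen output nodes, it could be computed once and reused across epochs, which is in the spirit of the precomputed batches mentioned in the abstract.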

Related papers

Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners [82.72552644267724]
BoostPFN can outperform standard PFNs with the same size of training samples in large datasets. High performance is maintained for up to 50x of the pre-training size of PFNs.
arXiv Detail & Related papers (2025-03-03T07:31:40Z)
Towards Scalable and Deep Graph Neural Networks via Noise Masking [59.058558158296265]
Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks. Scaling them to large graphs is challenging due to the high computational and storage costs. We present random walk with noise masking (RMask), a plug-and-play module compatible with the existing model-simplification works.
arXiv Detail & Related papers (2024-12-19T07:48:14Z)
Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
We propose an algorithm that enables fast and high-quality generation under arbitrary constraints. During inference, we can interchange between gradient updates computed on the noisy image and updates computed on the final, clean image. Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints. We empirically find that this training paradigm limits the one-step generation performance of consistency models. We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
CDFGNN: a Systematic Design of Cache-based Distributed Full-Batch Graph Neural Network Training with Communication Reduction [7.048300785744331]
Graph neural network training is mainly categorized into mini-batch and full-batch training methods. In the distributed cluster, frequent remote accesses of features and gradients lead to huge communication overhead. We introduce the cache-based distributed full-batch graph neural network training framework (CDFGNN). Our results indicate that CDFGNN has great potential in accelerating distributed full-batch GNN training tasks.
arXiv Detail & Related papers (2024-08-01T01:57:09Z)
Optimizing Large Model Training through Overlapped Activation Recomputation [24.28543166026873]
We present Lynx, a new recomputation framework to reduce overhead by overlapping recomputation with communication in training pipelines. Our comprehensive evaluation using GPT models with 1.3B-23B parameters shows that Lynx outperforms existing recomputation approaches by up to 1.37x.
arXiv Detail & Related papers (2024-06-13T02:31:36Z)
Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes. Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models. Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z)
Prior-mean-assisted Bayesian optimization application on FRIB Front-End tunning [61.78406085010957]
We exploit a neural network model trained over historical data as a prior mean of BO for FRIB Front-End tuning.
arXiv Detail & Related papers (2022-11-11T18:34:15Z)
Towards Sparsification of Graph Neural Networks [9.568566305616656]
We use two state-of-the-art model compression methods, train-and-prune and sparse training, for the sparsification of weight layers in GNNs. We evaluate and compare the efficiency of both methods in terms of accuracy, training sparsity, and training FLOPs on real-world graphs.
arXiv Detail & Related papers (2022-09-11T01:39:29Z)
Simpler is Better: off-the-shelf Continual Learning Through Pretrained Backbones [0.0]
We propose a baseline (off-the-shelf) for Continual Learning of Computer Vision problems. We exploit the power of pretrained models to compute a class prototype and fill a memory bank. We compare our pipeline with common CNN models and show the superiority of Vision Transformers.
arXiv Detail & Related papers (2022-05-03T16:03:46Z)
Scaling Knowledge Graph Embedding Models [12.757685697180946]
We propose a new method for scaling training of knowledge graph embedding models for link prediction. Our scaling solution for GNN-based knowledge graph embedding models achieves a 16x speed up on benchmark datasets.
arXiv Detail & Related papers (2022-01-08T08:34:52Z)
Combining Label Propagation and Simple Models Out-performs Graph Neural Networks [52.121819834353865]
We show that for many standard transductive node classification benchmarks, we can exceed or match the performance of state-of-the-art GNNs. We call this overall procedure Correct and Smooth (C&S). Our approach exceeds or nearly matches the performance of state-of-the-art GNNs on a wide variety of benchmarks.
arXiv Detail & Related papers (2020-10-27T02:10:52Z)
Accurate, Efficient and Scalable Training of Graph Neural Networks [9.569918335816963]
Graph Neural Networks (GNNs) are powerful deep learning models to generate node embeddings on graphs. It is still challenging to perform training in an efficient and scalable way. We propose a novel parallel training framework that reduces training workload by orders of magnitude compared with state-of-the-art minibatch methods.
arXiv Detail & Related papers (2020-10-05T22:06:23Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm. We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data. Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.