RSC: Accelerating Graph Neural Networks Training via Randomized Sparse
Computations
- URL: http://arxiv.org/abs/2210.10737v2
- Date: Sun, 2 Jul 2023 22:54:45 GMT
- Title: RSC: Accelerating Graph Neural Networks Training via Randomized Sparse
Computations
- Authors: Zirui Liu, Shengyuan Chen, Kaixiong Zhou, Daochen Zha, Xiao Huang, Xia
Hu
- Abstract summary: Training graph neural networks (GNNs) is time-consuming because sparse graph-based operations are hard to accelerate with hardware.
We explore trading off the computational precision to reduce the time complexity via sampling-based approximation.
We propose Randomized Sparse Computation, which for the first time demonstrates the potential of training GNNs with approximated operations.
- Score: 56.59168541623729
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The training of graph neural networks (GNNs) is extremely
time-consuming because sparse graph-based operations are hard to accelerate
with hardware. Prior art explores trading off computational precision to
reduce the time complexity via sampling-based approximation. Based on this
idea, previous works successfully accelerate dense-matrix-based operations
(e.g., convolution and linear layers) with a negligible accuracy drop.
However, unlike dense matrices, sparse matrices are stored in an irregular
data format in which each row/column may have a different number of non-zero
entries. Thus, compared to their dense counterparts, approximating sparse
operations poses two unique challenges: (1) we cannot directly control the
efficiency of an approximated sparse operation, since the computation is
executed only on the non-zero entries; (2) sub-sampling sparse matrices is
much less efficient due to the irregular data format. To address these
issues, our key idea is to control the accuracy-efficiency trade-off by
optimizing the allocation of computation resources layer-wise and epoch-wise.
Specifically, for the first challenge, we customize the computation resources
assigned to different sparse operations while keeping the total resource
usage below a given budget. For the second challenge, we cache previously
sampled sparse matrices to reduce the epoch-wise sampling overhead. Finally,
we propose a switching mechanism to improve the generalization of GNNs
trained with approximated operations. To this end, we propose Randomized
Sparse Computation (RSC), which for the first time demonstrates the potential
of training GNNs with approximated operations. In practice, RSC achieves up
to an $11.6\times$ speedup for a single sparse operation and a $1.6\times$
end-to-end wall-clock time speedup with a negligible accuracy drop.
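As a rough illustration of the sampling idea, the sketch below approximates a sparse-dense matrix product by sampling a subset of column/row pairs. It is a minimal sketch using the classic norm-proportional estimator; the function name, sampler, and sample size k are illustrative assumptions, not RSC's actual implementation.

```python
import numpy as np
import scipy.sparse as sp

def sampled_spmm(A, H, k, rng=None):
    """Approximate A @ H (A sparse, H dense) by sampling k column/row pairs.

    Classic norm-proportional randomized matrix multiplication: draw index j
    with probability p_j proportional to ||A[:, j]|| * ||H[j, :]||, then sum
    the rescaled products so the estimator is unbiased.
    """
    rng = np.random.default_rng(rng)
    col_norms = np.sqrt(np.asarray(A.power(2).sum(axis=0)).ravel())
    row_norms = np.linalg.norm(H, axis=1)
    p = col_norms * row_norms
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    scale = 1.0 / (k * p[idx])                       # rescaling keeps E[result] = A @ H
    return A[:, idx] @ (H[idx, :] * scale[:, None])  # touches only k columns/rows

# Example: a sparse adjacency-like matrix times dense node features.
A = sp.random(2000, 2000, density=0.005, format="csr", random_state=0)
H = np.random.default_rng(0).standard_normal((2000, 64))
H_hat = sampled_spmm(A, H, k=400)  # cheap, approximate stand-in for A @ H
```

In this picture, giving each layer's sparse operation its own k under a global budget and reusing the sampled indices across epochs correspond loosely to the layer-wise allocation and caching ideas described in the abstract.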
Related papers
- Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training [20.49255973077044]
This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce computational requirements for training on the edge.
Results show a reduction of $\sim$80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss.
We show that the proposed Delta RNN training will be useful for online incremental learning on edge devices with limited computing resources.
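To make the temporal-sparsity idea concrete, here is a minimal forward-style sketch of a delta update (the threshold, names, and update rule are illustrative assumptions; the paper exploits the same kind of sparsity in the backward pass):

```python
import numpy as np

def delta_matvec(W, x_t, x_ref, y_prev, theta=0.05):
    """Update y ~= W @ x_t by propagating only inputs that changed by more than theta.

    Columns of W whose inputs barely changed are skipped, which is where the
    reduction in matrix operations comes from.
    """
    delta = x_t - x_ref
    active = np.abs(delta) > theta                 # temporally sparse mask
    y_t = y_prev + W[:, active] @ delta[active]    # touch only active columns
    x_ref = np.where(active, x_t, x_ref)           # remember what was propagated
    return y_t, x_ref
```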
arXiv Detail & Related papers (2023-12-14T23:07:37Z)
- Eva: A General Vectorized Approximation Framework for Second-order Optimization [16.647611352181574]
We present a memory- and time-efficient second-order algorithm named Eva with two novel techniques.
We derive an efficient update formula that avoids explicitly computing a matrix inverse by using the Sherman-Morrison formula.
Experiments show that Eva reduces end-to-end training time by up to 2.05x and 2.42x compared to first-order SGD and second-order algorithms, respectively.
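For reference, the Sherman-Morrison identity that enables such inverse-free rank-one updates, stated generically (Eva's exact update rule is given in the paper):

```latex
(A + u v^\top)^{-1} = A^{-1} - \frac{A^{-1} u \, v^\top A^{-1}}{1 + v^\top A^{-1} u},
\qquad \text{provided } 1 + v^\top A^{-1} u \neq 0 .
```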
arXiv Detail & Related papers (2023-08-04T03:51:38Z)
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
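For context, a plain (unlearned) CountSketch places exactly one signed non-zero per input coordinate at a random position; the learned variants above instead optimize where those non-zeros go. A minimal sketch, with illustrative names and shapes:

```python
import numpy as np

def countsketch(X, k, rng=None):
    """Apply a CountSketch to the rows of X (n x d), producing an (n x k) sketch.

    Each of the d coordinates is hashed to one of k buckets with a random
    sign, i.e. the sketch matrix has exactly one non-zero per column.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    buckets = rng.integers(0, k, size=d)      # hashed position of each coordinate
    signs = rng.choice([-1.0, 1.0], size=d)   # random sign per coordinate
    SX = np.zeros((n, k))
    for j in range(d):
        SX[:, buckets[j]] += signs[j] * X[:, j]
    return SX
```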
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
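As a generic illustration of trading an exact kernel for random features, the sketch below uses standard random Fourier features for an RBF kernel; it is only a stand-in for intuition, since RFAD relies on an NNGP-specific random-feature construction rather than this one.

```python
import numpy as np

def rbf_random_features(X, num_features, gamma=1.0, rng=None):
    """Random Fourier features z(X) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2).

    Replaces an n x n kernel matrix with an n x num_features feature map,
    which is the kind of saving random-feature approximations provide.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)
```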
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with the communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Scalable Optimal Transport in High Dimensions for Graph Distances, Embedding Alignment, and More [7.484063729015126]
We propose two effective log-linear time approximations of the cost matrix for optimal transport.
These approximations enable general log-linear time algorithms for entropy-regularized OT that perform well even for complex, high-dimensional spaces.
For graph distance regression we propose the graph transport network (GTN), which combines graph neural networks (GNNs) with enhanced Sinkhorn.
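For background, a dense reference implementation of the entropy-regularized Sinkhorn iteration that such cost-matrix approximations accelerate (names and defaults are illustrative; the paper's contribution is making the products with exp(-C/eps) run in log-linear time):

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.05, n_iters=200):
    """Entropy-regularized optimal transport between histograms a and b.

    Alternately rescales K = exp(-C / eps) so the transport plan
    diag(u) @ K @ diag(v) matches row marginals a and column marginals b.
    """
    K = np.exp(-C / eps)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]  # approximate transport plan
```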
arXiv Detail & Related papers (2021-07-14T17:40:08Z)
- Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach [4.710916891482697]
The pseudoinverse, a generalization of the matrix inverse, has been extensively utilized in machine learning.
FastPI is a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices.
We show that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy.
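For reference, the SVD characterization of the pseudoinverse that FastPI computes incrementally (this is the direct, non-incremental definition; the matrix reordering and incremental update are the paper's contribution):

```latex
A = U \Sigma V^\top \;\Longrightarrow\; A^{+} = V \Sigma^{+} U^\top,
\qquad \Sigma^{+} = \operatorname{diag}\bigl(\sigma_1^{-1}, \dots, \sigma_r^{-1}, 0, \dots, 0\bigr),
```

where $\sigma_1, \dots, \sigma_r$ are the non-zero singular values of $A$.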
arXiv Detail & Related papers (2020-11-09T07:47:10Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a greater reduction in computation load at the same accuracy.
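To illustrate the low-rank idea on a single layer, here is a generic post-hoc truncation sketch (not the paper's SVD-training procedure, which keeps the factors low-rank throughout training; names are illustrative):

```python
import numpy as np

def truncate_layer(W, rank):
    """Replace a weight matrix W (m x n) with its best rank-`rank` approximation.

    The layer becomes two smaller factors (m x r) and (r x n), cutting the
    per-input cost from m*n to r*(m + n) multiply-adds.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    left = U[:, :rank] * s[:rank]   # (m, r), singular values folded in
    right = Vt[:rank, :]            # (r, n)
    return left, right

# y = W @ x is then approximated by left @ (right @ x).
```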
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.