Sparse Graph Learning with Eigen-gap for Spectral Filter Training in
Graph Convolutional Networks
- URL: http://arxiv.org/abs/2202.13526v1
- Date: Mon, 28 Feb 2022 03:39:48 GMT
- Title: Sparse Graph Learning with Eigen-gap for Spectral Filter Training in
Graph Convolutional Networks
- Authors: Jin Zeng, Saghar Bagheri, Yang Liu, Gene Cheung, Wei Hu
- Abstract summary: We show that a sparse graph Laplacian matrix $L$ closest to $\bar{C}^{-1}$ can be used to promote a deeper graph convolutional neural net (GCN) architecture.
Experiments show that our proposal produced deeper GCNs and smaller errors compared to a competing scheme without explicit eigen-gap optimization.
- Score: 38.92746674849527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is now known that the expressive power of graph convolutional neural nets
(GCN) does not grow infinitely with the number of layers. Instead, the GCN
output approaches a subspace spanned by the first eigenvector of the normalized
graph Laplacian matrix with the convergence rate characterized by the
"eigen-gap": the difference between the Laplacian's first two distinct
eigenvalues. To promote a deeper GCN architecture with sufficient
expressiveness, in this paper, given an empirical covariance matrix $\bar{C}$
computed from observable data, we learn a sparse graph Laplacian matrix $L$
closest to $\bar{C}^{-1}$ while maintaining a desirable eigen-gap that slows
down convergence. Specifically, we first define a sparse graph learning problem
with constraints on the first eigenvector (the most common signal) and the
eigen-gap. We solve the corresponding dual problem greedily, where a locally
optimal eigen-pair is computed one at a time via a fast approximation of a
semi-definite programming (SDP) formulation. The computed $L$ with the desired
eigen-gap is normalized spectrally and used for supervised training of GCN for
a targeted task. Experiments show that our proposal produced deeper GCNs and
smaller errors compared to a competing scheme without explicit eigen-gap
optimization.
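
Read purely from the abstract, the learning step can be sketched as the following optimization (an illustrative rendering of the stated ingredients only; $\rho$, $v_1$, and $g$ are assumed symbols for a sparsity weight, the prescribed first eigenvector, and the eigen-gap target, and the paper's actual objective, constraint form, and dual derivation are not reproduced here):

$$\min_{L \in \mathcal{L}} \ \|L - \bar{C}^{-1}\|_F^2 + \rho \,\|L\|_1 \quad \text{s.t.} \quad L v_1 = \lambda_1 v_1, \quad \lambda_2(L) - \lambda_1(L) = g,$$

where $\mathcal{L}$ is the set of valid graph Laplacian matrices and $\lambda_1 \le \lambda_2$ denote the first two distinct eigenvalues of $L$.

The snippet below is a minimal, hedged sketch (not the paper's algorithm): it computes the eigen-gap of a symmetrically normalized Laplacian and shows a graph signal collapsing toward the first eigenvector's subspace under repeated low-pass filtering. NumPy is assumed; the 5-node cycle graph, the helper names, and the simple linear filter $(I - L_n)$ are toy choices for illustration.

```python
# Minimal illustrative sketch (not the paper's learning algorithm).
import numpy as np

def normalized_laplacian(L: np.ndarray) -> np.ndarray:
    """Symmetric normalization L_n = D^{-1/2} L D^{-1/2} of a combinatorial Laplacian."""
    d = np.diag(L).copy()
    d[d == 0] = 1.0                      # guard against isolated nodes
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ L @ D_inv_sqrt

def eigen_gap(L_n: np.ndarray, tol: float = 1e-8) -> float:
    """Difference between the first two *distinct* eigenvalues, as in the abstract."""
    evals = np.linalg.eigvalsh(L_n)      # ascending order
    distinct = [evals[0]]
    for lam in evals[1:]:
        if lam - distinct[-1] > tol:
            distinct.append(lam)
    return distinct[1] - distinct[0] if len(distinct) > 1 else 0.0

# Toy graph: a 5-node cycle. Combinatorial Laplacian L = D - W.
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

L_n = normalized_laplacian(L)
print("eigen-gap of normalized Laplacian:", eigen_gap(L_n))

# Repeatedly applying the filter (I - L_n) preserves the component along the
# first eigenvector and shrinks the rest; in particular the component at the
# second distinct eigenvalue decays by a factor |1 - eigen-gap| per step,
# which is the link between the eigen-gap and over-smoothing that the
# abstract refers to.
x = np.random.default_rng(0).standard_normal(n)
for _ in range(50):
    x = (np.eye(n) - L_n) @ x
print("signal after 50 propagation steps:", x)
```

In this toy setting the component tied to the eigen-gap vanishes faster when the gap is larger; the paper learns $L$ so that the gap takes a value that slows this collapse, which is what allows a deeper GCN to remain expressive.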
Related papers
- Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural Networks [3.680127959836384]
First-order methods, such as gradient descent (GD) and stochastic gradient descent (SGD), have been proven effective in training neural networks.
However, the learning rate of GD for training two-layer neural networks exhibits poor dependence on the sample size and the Gram matrix.
In this paper, we show that for $L^2$ regression problems the learning rate can be improved, which implies that GD actually enjoys a faster convergence rate.
arXiv Detail & Related papers (2024-08-01T14:06:34Z) - Scalable Graph Compressed Convolutions [68.85227170390864]
We propose a differentiable method that applies permutations to calibrate input graphs for Euclidean convolution.
Based on the graph calibration, we propose the Compressed Convolution Network (CoCN) for hierarchical graph representation learning.
arXiv Detail & Related papers (2024-07-26T03:14:13Z) - Implicit Bias and Fast Convergence Rates for Self-attention [30.08303212679308]
Self-attention, the core mechanism of transformers, distinguishes them from traditional neural networks and drives their outstanding performance.
We investigate the implicit bias of gradient descent (GD) in training a self-attention layer with a fixed linear decoder in binary classification.
We provide the first finite-time convergence rate for $W_t$ to $W_{mm}$, along with the rate of sparsification in the attention map.
arXiv Detail & Related papers (2024-02-08T15:15:09Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, notably with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering [12.544602297450533]
We aim to identify clusters of nodes where most edges fall within clusters and only a few edges fall between clusters.
A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix.
This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and, in turn, spectral clustering.
arXiv Detail & Related papers (2022-07-29T10:13:07Z) - SpecNet2: Orthogonalization-free spectral embedding by neural networks [11.688030627514532]
This paper introduces a new neural network approach, named SpecNet2, to compute spectral embedding.
SpecNet2 also allows separating the sampling of rows and columns of the graph affinity matrix.
Numerical experiments demonstrate the improved performance and computational efficiency of SpecNet2 on simulated data and image datasets.
arXiv Detail & Related papers (2022-06-14T07:19:10Z) - Hybrid Model-based / Data-driven Graph Transform for Image Coding [54.31406300524195]
We present a hybrid model-based / data-driven approach to encode an intra-prediction residual block.
The first $K$ eigenvectors of a transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST) for stability.
Using WebP as a baseline image codec, experimental results show that our hybrid graph transform achieved better energy compaction than the default discrete cosine transform (DCT) and better stability than the Karhunen-Loève transform (KLT).
arXiv Detail & Related papers (2022-03-02T15:36:44Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks while using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)