Learning Stochastic Graph Neural Networks with Constrained Variance
- URL: http://arxiv.org/abs/2201.12611v1
- Date: Sat, 29 Jan 2022 15:55:58 GMT
- Title: Learning Stochastic Graph Neural Networks with Constrained Variance
- Authors: Zhan Gao and Elvin Isufi
- Abstract summary: Stochastic graph neural networks (SGNNs) are information processing architectures that learn representations from data over random graphs.
We propose a variance-constrained optimization problem for SGNNs, balancing the expected performance and the stochastic deviation.
An alternating primal-dual learning procedure solves the problem by updating the SGNN parameters with gradient descent and the dual variable with gradient ascent.
- Score: 18.32587282139282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic graph neural networks (SGNNs) are information processing
architectures that learn representations from data over random graphs. SGNNs
are trained with respect to the expected performance, which comes with no
guarantee about deviations of particular output realizations around the optimal
expectation. To overcome this issue, we propose a variance-constrained
optimization problem for SGNNs, balancing the expected performance and the
stochastic deviation. An alternating primal-dual learning procedure is
undertaken that solves the problem by updating the SGNN parameters with
gradient descent and the dual variable with gradient ascent. To characterize
the explicit effect of the variance-constrained learning, we conduct a
theoretical analysis on the variance of the SGNN output and identify a
trade-off between the stochastic robustness and the discrimination power. We
further analyze the duality gap of the variance-constrained optimization
problem and the converging behavior of the primal-dual learning procedure. The
former indicates the optimality loss induced by the dual transformation and the
latter characterizes the limiting error of the iterative algorithm, both of
which guarantee the performance of the variance-constrained learning. Through
numerical simulations, we corroborate our theoretical findings and observe a
strong expected performance with a controllable standard deviation.
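
To make the alternating primal-dual procedure concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of variance-constrained training for a stochastic GNN. It assumes random graph realizations obtained by independently dropping edges of a nominal graph, a single-layer graph filter as the SGNN, a mean-squared-error objective, and a constraint keeping the variance of the per-realization loss below a budget; names such as SimpleSGNN, sample_graph, var_budget, and dual_step are illustrative.

```python
# Minimal sketch (assumed setup, not the authors' code) of variance-constrained
# primal-dual training for a stochastic graph neural network.
import torch

torch.manual_seed(0)

# ---- Synthetic problem: nominal graph, node features, regression targets ----
n_nodes, n_feat, n_out = 20, 8, 1
S = (torch.rand(n_nodes, n_nodes) < 0.3).float()   # nominal adjacency (graph shift)
S = torch.triu(S, 1); S = S + S.t()                # symmetric, no self-loops
X = torch.randn(n_nodes, n_feat)                   # node features
y = torch.randn(n_nodes, n_out)                    # regression targets

class SimpleSGNN(torch.nn.Module):
    """One-layer graph filter: y_hat = tanh(S_hat @ X @ H)."""
    def __init__(self, f_in, f_out):
        super().__init__()
        self.H = torch.nn.Parameter(0.1 * torch.randn(f_in, f_out))

    def forward(self, S_hat, X):
        return torch.tanh(S_hat @ X @ self.H)

def sample_graph(S, p_keep=0.8):
    """Random realization: keep each nominal edge independently with prob p_keep."""
    mask = (torch.rand_like(S) < p_keep).float()
    mask = torch.triu(mask, 1)
    return S * (mask + mask.t())

model = SimpleSGNN(n_feat, n_out)
primal_opt = torch.optim.SGD(model.parameters(), lr=1e-2)

mu = torch.tensor(0.0)   # dual variable (Lagrange multiplier) for the variance constraint
var_budget = 0.05        # allowed variance of the per-realization loss (constraint bound)
dual_step = 1e-2         # step size for dual gradient ascent
n_samples = 16           # graph realizations per iteration (Monte Carlo estimate)

for it in range(500):
    # Per-realization losses over a batch of random graph samples.
    losses = torch.stack([
        torch.mean((model(sample_graph(S), X) - y) ** 2) for _ in range(n_samples)
    ])
    mean_loss, var_loss = losses.mean(), losses.var(unbiased=True)

    # Primal step: gradient descent on the Lagrangian w.r.t. the SGNN parameters.
    lagrangian = mean_loss + mu * (var_loss - var_budget)
    primal_opt.zero_grad()
    lagrangian.backward()
    primal_opt.step()

    # Dual step: gradient ascent on the multiplier, projected onto mu >= 0.
    # d(Lagrangian)/d(mu) is simply the constraint violation Var[loss] - budget.
    with torch.no_grad():
        mu = torch.clamp(mu + dual_step * (var_loss - var_budget), min=0.0)

    if it % 100 == 0:
        print(f"it={it:4d}  E[loss]={mean_loss.item():.4f}  "
              f"Var[loss]={var_loss.item():.4f}  mu={mu.item():.3f}")
```

In this sketch the expectation and variance of the loss are estimated by Monte Carlo over a small batch of graph realizations at each iteration, and the multiplier is projected onto the nonnegative orthant after each ascent step, mirroring the descent/ascent alternation described in the abstract.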
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - On the Convergence Analysis of Over-Parameterized Variational Autoencoders: A Neural Tangent Kernel Perspective [7.580900499231056]
Variational Auto-Encoders (VAEs) have emerged as powerful probabilistic models for generative tasks.
This paper provides a mathematical proof of VAE convergence under mild assumptions.
We also establish a novel connection between the optimization problem faced by over-parameterized SNNs and the Kernel Ridge Regression (KRR) problem.
arXiv Detail & Related papers (2024-09-09T06:10:31Z) - Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs and improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, and for both we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Analysis of Catastrophic Forgetting for Random Orthogonal Transformation
Tasks in the Overparameterized Regime [9.184987303791292]
We show that in permuted MNIST image classification tasks, the performance of multilayer perceptrons trained by vanilla gradient descent can be improved in the overparameterized regime.
We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem.
We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small.
arXiv Detail & Related papers (2022-06-01T18:04:33Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Accurate and Reliable Forecasting using Stochastic Differential
Equations [48.21369419647511]
It is critical yet challenging for deep learning models to properly characterize uncertainty that is pervasive in real-world environments.
This paper develops SDE-HNN to characterize the interaction between the predictive mean and variance of HNNs for accurate and reliable regression.
Experiments on the challenging datasets show that our method significantly outperforms the state-of-the-art baselines in terms of both predictive performance and uncertainty quantification.
arXiv Detail & Related papers (2021-03-28T04:18:11Z) - A Lagrangian Dual-based Theory-guided Deep Neural Network [0.0]
The Lagrangian dual-based TgNN (TgNN-LD) is proposed to improve the effectiveness of TgNN.
Experimental results demonstrate the superiority of the Lagrangian dual-based TgNN.
arXiv Detail & Related papers (2020-08-24T02:06:19Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of stochasticity in its success is still unclear.
We show that multiplicative noise, as it commonly arises due to variance in local rates of convergence, results in heavy-tailed behavior in the parameters.
A detailed analysis is conducted in which we describe key factors, including step size and data, and observe similar heavy-tailed behavior in state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.