Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on
Gradient-Free ADMM framework
- URL: http://arxiv.org/abs/2105.09837v1
- Date: Thu, 20 May 2021 15:37:42 GMT
- Title: Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on
Gradient-Free ADMM framework
- Authors: Junxiang Wang, Hongyi Li, Zheng Chai, Yongchao Wang, Yue Cheng and
Liang Zhao
- Abstract summary: The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attractive alternative to Graph Neural Networks (GNNs).
This is because it is resistant to the over-smoothing problem, and deeper GA-MLP models can yield better performance.
In this paper, we propose a parallel deep learning Alternating Direction Method of Multipliers (pdADMM) framework to achieve model parallelism.
- Score: 22.5155416051303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attractive
alternative to Graph Neural Networks (GNNs). This is because it is resistant to
the over-smoothing problem, and deeper GA-MLP models yield better performance.
GA-MLP models are traditionally optimized by the Stochastic Gradient Descent
(SGD). However, SGD suffers from the layer dependency problem, which prevents
the gradients of different layers of GA-MLP models from being calculated in
parallel. In this paper, we propose a parallel deep learning Alternating
Direction Method of Multipliers (pdADMM) framework to achieve model
parallelism: parameters in each layer of GA-MLP models can be updated in
parallel. The extended pdADMM-Q algorithm reduces communication cost by
utilizing the quantization technique. Theoretical convergence to a critical
point of the pdADMM algorithm and the pdADMM-Q algorithm is provided with a
sublinear convergence rate $o(1/k)$. Extensive experiments in six benchmark
datasets demonstrate that the pdADMM can lead to high speedup, and outperforms
all the existing state-of-the-art comparison methods.
Related papers
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- From graphs to DAGs: a low-complexity model and a scalable algorithm [0.0]
This paper presents a low-complexity model, called LoRAM for Low-Rank Additive Model, which combines low-rank matrix factorization with a sparsification mechanism for the continuous optimization of DAGs.
The proposed approach achieves a reduction from a cubic complexity to quadratic complexity while handling the same DAG characteristic function as NoTears.
arXiv Detail & Related papers (2022-04-10T10:22:56Z)
- A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference.
DCGMMs can be trained end-to-end by SGD from random initial conditions, much like CNNs.
We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
arXiv Detail & Related papers (2022-03-21T14:53:57Z)
- Mixed Policy Gradient: off-policy reinforcement learning driven jointly by data and model [32.61834127169759]
Reinforcement learning (RL) shows great potential in sequential decision-making.
Mainstream RL algorithms are data-driven and usually yield better performance but much slower convergence than model-driven methods.
This paper proposes the mixed policy gradient (MPG) algorithm, which fuses empirical data and the transition model in policy gradient (PG) to accelerate convergence without performance degradation.
arXiv Detail & Related papers (2021-02-23T06:05:17Z)
- Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPN).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial, the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z)
- An EM Approach to Non-autoregressive Conditional Sequence Generation [49.11858479436565]
Autoregressive (AR) models have been the dominating approach to conditional sequence generation.
Non-autoregressive (NAR) models have been recently proposed to reduce the latency by generating all output tokens in parallel.
This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization framework.
arXiv Detail & Related papers (2020-06-29T20:58:57Z)
- Multi-Fidelity High-Order Gaussian Processes for Physical Simulation [24.033468062984458]
High-fidelity partial differential equations (PDEs) are more expensive than low-fidelity ones.
We propose Multi-Fidelity High-Order Gaussian Process (MFHoGP) that can capture complex correlations.
MFHoGP propagates bases throughout fidelities to fuse information, and places a deep matrix GP prior over the basis weights.
arXiv Detail & Related papers (2020-06-08T22:31:59Z)
- Dual Stochastic Natural Gradient Descent and convergence of interior half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
arXiv Detail & Related papers (2020-01-19T00:53:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.