Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on
Gradient-Free ADMM framework
- URL: http://arxiv.org/abs/2105.09837v1
- Date: Thu, 20 May 2021 15:37:42 GMT
- Title: Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on
Gradient-Free ADMM framework
- Authors: Junxiang Wang, Hongyi Li, Zheng Chai, Yongchao Wang, Yue Cheng and
Liang Zhao
- Abstract summary: The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attractive alternative to Graph Neural Networks (GNNs).
This is because it is resistant to the over-smoothing problem, and deeper GA-MLP models can yield better performance.
In this paper, we propose a parallel deep learning Alternating Direction Method of Multipliers (pdADMM) framework to achieve model parallelism.
- Score: 22.5155416051303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attractive
alternative to Graph Neural Networks (GNNs). This is because it is resistant to
the over-smoothing problem, and deeper GA-MLP models yield better performance.
GA-MLP models are traditionally optimized by Stochastic Gradient Descent
(SGD). However, SGD suffers from the layer dependency problem, which prevents
the gradients of different layers of GA-MLP models from being calculated in
parallel. In this paper, we propose a parallel deep learning Alternating
Direction Method of Multipliers (pdADMM) framework to achieve model
parallelism: parameters in each layer of GA-MLP models can be updated in
parallel. The extended pdADMM-Q algorithm reduces communication cost by
utilizing the quantization technique. Theoretical convergence to a critical
point of the pdADMM algorithm and the pdADMM-Q algorithm is provided with a
sublinear convergence rate $o(1/k)$. Extensive experiments on six benchmark
datasets demonstrate that pdADMM leads to high speedup and outperforms all
existing state-of-the-art comparison methods.
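To make the layer-decoupling idea concrete, here is a minimal sketch of ADMM-style splitting in which every layer solves its own subproblem using only last-iteration variables, so all layers can be updated at once. This is not the authors' pdADMM implementation: the toy linear layers, the closed-form least-squares subproblem, and the name update_layer are illustrative assumptions.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def update_layer(args):
        """Solve one layer's decoupled subproblem min_W ||q - W p||^2,
        using only last-iteration copies of that layer's input p and output q."""
        p, q = args
        return q @ p.T @ np.linalg.pinv(p @ p.T)   # closed-form least squares

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        L, d, n = 4, 8, 32                                     # layers, width, samples (toy sizes)
        p_prev = [rng.normal(size=(d, n)) for _ in range(L)]   # layer inputs  p_l
        q_prev = [rng.normal(size=(d, n)) for _ in range(L)]   # layer outputs q_l
        # Each subproblem depends only on the previous ADMM iterate, so all L
        # layer updates can run simultaneously -- the essence of model parallelism.
        with ProcessPoolExecutor() as pool:
            W_new = list(pool.map(update_layer, zip(p_prev, q_prev)))
        print([W.shape for W in W_new])                        # L independent weight updates

In the paper's formulation each subproblem also involves dual variables and quadratic penalty terms; the sketch only illustrates why the updates decouple across layers.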
Related papers
- Preconditioned Inexact Stochastic ADMM for Deep Model [35.37705488695026]
This paper develops an algorithm, PISA, which enables scalable parallel computing and supports various second-moment schemes.
Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient.
Comprehensive experimental evaluations for fine-tuning diverse foundation models (FMs), including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate its superior numerical performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2025-02-15T12:28:51Z) - Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "linearity theorem" establishing a direct relationship between the layer-wise $\ell_2$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
arXiv Detail & Related papers (2024-11-26T15:35:44Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - From graphs to DAGs: a low-complexity model and a scalable algorithm [0.0]
This paper presents a low-complexity model, called LoRAM for Low-Rank Additive Model, which combines low-rank matrix factorization with a sparsification mechanism for the continuous optimization of DAGs.
The proposed approach reduces the complexity from cubic to quadratic while handling the same DAG characteristic function as NoTears.
arXiv Detail & Related papers (2022-04-10T10:22:56Z) - A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference.
Deep Convolutional Gaussian Mixture Models (DCGMMs) can be trained end-to-end by SGD from random initial conditions, much like CNNs.
We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
arXiv Detail & Related papers (2022-03-21T14:53:57Z) - Mixed Policy Gradient: off-policy reinforcement learning driven jointly
by data and model [32.61834127169759]
Reinforcement learning (RL) shows great potential in sequential decision-making.
Mainstream RL algorithms are data-driven, which usually yield better performance but much slower convergence compared with model-driven methods.
This paper proposes the mixed policy gradient (MPG) algorithm, which fuses empirical data and the transition model in the policy gradient (PG) to accelerate convergence without sacrificing performance.
arXiv Detail & Related papers (2021-02-23T06:05:17Z) - Multi-Fidelity High-Order Gaussian Processes for Physical Simulation [24.033468062984458]
High-fidelity simulations of partial differential equations (PDEs) are more expensive to run than low-fidelity ones.
We propose Multi-Fidelity High-Order Gaussian Process (MFHoGP) that can capture complex correlations.
MFHoGP propagates bases throughout fidelities to fuse information, and places a deep matrix GP prior over the basis weights.
arXiv Detail & Related papers (2020-06-08T22:31:59Z) - Dual Stochastic Natural Gradient Descent and convergence of interior
half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
arXiv Detail & Related papers (2020-01-19T00:53:49Z) - Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning [66.18202188565922]
We propose a communication-efficient decentralized machine learning (ML) algorithm, coined Quantized Group ADMM (Q-GADMM).
We develop a novel quantization method to adaptively adjust model quantization levels and their probabilities, while proving the convergence of Q-GADMM for convex functions (an illustrative sketch of this style of quantization appears after this list).
arXiv Detail & Related papers (2019-10-23T10:47:06Z)
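Both pdADMM-Q above and Q-GADMM cut communication by exchanging quantized parameters instead of full-precision ones. Below is a minimal sketch of stochastic (unbiased) uniform quantization of a parameter vector; the 4-bit width, the min/max scaling, and the name stochastic_quantize are illustrative assumptions, not either paper's exact scheme.

    import numpy as np

    def stochastic_quantize(x, bits=4, rng=None):
        """Map x onto a uniform grid with 2**bits levels; randomized rounding
        makes the quantized vector an unbiased estimate of x."""
        rng = np.random.default_rng() if rng is None else rng
        levels = 2 ** bits - 1
        lo, hi = float(x.min()), float(x.max())
        scale = (hi - lo) / levels if hi > lo else 1.0
        t = (x - lo) / scale                            # grid position in [0, levels]
        base = np.floor(t)
        q = base + (rng.random(x.shape) < (t - base))   # round up w.p. the fractional part
        return lo + q * scale                           # dequantized values; in practice one sends q, lo, scale

    if __name__ == "__main__":
        w = np.random.default_rng(0).normal(size=1000)
        w_q = stochastic_quantize(w, bits=4)
        print("mean |w - w_q|:", np.abs(w - w_q).mean())

In a decentralized or model-parallel run, each worker would transmit the integer codes together with (lo, scale) rather than the full-precision vector, which is where the communication savings come from.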