Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on
Gradient-Free ADMM framework
- URL: http://arxiv.org/abs/2105.09837v1
- Date: Thu, 20 May 2021 15:37:42 GMT
- Title: Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on
Gradient-Free ADMM framework
- Authors: Junxiang Wang, Hongyi Li, Zheng Chai, Yongchao Wang, Yue Cheng and
Liang Zhao
- Abstract summary: The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attractive alternative to Graph Neural Networks (GNNs).
This is because it is resistant to the over-smoothing problem, and deeper GA-MLP models can yield better performance.
In this paper, we propose a parallel deep learning Alternating Direction Method of Multipliers (pdADMM) framework to achieve model parallelism.
- Score: 22.5155416051303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attractive
alternative to Graph Neural Networks (GNNs). This is because it is resistant to
the over-smoothing problem, and deeper GA-MLP models yield better performance.
GA-MLP models are traditionally optimized by Stochastic Gradient Descent
(SGD). However, SGD suffers from the layer dependency problem, which prevents
the gradients of different layers of GA-MLP models from being calculated in
parallel. In this paper, we propose a parallel deep learning Alternating
Direction Method of Multipliers (pdADMM) framework to achieve model
parallelism: parameters in each layer of GA-MLP models can be updated in
parallel. The extended pdADMM-Q algorithm reduces communication cost by
utilizing the quantization technique. Theoretical convergence to a critical
point of the pdADMM algorithm and the pdADMM-Q algorithm is provided with a
sublinear convergence rate $o(1/k)$. Extensive experiments on six benchmark
datasets demonstrate that pdADMM leads to high speedup and outperforms all
existing state-of-the-art comparison methods.
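To make the layer-decoupling idea concrete, here is a minimal sketch of ADMM-style splitting in which every layer solves its own subproblem using only last-iteration variables, so all layers can be updated at once. This is not the authors' pdADMM implementation: the toy linear layers, the closed-form least-squares subproblem, and the name update_layer are illustrative assumptions.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def update_layer(args):
        """Solve one layer's decoupled subproblem min_W ||q - W p||^2,
        using only last-iteration copies of that layer's input p and output q."""
        p, q = args
        return q @ p.T @ np.linalg.pinv(p @ p.T)   # closed-form least squares

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        L, d, n = 4, 8, 32                                     # layers, width, samples (toy sizes)
        p_prev = [rng.normal(size=(d, n)) for _ in range(L)]   # layer inputs  p_l
        q_prev = [rng.normal(size=(d, n)) for _ in range(L)]   # layer outputs q_l
        # Each subproblem depends only on the previous ADMM iterate, so all L
        # layer updates can run simultaneously -- the essence of model parallelism.
        with ProcessPoolExecutor() as pool:
            W_new = list(pool.map(update_layer, zip(p_prev, q_prev)))
        print([W.shape for W in W_new])                        # L independent weight updates

In the paper's formulation each subproblem also involves dual variables and quadratic penalty terms; the sketch only illustrates why the updates decouple across layers.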
Related papers
- Preconditioned Inexact Stochastic ADMM for Deep Model [35.37705488695026]
This paper develops an algorithm, PISA, which enables scalable parallel computing and supports various second-moment schemes.
Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient.
Comprehensive experimental evaluations for fine-tuning diverse foundation models (FMs), including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate its superior numerical performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2025-02-15T12:28:51Z) - Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "linearity theorem" establishing a direct relationship between the layer-wise $\ell_2$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
arXiv Detail & Related papers (2024-11-26T15:35:44Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Model-Based Reparameterization Policy Gradient Methods: Theory and
Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - From graphs to DAGs: a low-complexity model and a scalable algorithm [0.0]
This paper presents a low-complexity model, called LoRAM for Low-Rank Additive Model, which combines low-rank matrix factorization with a sparsification mechanism for the continuous optimization of DAGs.
The proposed approach reduces the complexity from cubic to quadratic while handling the same DAG characteristic function as NoTears.
arXiv Detail & Related papers (2022-04-10T10:22:56Z) - A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference.
Deep Convolutional Gaussian Mixture Models (DCGMMs) can be trained end-to-end by SGD from random initial conditions, much like CNNs.
We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
arXiv Detail & Related papers (2022-03-21T14:53:57Z) - Mixed Policy Gradient: off-policy reinforcement learning driven jointly
by data and model [32.61834127169759]
Reinforcement learning (RL) shows great potential in sequential decision-making.
Mainstream RL algorithms are data-driven, which usually yield better performance but much slower convergence compared with model-driven methods.
This paper proposes the mixed policy gradient (MPG) algorithm, which fuses empirical data and the transition model in the policy gradient (PG) to accelerate convergence without sacrificing performance.
arXiv Detail & Related papers (2021-02-23T06:05:17Z) - Multi-Fidelity High-Order Gaussian Processes for Physical Simulation [24.033468062984458]
High-fidelity simulations of partial differential equations (PDEs) are more expensive to run than low-fidelity ones.
We propose Multi-Fidelity High-Order Gaussian Process (MFHoGP) that can capture complex correlations.
MFHoGP propagates bases throughout fidelities to fuse information, and places a deep matrix GP prior over the basis weights.
arXiv Detail & Related papers (2020-06-08T22:31:59Z) - Dual Stochastic Natural Gradient Descent and convergence of interior
half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
arXiv Detail & Related papers (2020-01-19T00:53:49Z) - Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning [66.18202188565922]
We propose a communication-efficient decentralized machine learning (ML) algorithm, coined Quantized Group ADMM (Q-GADMM).
We develop a novel quantization method to adaptively adjust model quantization levels and their probabilities, while proving the convergence of Q-GADMM for convex functions (an illustrative sketch of this style of quantization appears after this list).
arXiv Detail & Related papers (2019-10-23T10:47:06Z)
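Both pdADMM-Q above and Q-GADMM cut communication by exchanging quantized parameters instead of full-precision ones. Below is a minimal sketch of stochastic (unbiased) uniform quantization of a parameter vector; the 4-bit width, the min/max scaling, and the name stochastic_quantize are illustrative assumptions, not either paper's exact scheme.

    import numpy as np

    def stochastic_quantize(x, bits=4, rng=None):
        """Map x onto a uniform grid with 2**bits levels; randomized rounding
        makes the quantized vector an unbiased estimate of x."""
        rng = np.random.default_rng() if rng is None else rng
        levels = 2 ** bits - 1
        lo, hi = float(x.min()), float(x.max())
        scale = (hi - lo) / levels if hi > lo else 1.0
        t = (x - lo) / scale                            # grid position in [0, levels]
        base = np.floor(t)
        q = base + (rng.random(x.shape) < (t - base))   # round up w.p. the fractional part
        return lo + q * scale                           # dequantized values; in practice one sends q, lo, scale

    if __name__ == "__main__":
        w = np.random.default_rng(0).normal(size=1000)
        w_q = stochastic_quantize(w, bits=4)
        print("mean |w - w_q|:", np.abs(w - w_q).mean())

In a decentralized or model-parallel run, each worker would transmit the integer codes together with (lo, scale) rather than the full-precision vector, which is where the communication savings come from.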