High-Dimensional Sparse Data Low-rank Representation via Accelerated Asynchronous Parallel Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2408.16592v1
- Date: Thu, 29 Aug 2024 14:55:33 GMT
- Title: High-Dimensional Sparse Data Low-rank Representation via Accelerated Asynchronous Parallel Stochastic Gradient Descent
- Authors: Qicong Hu, Hao Wu
- Abstract summary: Low-rank representation can map high-dimensional sparse (HDS) data to low-dimensional feature spaces.
Existing optimization algorithms for LR models are computationally inefficient and converge slowly on large-scale datasets.
A2PSGD outperforms existing optimization algorithms for HDS data LR in both accuracy and training time.
- Score: 2.2083091880368855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data characterized by high dimensionality and sparsity are commonly used to describe real-world node interactions. Low-rank representation (LR) can map high-dimensional sparse (HDS) data to low-dimensional feature spaces and infer node interactions via modeling data latent associations. Unfortunately, existing optimization algorithms for LR models are computationally inefficient and converge slowly on large-scale datasets. To address this issue, this paper proposes an Accelerated Asynchronous Parallel Stochastic Gradient Descent (A2PSGD) for High-Dimensional Sparse Data Low-rank Representation with three-fold ideas: a) establishing a lock-free scheduler to simultaneously respond to scheduling requests from multiple threads; b) introducing a greedy algorithm-based load balancing strategy for balancing the computational load among threads; c) incorporating Nesterov's accelerated gradient into the learning scheme to accelerate model convergence. Empirical studies show that A2PSGD outperforms existing optimization algorithms for HDS data LR in both accuracy and training time.
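To make the three ideas concrete, the sketch below shows one plausible reading of the scheme on a toy matrix-factorization LR model: a greedy load balancer that assigns row blocks to threads by nonzero count, Hogwild-style lock-free updates from multiple threads, and Nesterov's look-ahead inside each SGD step. This is a minimal illustration, not the authors' implementation; all names, block sizes, and hyperparameters are assumptions.

```python
# A minimal, hypothetical sketch of the three ideas behind A2PSGD on a toy
# sparse matrix-factorization LR model. Not the authors' implementation.
import heapq
import threading
import numpy as np
from scipy.sparse import random as sparse_random

RANK, LR, MOMENTUM, EPOCHS, THREADS, BLOCK = 16, 0.01, 0.9, 5, 4, 50

# Toy HDS data: a sparse interaction matrix split into row blocks.
X = sparse_random(1000, 800, density=0.01, format="csr", random_state=0)
P = 0.1 * np.random.randn(X.shape[0], RANK)   # row (node) factors
Q = 0.1 * np.random.randn(X.shape[1], RANK)   # column (node) factors
VP, VQ = np.zeros_like(P), np.zeros_like(Q)   # Nesterov velocities

blocks = [range(i, min(i + BLOCK, X.shape[0])) for i in range(0, X.shape[0], BLOCK)]
block_nnz = lambda b: sum(X[i].nnz for i in b)

def make_schedule():
    """Greedy load balancing: repeatedly assign the heaviest remaining block
    (most nonzeros) to the currently lightest thread. Each thread then owns a
    private queue, so no scheduling lock is needed during training."""
    heap = [(0, t, []) for t in range(THREADS)]
    heapq.heapify(heap)
    for b in sorted(blocks, key=block_nnz, reverse=True):
        load, t, q = heapq.heappop(heap)
        q.append(b)
        heapq.heappush(heap, (load + block_nnz(b), t, q))
    return [q for _, _, q in heap]

def worker(queue):
    """Hogwild-style lock-free SGD: threads write to the shared P, Q without
    locks; HDS sparsity keeps write collisions rare."""
    for _ in range(EPOCHS):
        for block in queue:
            for i in block:
                row = X[i]
                for j, x_ij in zip(row.indices, row.data):
                    # Nesterov look-ahead: take the gradient at the
                    # extrapolated point (theta + momentum * velocity).
                    p_la = P[i] + MOMENTUM * VP[i]
                    q_la = Q[j] + MOMENTUM * VQ[j]
                    err = x_ij - p_la @ q_la
                    VP[i] = MOMENTUM * VP[i] + LR * err * q_la
                    VQ[j] = MOMENTUM * VQ[j] + LR * err * p_la
                    P[i] += VP[i]
                    Q[j] += VQ[j]

threads = [threading.Thread(target=worker, args=(q,)) for q in make_schedule()]
for t in threads: t.start()
for t in threads: t.join()
```

Because each HDS update touches only a few rows and columns, concurrent lock-free writes rarely collide, which is what makes a lock-free scheduler viable in this setting.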
Related papers
- Parallel Diffusion Solver via Residual Dirichlet Policy Optimization [88.7827307535107]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature.
Existing solver-based acceleration methods often face significant image quality degradation under a low-dimensional budget.
We propose the Ensemble Parallel Direction solver (dubbed EPD-EPr), a novel ODE solver that mitigates these errors by incorporating multiple parallel gradient evaluations in each step.
arXiv Detail & Related papers (2025-12-28T05:48:55Z) - Time Extrapolation with Graph Convolutional Autoencoder and Tensor Train Decomposition [9.446359051690292]
We develop a time-consistent reduced-order model for parameterized partial differential equations on unstructured grids.
In particular, high-fidelity snapshots are represented as a combination of parametric, spatial, and temporal cores via TT decomposition.
We enhance the generalization performance by developing a multi-fidelity two-stage approach in the framework of Deep Operator Networks (DeepONet).
Numerical results, including heat-conduction, advection-diffusion and vortex-shedding phenomena, demonstrate strong performance in effectively learning the dynamics in the extrapolation regime for complex geometries.
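As a rough illustration of the TT idea in this entry, the sketch below factorizes a toy (parameter x space x time) snapshot tensor into a train of parametric, spatial, and temporal cores via sequential truncated SVDs (TT-SVD). Shapes, the tolerance, and the synthetic data are assumptions; the paper's DeepONet stages are not shown.

```python
# Rough, illustrative TT-SVD of a toy (parameter x space x time) snapshot
# tensor via sequential truncated SVDs. Not the paper's pipeline.
import numpy as np

rng = np.random.default_rng(0)
P, S, T, tol = 6, 50, 40, 1e-10
# synthetic rank-3 snapshots: a sum of separable (parameter, space, time) modes
U = [rng.standard_normal((n, 3)) for n in (P, S, T)]
X = np.einsum('pr,sr,tr->pst', *U)

def tt_svd(X, tol):
    """Return TT cores G1 (1 x P x r1), G2 (r1 x S x r2), G3 (r2 x T x 1)."""
    cores, r = [], 1
    M = X.reshape(1, -1)
    for dim in X.shape[:-1]:
        M = M.reshape(r * dim, -1)                 # unfold: (rank * mode, rest)
        u, s, vt = np.linalg.svd(M, full_matrices=False)
        rank = int(np.sum(s > tol * s[0]))         # truncate tiny singular values
        cores.append(u[:, :rank].reshape(r, dim, rank))
        M = s[:rank, None] * vt[:rank]             # carry the remainder forward
        r = rank
    cores.append(M.reshape(r, X.shape[-1], 1))
    return cores

G1, G2, G3 = tt_svd(X, tol)
Xhat = np.einsum('apb,bsc,ctd->pst', G1, G2, G3)   # contract the train back
print("TT ranks:", G1.shape[2], G2.shape[2],
      "rel. error:", np.linalg.norm(Xhat - X) / np.linalg.norm(X))
```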
arXiv Detail & Related papers (2025-11-28T09:59:17Z) - Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity [51.56484100374058]
We introduce Ringleader ASGD, the first asynchronous algorithm that attains the theoretical lower bounds for parallel computation.
Our analysis further establishes that Ringleader ASGD remains optimal under arbitrary gradient heterogeneity and even time-varying speeds.
arXiv Detail & Related papers (2025-09-26T19:19:15Z) - QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution [53.13952833016505]
We propose a low-bit quantization model for real-world video super-resolution (VSR).
We use a calibration dataset to measure both spatial and temporal complexity for each layer.
We refine the FP and low-bit branches to achieve simultaneous optimization.
arXiv Detail & Related papers (2025-08-06T14:35:59Z) - STRGCN: Capturing Asynchronous Spatio-Temporal Dependencies for Irregular Multivariate Time Series Forecasting [14.156419219696252]
STRGCN captures the complex interdependencies in IMTS by representing them as a fully connected graph.
Experiments on four public datasets demonstrate that STRGCN achieves state-of-the-art accuracy, competitive memory usage and training speed.
arXiv Detail & Related papers (2025-05-07T06:41:33Z) - Nesterov Method for Asynchronous Pipeline Parallel Optimization [59.79227116582264]
We introduce a variant of Nesterov Accelerated Gradient (NAG) for asynchronous optimization in Pipeline Parallelism.
Specifically, we modify the look-ahead step in NAG to effectively address the staleness in gradients.
We theoretically prove that our approach converges at a sublinear rate in the presence of a fixed delay in gradients.
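The exact modified look-ahead rule is the paper's contribution; the sketch below only illustrates the underlying idea on a toy quadratic, with a hypothetical delay-aware extrapolation standing in for the paper's update.

```python
# Toy illustration only: NAG driven by stale gradients on a small quadratic,
# with an assumed delay-aware look-ahead in place of the paper's rule.
import numpy as np

A = np.diag([1.0, 10.0])                 # ill-conditioned quadratic f(x) = x^T A x / 2
grad = lambda x: A @ x
lr, mu, tau, steps = 0.01, 0.9, 3, 400

x, v = np.array([5.0, 5.0]), np.zeros(2)
history = [x.copy()]
for _ in range(steps):
    # staleness: the available gradient was computed `tau` iterations ago
    stale = history[max(0, len(history) - 1 - tau)]
    # delay-aware look-ahead (assumed form): extrapolate the stale point by
    # (tau + 1) momentum steps so the gradient is queried nearer the present
    lookahead = stale + (tau + 1) * mu * v
    v = mu * v - lr * grad(lookahead)
    x = x + v
    history.append(x.copy())
print("final distance to optimum:", np.linalg.norm(x))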
arXiv Detail & Related papers (2025-05-02T08:23:29Z) - TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs [58.19080159470868]
We propose a novel low-rank ZO estimator, TeZO, which captures the low-rankness across both the model and temporal dimension.
Specifically, we represent ZO perturbations along the temporal dimension as a 3D tensor and employ Canonical Polyadic Decomposition (CPD) to extract each low-rank 2D matrix.
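A minimal sketch of a CP-structured zeroth-order perturbation in this spirit: factor matrices shared across steps, with only a small temporal core redrawn each step, so the per-step perturbation is low-rank. The least-squares objective and all shapes below are illustrative assumptions, not TeZO's actual estimator.

```python
# Sketch: CP-structured zeroth-order (ZO) perturbations. A, B are shared
# factors; only the temporal core c_t is redrawn per step. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
m, n, r, T, eps, lr = 64, 32, 4, 200, 1e-3, 0.1
X, Y = rng.standard_normal((256, m)), rng.standard_normal((256, n))
W = np.zeros((m, n))                                 # "weights" being tuned
A, B = rng.standard_normal((m, r)), rng.standard_normal((n, r))
A /= np.linalg.norm(A, axis=0)                       # normalize factors so the
B /= np.linalg.norm(B, axis=0)                       # ZO steps stay modest

loss = lambda W: 0.5 * np.mean((X @ W - Y) ** 2)

for t in range(T):
    c = rng.standard_normal(r)                       # per-step temporal core
    Z = (A * c) @ B.T                                # sum_r c[r] * outer(A[:,r], B[:,r])
    # two-point zeroth-order estimate of the directional derivative along Z
    g = (loss(W + eps * Z) - loss(W - eps * Z)) / (2 * eps)
    W -= lr * g * Z                                  # ZO-SGD step along Z
print("loss before/after:", loss(np.zeros((m, n))), loss(W))
```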
arXiv Detail & Related papers (2025-01-31T11:34:03Z) - Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates [1.9241821314180372]
One major shortcoming of backpropagation is the interlocking between the forward and backward phases of the algorithm.
We propose a method that parallelises SGD updates across the layers of a model by asynchronously updating them from multiple threads.
We show that this approach yields close to state-of-the-art results while running up to 2.97x faster than Hogwild! scaled on multiple devices.
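A toy sketch of the layer-wise asynchronous idea, assuming a two-layer linear model: each layer is trained by its own thread and simply reads whatever (possibly stale) copy of the other layer sits in shared memory, instead of interlocking the forward and backward phases. This is an illustration, not the paper's code.

```python
# Toy asynchronous layer-wise updates on a two-layer linear model.
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((128, 8))
y = X @ rng.standard_normal(8)                  # target realizable by the model
W1 = 0.1 * rng.standard_normal((8, 8))          # layer 1, shared and lock-free
W2 = 0.1 * rng.standard_normal(8)               # layer 2, shared and lock-free
LR, STEPS = 0.05, 3000

def train_layer(layer):
    global W1, W2
    for _ in range(STEPS):
        h = X @ W1                               # forward uses the *current* W1
        err = h @ W2 - y                         # residual of the full model
        if layer == 1:
            # layer-1 gradient uses the other thread's latest W2, unsynchronized
            W1 -= LR * (X.T @ np.outer(err, W2)) / len(y)
        else:
            W2 -= LR * (h.T @ err) / len(y)

threads = [threading.Thread(target=train_layer, args=(l,)) for l in (1, 2)]
for t in threads: t.start()
for t in threads: t.join()
print("final MSE:", np.mean((X @ W1 @ W2 - y) ** 2))
```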
arXiv Detail & Related papers (2024-10-08T12:32:36Z) - An Adaptive Latent Factorization of Tensors Model for Embedding Dynamic Communication Network [15.577058568902272]
The Dynamic Communication Network (DCN) describes the interactions over time among various communication nodes.
This paper proposes an Adaptive Temporal-dependent low-rank representation model (ATT).
The experimental results on four real-world DCNs demonstrate that the proposed ATT model significantly outperforms several state-of-the-art models in both prediction errors and convergence rounds.
arXiv Detail & Related papers (2024-08-29T14:40:32Z) - Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework [56.82432591933544]
Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning.
This paper analyzes the run time and staleness of distributed SGD based on stochastic delay differential equations (SDDEs) and the approximation of gradient arrivals.
Interestingly, it is shown that increasing the number of activated workers does not necessarily accelerate distributed SGD due to staleness.
arXiv Detail & Related papers (2024-06-17T02:56:55Z) - Robust Fully-Asynchronous Methods for Distributed Training over General Architecture [11.480605289411807]
Perfect synchronization in distributed machine learning problems is inefficient and even impossible due to the existence of latency, packet losses and stragglers.
We propose a Robust Fully-Asynchronous Gradient Tracking method (R-FAST), where each device performs local computation and communication at its own pace without any form of synchronization.
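For orientation, the sketch below shows the standard synchronous gradient-tracking recursion that methods like R-FAST build on; running such updates fully asynchronously under delays and packet losses is the paper's contribution and is not reproduced here. The mixing matrix and the toy least-squares problem are assumptions.

```python
# Synchronous gradient-tracking baseline on a toy distributed least-squares
# problem. Illustrative background for R-FAST, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
n_dev, d, steps, lr = 4, 5, 300, 0.05
A = [rng.standard_normal((20, d)) for _ in range(n_dev)]
b = [Ai @ np.ones(d) for Ai in A]                      # common ground truth: all-ones
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i]) / 20

W = np.full((n_dev, n_dev), 1 / n_dev)                 # doubly stochastic mixing matrix
x = np.zeros((n_dev, d))                               # one iterate per device
g = np.stack([grad(i, x[i]) for i in range(n_dev)])    # trackers start at local grads
for _ in range(steps):
    x_new = W @ x - lr * g                             # consensus step minus tracked grad
    # tracker update: mix neighbors' trackers, add the local gradient innovation
    g = W @ g + np.stack([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n_dev)])
    x = x_new
print("consensus error:", np.linalg.norm(x - x.mean(axis=0)),
      "| distance to optimum:", np.linalg.norm(x.mean(axis=0) - np.ones(d)))
```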
arXiv Detail & Related papers (2023-07-21T14:36:40Z) - Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based
Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through the tightly coupled multi-temporal representation.
We propose to decouple and recouple spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under mild delay assumptions.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Spatial-Spectral Clustering with Anchor Graph for Hyperspectral Image [88.60285937702304]
This paper proposes a novel unsupervised approach called spatial-spectral clustering with anchor graph (SSCAG) for HSI data clustering.
The proposed SSCAG is competitive against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-04-24T08:09:27Z) - Deep Cellular Recurrent Network for Efficient Analysis of Time-Series
Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while utilizing substantially fewer trainable parameters compared to similar methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z) - DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic
Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances.
The proposed method achieves promising results on both ScanNetV2 and S3DIS.
It also improves inference speed by more than 25% over the current state-of-the-art.
arXiv Detail & Related papers (2020-11-26T14:56:57Z) - HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring [18.8426865970643]
A novel Hierarchical Parallel SGD (HPSGD) strategy is proposed to boost the distributed training process of deep neural networks (DNNs).
Experiments demonstrate that the proposed HPSGD approach substantially boosts distributed DNN training, reduces the disturbance of stale gradients, and achieves better accuracy within a given fixed wall-time.
arXiv Detail & Related papers (2020-09-06T10:17:56Z) - DaSGD: Squeezing SGD Parallelization Performance in Distributed Training
Using Delayed Averaging [4.652668321425679]
The minibatch stochastic gradient descent (SGD) algorithm requires workers to halt forward/back propagations while gradients are synchronized.
DaSGD parallelizes SGD and forward/back propagations to hide 100% of the communication overhead.
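A small sketch of the delayed-averaging idea, assuming a one-round delay on a toy quadratic: each worker applies the previous round's averaged gradient, so the exchange of the current round's gradients can overlap with computation. Simulated sequentially; names and constants are illustrative.

```python
# Delayed averaging in the spirit of DaSGD: apply the *previous* round's
# averaged gradient while the current round's is notionally in flight.
import numpy as np

rng = np.random.default_rng(0)
d, workers, steps, lr = 10, 4, 200, 0.1
target = rng.standard_normal(d)
grad = lambda x: x - target                       # toy quadratic objective
x = [np.zeros(d) for _ in range(workers)]         # one model replica per worker
pending = np.zeros(d)                             # averaged gradient "in flight"

for _ in range(steps):
    # each worker computes a noisy local gradient for the current round
    local = [grad(x[w]) + 0.01 * rng.standard_normal(d) for w in range(workers)]
    for w in range(workers):
        x[w] -= lr * pending                      # apply LAST round's average (delay = 1)
    pending = np.mean(local, axis=0)              # this round's average arrives next step
print("distance to optimum:", np.linalg.norm(np.mean(x, axis=0) - target))
```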
arXiv Detail & Related papers (2020-05-31T05:43:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.