RCD-SGD: Resource-Constrained Distributed SGD in Heterogeneous
Environment via Submodular Partitioning
- URL: http://arxiv.org/abs/2211.00839v2
- Date: Mon, 18 Sep 2023 22:31:10 GMT
- Title: RCD-SGD: Resource-Constrained Distributed SGD in Heterogeneous
Environment via Submodular Partitioning
- Authors: Haoze He and Parijat Dube
- Abstract summary: We develop a framework for distributed training algorithms based on a novel data partitioning algorithm involving submodular optimization.
Based on this algorithm, we develop a distributed SGD framework that can accelerate existing SOTA distributed training algorithms by up to 32%.
- Score: 1.9145351898882879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The convergence of SGD-based distributed training algorithms is tied to the
data distribution across workers. Standard partitioning techniques try to
achieve equal-sized partitions with per-class populations proportional to
those of the total dataset. Partitions with the same overall population
size, or even the same number of samples per class, may still be non-IID
in the feature space. In heterogeneous computing environments, where
devices have different computing capabilities, equal-sized partitions
across devices can lead to the straggler problem in distributed SGD. We develop
a framework for distributed SGD in heterogeneous environments based on a novel
data partitioning algorithm involving submodular optimization. Our data
partitioning algorithm explicitly accounts for resource heterogeneity across
workers while achieving similar class-level feature distribution and
maintaining class balance. Based on this algorithm, we develop a distributed
SGD framework that can accelerate existing SOTA distributed training algorithms
by up to 32%.
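The abstract does not spell out the partitioning objective, so the sketch below is only one plausible instantiation of the idea: per-worker greedy maximization of a facility-location submodular function over feature similarities, with each shard's size capped in proportion to the worker's compute speed. All names (`greedy_submodular_partition`, `speeds`) are hypothetical, and the class-balance constraint mentioned in the abstract is omitted for brevity.
```python
import numpy as np

def greedy_submodular_partition(features, speeds):
    """Illustrative sketch, NOT the paper's exact algorithm: greedily grow
    one partition per worker, maximizing a facility-location submodular
    gain while capping each partition's size in proportion to that
    worker's compute speed (faster worker -> larger shard)."""
    n = len(features)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T  # pairwise cosine similarity in feature space
    caps = np.floor(n * np.asarray(speeds) / np.sum(speeds)).astype(int)
    caps[0] += n - caps.sum()  # hand the rounding remainder to worker 0
    parts = [[] for _ in speeds]
    # coverage[w, i]: how well worker w's current shard already covers sample i
    coverage = np.zeros((len(speeds), n))
    unassigned = set(range(n))
    while unassigned:
        best = None  # (gain, worker, sample)
        for w in range(len(speeds)):
            if len(parts[w]) >= caps[w]:
                continue  # this worker's shard is full
            for i in unassigned:
                # facility-location marginal gain of adding sample i to shard w
                gain = np.maximum(sim[i] - coverage[w], 0.0).sum()
                if best is None or gain > best[0]:
                    best = (gain, w, i)
        _, w, i = best
        parts[w].append(i)
        coverage[w] = np.maximum(coverage[w], sim[i])
        unassigned.remove(i)
    return parts
```
A practical version would add per-class quotas to maintain class balance and a lazy-greedy priority queue; the naive double loop here rescans every unassigned sample for every worker at each step.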
Related papers
- Hierarchical Learning-based Graph Partition for Large-scale Vehicle Routing Problems [19.54367116789867]
We propose a versatile Hierarchical Learning-based Graph Partition (HLGP) framework for partitioning CVRP instances.
HLGP is tailored to this task, synergistically integrating global and local partition policies.
arXiv Detail & Related papers (2025-02-12T12:07:09Z)
- Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for distributed SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z)
- Improving Distribution Alignment with Diversity-based Sampling [0.0]
Domain shifts are ubiquitous in machine learning, and can substantially degrade a model's performance when deployed to real-world data.
This paper proposes to improve these estimates by inducing diversity in each sampled minibatch.
It simultaneously balances the data and reduces the variance of the gradients, thereby enhancing the model's generalisation ability.
arXiv Detail & Related papers (2024-10-05T17:26:03Z)
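As a concrete illustration of the diversity idea in the entry above (a sketch under assumptions, not the paper's actual sampler), farthest-point selection is one standard way to induce diversity in a minibatch:
```python
import numpy as np

def diverse_minibatch(features, batch_size, rng=None):
    # Farthest-point (k-center greedy) selection: each new sample is the
    # one farthest in feature space from everything already chosen, so
    # the minibatch spreads out instead of clumping like uniform sampling.
    rng = np.random.default_rng(rng)
    chosen = [int(rng.integers(len(features)))]
    d = np.linalg.norm(features - features[chosen[0]], axis=1)
    for _ in range(batch_size - 1):
        nxt = int(np.argmax(d))  # farthest from the current selection
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(chosen)
```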
- Clustering-Based Validation Splits for Model Selection under Domain Shift [0.0]
It is proposed that the training-validation split should maximise the distribution mismatch between the two sets.
A constrained clustering algorithm, which leverages linear programming to control the size, label, and (optionally) group distributions of the splits, is presented.
arXiv Detail & Related papers (2024-05-29T19:21:17Z)
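A minimal sketch of the cluster-based splitting idea in the entry above, assuming k-means as the clustering step and omitting the paper's linear-programming constraints on size, label, and group distributions:
```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_holdout_split(features, val_fraction=0.2, n_clusters=10, seed=0):
    """Simplified sketch: assign whole k-means clusters to the validation
    set so that train and validation occupy different regions of feature
    space (increasing their distribution mismatch)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(features)
    sizes = np.bincount(labels, minlength=n_clusters)
    target = int(val_fraction * len(features))
    val_clusters, count = [], 0
    for c in np.argsort(sizes):  # smallest clusters first, to land near target
        if count + sizes[c] <= target or not val_clusters:
            val_clusters.append(c)
            count += sizes[c]
    val_mask = np.isin(labels, val_clusters)
    return np.where(~val_mask)[0], np.where(val_mask)[0]
```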
- Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z)
- Latent Distribution Adjusting for Face Anti-Spoofing [29.204168516602568]
We propose a unified framework called Latent Distribution Adjusting (LDA) to improve the robustness of the face anti-spoofing (FAS) model.
To enhance intra-class compactness and inter-class discrepancy, we propose a margin-based loss that provides distribution constraints for prototype learning.
Our framework can 1) make the final representation space both intra-class compact and inter-class separable, and 2) outperform the state-of-the-art methods on multiple standard FAS benchmarks.
arXiv Detail & Related papers (2023-05-16T08:43:14Z)
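For the margin-based prototype loss described in the LDA entry above, one common instantiation is an additive-margin softmax over class prototypes. The sketch below uses a single prototype per class; the paper's formulation may differ (e.g., LDA models each class with multiple prototypes):
```python
import torch
import torch.nn.functional as F

def margin_prototype_loss(embeddings, prototypes, labels,
                          margin=0.35, scale=30.0):
    """Sketch of a margin-based prototype loss (additive-margin softmax).
    Subtracting a margin from the true-class cosine logit tightens
    intra-class compactness and widens inter-class gaps."""
    emb = F.normalize(embeddings, dim=1)
    proto = F.normalize(prototypes, dim=1)  # (num_classes, dim), learnable
    cos = emb @ proto.t()                   # cosine similarity to each prototype
    onehot = F.one_hot(labels, proto.size(0)).float()
    logits = scale * (cos - margin * onehot)  # penalize the true class by the margin
    return F.cross_entropy(logits, labels)
```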
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains where the problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
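The decentralized local stochastic method in the entry above builds on the classical extragradient step; a minimal centralized, deterministic sketch of that step (not the paper's distributed algorithm) is:
```python
import numpy as np

def extragradient(F, z0, step=0.1, iters=1000):
    """Textbook extragradient iteration for a variational inequality
    with operator F:
      1) extrapolate:  z_half = z - step * F(z)
      2) update:       z_new  = z - step * F(z_half)"""
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        z_half = z - step * F(z)   # gradient step at the current point
        z = z - step * F(z_half)   # correct using the extrapolated point
    return z

# Example: bilinear saddle point min_x max_y x*y with z = (x, y).
# Plain gradient descent-ascent spirals outward here; extragradient
# approaches the saddle point (0, 0).
F_bilinear = lambda z: np.array([z[1], -z[0]])
print(extragradient(F_bilinear, z0=[1.0, 1.0]))
```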
- OoD-Bench: Benchmarking and Understanding Out-of-Distribution Generalization Datasets and Algorithms [28.37021464780398]
We show that existing OoD algorithms that outperform empirical risk minimization on one distribution shift usually have limitations on the other distribution shift.
The new benchmark may serve as a strong foothold that can be resorted to by future OoD generalization research.
arXiv Detail & Related papers (2021-06-07T15:34:36Z)
- Partition-Guided GANs [63.980473635585234]
We design a partitioner that breaks the space into smaller regions, each having a simpler distribution, and train a different generator for each partition.
This is done in an unsupervised manner without requiring any labels.
Experimental results on various standard benchmarks show that the proposed unsupervised model outperforms several recent methods.
arXiv Detail & Related papers (2021-04-02T00:06:53Z)
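A structural sketch of the partition-guided idea in the entry above, with hypothetical names: an unsupervised partitioner plus one small generator per region, sampled in proportion to region size. The adversarial training loop and the paper's guiding mechanism are omitted:
```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def make_partitions(data, k):
    # Unsupervised partitioner: k-means stands in for the paper's learned
    # partitioner; no labels are required.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)

class SmallGenerator(nn.Module):
    # One lightweight generator per region of the data space.
    def __init__(self, z_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))
    def forward(self, z):
        return self.net(z)

def sample_mixture(generators, weights, n, z_dim):
    # Draw each sample from a generator chosen in proportion to the size
    # of its partition (weights = np.bincount(parts) / len(parts)), so
    # the mixture covers the whole distribution.
    idx = np.random.choice(len(generators), size=n, p=weights)
    z = torch.randn(n, z_dim)
    return torch.stack([generators[i](z[j]) for j, i in enumerate(idx)])
```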
- Implicit Distributional Reinforcement Learning [61.166030238490634]
The implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
- Brainstorming Generative Adversarial Networks (BGANs): Towards Multi-Agent Generative Models with Distributed Private Datasets [70.62568022925971]
Generative adversarial networks (GANs) must be fed large datasets that adequately represent the data space.
In many scenarios, the available datasets may be limited and distributed across multiple agents, each of which is seeking to learn the distribution of the data on its own.
In this paper, a novel brainstorming GAN (BGAN) architecture is proposed, with which multiple agents can generate real-like data samples while operating in a fully distributed manner.
arXiv Detail & Related papers (2020-02-02T02:58:32Z)