Quantized Hierarchical Federated Learning: A Robust Approach to
Statistical Heterogeneity
- URL: http://arxiv.org/abs/2403.01540v1
- Date: Sun, 3 Mar 2024 15:40:24 GMT
- Title: Quantized Hierarchical Federated Learning: A Robust Approach to
Statistical Heterogeneity
- Authors: Seyed Mohammad Azimi-Abarghouyi, Viktoria Fodor
- Abstract summary: We present a novel hierarchical federated learning algorithm that incorporates quantization for communication-efficiency.
We offer a comprehensive analytical framework to evaluate its optimality gap and convergence rate.
Our findings reveal that our algorithm consistently achieves high learning accuracy over a range of parameters.
- Score: 3.8798345704175534
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper presents a novel hierarchical federated learning algorithm within
multiple sets that incorporates quantization for communication-efficiency and
demonstrates resilience to statistical heterogeneity. Unlike conventional
hierarchical federated learning algorithms, our approach combines gradient
aggregation in intra-set iterations with model aggregation in inter-set
iterations. We offer a comprehensive analytical framework to evaluate its
optimality gap and convergence rate, comparing these aspects with those of
conventional algorithms. Additionally, we develop a problem formulation to
derive optimal system parameters in a closed-form solution. Our findings reveal
that our algorithm consistently achieves high learning accuracy over a range of
parameters and significantly outperforms other hierarchical algorithms,
particularly in scenarios with heterogeneous data distributions.
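As a rough sketch of the scheme described above, the Python snippet below pairs a QSGD-style stochastic quantizer with gradient aggregation inside each set and model aggregation across sets. The quantizer, the helper names (grad_fn, hierarchical_round), and the plain-SGD update rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def stochastic_quantize(v, levels=16):
    """QSGD-style stochastic uniform quantizer (an illustrative choice;
    the paper's exact quantization scheme may differ)."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    prob = scaled - lower                       # round up with this probability
    q = lower + (np.random.rand(*v.shape) < prob)
    return np.sign(v) * q * (norm / levels)

def hierarchical_round(w, sets, grad_fn, lr=0.1, intra_iters=5):
    """One inter-set round: each set runs intra-set iterations that
    aggregate quantized gradients, then models are averaged across sets."""
    set_models = []
    for clients in sets:                        # clients grouped under one set leader
        w_s = w.copy()
        for _ in range(intra_iters):            # intra-set: gradient aggregation
            grads = [stochastic_quantize(grad_fn(w_s, c)) for c in clients]
            w_s -= lr * np.mean(grads, axis=0)
        set_models.append(w_s)
    return np.mean(set_models, axis=0)          # inter-set: model aggregation
```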
Related papers
- DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset [23.11152686493894]
Federated Learning (FL) is a distributed learning paradigm that can coordinate heterogeneous edge devices to perform model training without sharing private data.
This paper introduces novel analytical models and optimization algorithms that leverage the interplay between batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for dynamic FL training.
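A toy sketch of the two knobs this paper studies, run on a stand-in least-squares task; the function name, cost accounting, and learning rule are assumptions for illustration, not DYNAMITE's actual analytical models.

```python
import numpy as np

def federated_training(client_data, w, batch_size, agg_freq, rounds, lr=0.05):
    """Toy FL loop exposing the two knobs DYNAMITE tunes jointly:
    batch_size controls per-step gradient variance; agg_freq (local
    steps between aggregations) trades communication cost for drift."""
    comm_cost = 0
    for _ in range(rounds):
        local = []
        for X, y in client_data:                # least-squares clients as a stand-in task
            w_c = w.copy()
            for _ in range(agg_freq):
                idx = np.random.choice(len(X), batch_size)
                g = X[idx].T @ (X[idx] @ w_c - y[idx]) / batch_size
                w_c -= lr * g
            local.append(w_c)
        w = np.mean(local, axis=0)              # one aggregation per round
        comm_cost += len(client_data)           # one model upload per client per round
    return w, comm_cost
```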
arXiv Detail & Related papers (2023-10-20T08:36:12Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning tasks, AUPRC maximization, and MAML.
This paper proposes algorithms for conditional stochastic optimization in the federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition, and the deviation of real-world dynamics from the training environment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneity.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- Communication-Efficient Gradient Descent-Ascent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates [28.700663352789395]
We provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs).
Our approach is based on a general key assumption on the estimates that allows us to propose and analyze several novel local training algorithms.
We present the first local descent-ascent algorithms with provably improved communication complexity for solving distributed variational inequalities on heterogeneous data.
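For intuition, a min-max problem is the canonical special case of a VIP; the sketch below shows generic local gradient descent-ascent with periodic averaging, under assumed per-client gradient oracles. It is a generic illustration, not any of the paper's specific algorithms.

```python
import numpy as np

def local_gda(clients, x, y, local_steps=10, rounds=50, lr=0.05):
    """Local descent-ascent sketch for min_x max_y (1/n) sum_i f_i(x, y):
    each client runs GDA on its own f_i; iterates are averaged once per
    round, so communication scales with rounds, not total steps."""
    for _ in range(rounds):
        xs, ys = [], []
        for gx, gy in clients:                  # per-client (grad_x f_i, grad_y f_i) oracles
            xi, yi = x.copy(), y.copy()
            for _ in range(local_steps):
                xi, yi = xi - lr * gx(xi, yi), yi + lr * gy(xi, yi)
            xs.append(xi)
            ys.append(yi)
        x, y = np.mean(xs, axis=0), np.mean(ys, axis=0)  # communicate once per round
    return x, y
```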
arXiv Detail & Related papers (2023-06-08T10:58:46Z)
- Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate [26.676582181833584]
Decentralized multi-level optimization is challenging because of its multi-level structure and decentralized communication.
We develop two novel decentralized optimization algorithms to optimize the multi-level compositional problem.
arXiv Detail & Related papers (2023-06-06T00:23:28Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
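For context, a minimal sketch of algorithm unrolling itself, with assumed names; SURF's federated construction differs in its details.

```python
import numpy as np

def unrolled_descent(w0, grad_fn, step_sizes):
    """Core idea of algorithm unrolling: a fixed number of descent steps
    whose step sizes act as learnable 'layer' parameters. SURF extends
    this idea to the federated setting (details differ from this toy)."""
    w = w0
    for eta in step_sizes:                      # one 'layer' per unrolled iteration
        w = w - eta * grad_fn(w)
    return w
```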
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- On the Effects of Data Heterogeneity on the Convergence Rates of Distributed Linear System Solvers [9.248526557884498]
We consider the problem of solving a large-scale system of linear equations in a distributed or federated manner by a taskmaster and a set of machines.
We compare two well-known classes of algorithms used to solve this problem: projection-based methods and optimization-based methods.
arXiv Detail & Related papers (2023-04-20T20:48:00Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Scalable Hierarchical Over-the-Air Federated Learning [3.8798345704175534]
This work introduces a new two-level learning method designed to handle both interference and device data heterogeneity.
We present a comprehensive mathematical approach to derive the convergence bound for the proposed algorithm.
Despite the interference and data heterogeneity, the proposed algorithm achieves high learning accuracy for a variety of parameters.
arXiv Detail & Related papers (2022-11-29T12:46:37Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting, making them incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z)
- Adaptive Resonance Theory-based Topological Clustering with a Divisive Hierarchical Structure Capable of Continual Learning [8.581682204722894]
This paper proposes an ART-based topological clustering algorithm with a mechanism that automatically estimates a similarity threshold from a distribution of data points.
To improve information extraction performance, a divisive hierarchical clustering algorithm capable of continual learning is proposed.
arXiv Detail & Related papers (2022-01-26T02:34:52Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Resource-constrained Federated Edge Learning with Heterogeneous Data: Formulation and Analysis [8.863089484787835]
We propose a distributed approximate Newton-type training scheme, namely FedOVA, to solve the statistical heterogeneity challenge brought by heterogeneous data.
FedOVA decomposes a multi-class classification problem into more straightforward binary classification problems and then combines their respective outputs using ensemble learning.
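A minimal sketch of the one-vs-all decomposition and ensemble step; the centralized logistic learner stands in for FedOVA's federated binary training, and all names are illustrative assumptions.

```python
import numpy as np

def fit_binary(X, y_bin, lr=0.1, steps=200):
    """Stand-in binary learner (logistic regression via gradient descent);
    FedOVA would instead train each binary model federatedly."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y_bin) / len(y_bin)
    return w

def fedova_train(X, y, num_classes):
    """Decompose a K-class problem into K one-vs-all binary problems."""
    return [fit_binary(X, (y == k).astype(float)) for k in range(num_classes)]

def fedova_predict(X, models):
    """Ensemble step: score every one-vs-all model, predict the argmax."""
    scores = np.stack([X @ w for w in models], axis=1)
    return scores.argmax(axis=1)
```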
arXiv Detail & Related papers (2021-10-14T17:35:24Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Sample-based and Feature-based Federated Learning via Mini-batch SSCA [18.11773963976481]
This paper investigates sample-based and feature-based federated optimization.
We show that the proposed algorithms can preserve data privacy through the model aggregation mechanism.
We also show that the proposed algorithms converge to Karush-Kuhn-Tucker points of the respective federated optimization problems.
arXiv Detail & Related papers (2021-04-13T08:23:46Z)
- Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity [57.275753974812666]
Federated learning involves learning from data samples distributed across a network of clients while the data remains local.
In this paper, we propose a novel straggler-resilient federated learning method that incorporates statistical characteristics of the clients' data to adaptively select the clients in order to speed up the learning procedure.
arXiv Detail & Related papers (2020-12-28T19:21:14Z)
- Federated Learning with Compression: Unified Analysis and Sharp Guarantees [39.092596142018195]
Communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices.
Two notable trends to deal with this communication overhead are gradient compression and local computation with periodic communication.
We analyze their convergence in both homogeneous and heterogeneous data distribution settings.
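As one concrete compressor covered by such unified analyses, here is a top-k sparsification sketch with an assumed server-side averaging step; it is illustrative, not the paper's exact scheme.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries and zero the rest, one
    common compressor in the family such analyses cover."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def compressed_round(w, client_grads, k, lr=0.1):
    """One round: each client compresses its update before upload;
    the server averages the sparse messages and takes a step."""
    msgs = [top_k(g, k) for g in client_grads]
    return w - lr * np.mean(msgs, axis=0)
```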
arXiv Detail & Related papers (2020-07-02T14:44:07Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
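A minimal sketch of the sampled-participation model described above, with an assumed participation probability and plain SGD local updates; the names and update rule are illustrative.

```python
import numpy as np

def dynamic_fl_step(w, agents, grad_fn, p=0.3, lr=0.05):
    """One iteration of the sampled-participation model: each agent is
    available independently with probability p; participants take a
    local gradient step and their models are averaged."""
    active = [a for a in agents if np.random.rand() < p]
    if not active:                              # nobody participated this iteration
        return w
    updates = [w - lr * grad_fn(w, a) for a in active]
    return np.mean(updates, axis=0)
```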
arXiv Detail & Related papers (2020-02-20T15:00:54Z)