Decentralized and Model-Free Federated Learning: Consensus-Based
Distillation in Function Space
- URL: http://arxiv.org/abs/2104.00352v2
- Date: Fri, 2 Apr 2021 09:32:12 GMT
- Title: Decentralized and Model-Free Federated Learning: Consensus-Based
Distillation in Function Space
- Authors: Akihito Taya, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto
- Abstract summary: This paper proposes a decentralized FL scheme for IoE devices connected via multi-hop networks.
It shows that CMFD achieves higher stability than parameter aggregation methods.
- Score: 7.627597166844701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a decentralized FL scheme for IoE devices connected via
multi-hop networks. FL has gained attention as an enabler of privacy-preserving
algorithms, but, because of non-convexity, FL algorithms that rely on
decentralized parameter-averaging schemes are not guaranteed to converge to the
optimal point. Therefore, a distributed algorithm that converges to the optimal
solution should be developed. The key idea of the proposed algorithm is to
aggregate the local prediction functions, not in a parameter space but in a
function space. Since machine learning tasks can be regarded as convex
functional optimization problems, a consensus-based optimization algorithm
achieves the global optimum if it is tailored to work in a function space. This
paper first analyzes the convergence of the proposed algorithm in a function
space, which is referred to as a meta-algorithm. It is shown that spectral
graph theory can be applied to the function space in a similar manner as that
of numerical vectors. Then, CMFD is developed for neural networks (NNs) as an implementation of
the meta-algorithm. CMFD leverages knowledge distillation to realize function
aggregation among adjacent devices without parameter averaging. One of the
advantages of CMFD is that it works even when NN models are different among the
distributed learners. This paper shows that CMFD achieves higher accuracy than
parameter aggregation under weakly-connected networks. The stability of CMFD is
also higher than that of parameter aggregation methods.
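To make the function-space aggregation idea concrete, the following is a minimal NumPy sketch of consensus-based distillation on a small device graph: each learner takes a gradient step on its private data and then distills toward the average of its neighbours' predictions on a shared unlabeled set, so only predictions (never parameters) are exchanged. The toy softmax learners, the ring topology, and all helper names are illustrative assumptions, not the authors' implementation, and the sketch does not exercise the heterogeneous-architecture case.
```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class SoftmaxDevice:
    """A simple softmax learner standing in for one device's local model."""
    def __init__(self, dim, n_classes, lr=0.5):
        self.W = rng.normal(scale=0.01, size=(dim, n_classes))
        self.lr = lr

    def predict(self, X):
        return softmax(X @ self.W)

    def local_step(self, X, Y):
        # Gradient step of the cross-entropy loss on the device's private data.
        self.W -= self.lr * X.T @ (self.predict(X) - Y) / len(X)

    def distill_step(self, X_shared, target_probs):
        # Pull the local prediction function toward the neighbours' average
        # predictions on a shared unlabeled set: consensus in function space.
        self.W -= self.lr * X_shared.T @ (self.predict(X_shared) - target_probs) / len(X_shared)

# Toy setup: 4 devices on a ring (multi-hop) graph, 3-class problem.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
dim, n_classes = 5, 3
true_W = rng.normal(size=(dim, n_classes))

def make_data(n):
    X = rng.normal(size=(n, dim))
    y = (X @ true_W).argmax(axis=1)
    return X, np.eye(n_classes)[y]

devices = [SoftmaxDevice(dim, n_classes) for _ in neighbors]
local_data = [make_data(64) for _ in neighbors]
X_shared, _ = make_data(128)          # unlabeled pool used only for distillation

for _ in range(200):
    # Local training on private data.
    for dev, (X, Y) in zip(devices, local_data):
        dev.local_step(X, Y)
    # Function aggregation: distill toward neighbours' average predictions.
    targets = [np.mean([devices[j].predict(X_shared) for j in neighbors[i]], axis=0)
               for i in range(len(devices))]
    for dev, t in zip(devices, targets):
        dev.distill_step(X_shared, t)

# Devices agree in function space even though parameters were never exchanged.
preds = [d.predict(X_shared) for d in devices]
print("max pairwise prediction gap:",
      max(np.abs(p - q).max() for p in preds for q in preds))
```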
Related papers
- Convergence Visualizer of Decentralized Federated Distillation with
Reduced Communication Costs [3.2098126952615442]
Federated learning (FL) achieves collaborative learning without the need for data sharing, thus preventing privacy leakage.
This study solves two unresolved challenges of CMFD: (1) communication cost reduction and (2) visualization of model convergence.
arXiv Detail & Related papers (2023-12-19T07:23:49Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for federated conditional stochastic optimization.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in the cross-silo FL setting.
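As a rough illustration of the momentum-based variance-reduction building block mentioned above, the sketch below runs a STORM-style estimator with an AdaGrad-like step size on a toy least-squares problem; it is not the FAFED algorithm, and the objective and constants are assumptions for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
b = rng.normal(size=200)

def stoch_grad(w, idx):
    """Mini-batch gradient of 0.5*||A w - b||^2 / n restricted to rows `idx`."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) / len(idx)

w = np.zeros(10)
w_prev = w.copy()
d = np.zeros(10)                     # variance-reduced momentum direction
v = np.zeros(10)                     # accumulator for the adaptive step size
eta, beta, eps = 0.5, 0.9, 1e-8

for t in range(500):
    idx = rng.choice(len(b), size=20, replace=False)
    g_new = stoch_grad(w, idx)
    g_old = stoch_grad(w_prev, idx)          # same batch, previous iterate
    d = g_new + (1.0 - beta) * (d - g_old)   # STORM recursion
    v += d ** 2                              # AdaGrad-like accumulation
    w_prev = w.copy()
    w = w - eta * d / (np.sqrt(v) + eps)

print("final mean loss:", 0.5 * np.linalg.norm(A @ w - b) ** 2 / len(b))
```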
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve distributed minimax problems.
It builds on momentum-based variance-reduction and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
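The sketch below illustrates the underlying adaptive minimax building block: stochastic gradient descent-ascent with AdaGrad-style learning rates on a toy strongly-convex-strongly-concave saddle problem. It is not AdaFGDA; the objective and all constants are illustrative assumptions.
```python
import numpy as np

# Toy saddle problem:  min_x max_y  0.5||x||^2 + x^T B y - 0.5||y||^2,
# whose unique saddle point is (0, 0).
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
x, y = rng.normal(size=5), rng.normal(size=5)
vx, vy = np.zeros(5), np.zeros(5)
eta, eps = 0.1, 1e-8

for t in range(2000):
    noise = rng.normal(scale=0.1, size=(2, 5))   # stochastic gradient noise
    gx = x + B @ y + noise[0]                    # gradient in x
    gy = B.T @ x - y + noise[1]                  # gradient in y
    vx += gx ** 2
    vy += gy ** 2
    x -= eta * gx / (np.sqrt(vx) + eps)          # adaptive descent step on x
    y += eta * gy / (np.sqrt(vy) + eps)          # adaptive ascent step on y

print("distance to the saddle point (0, 0):",
      np.linalg.norm(x), np.linalg.norm(y))
```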
arXiv Detail & Related papers (2022-11-14T12:32:18Z)
- Communication-Efficient Stochastic Zeroth-Order Optimization for
Federated Learning [28.65635956111857]
Federated learning (FL) enables edge devices to collaboratively train a global model without sharing their private data.
To enhance the training efficiency of FL, various algorithms have been proposed, ranging from first-order to second-order methods.
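For context, a two-point zeroth-order gradient estimator of the kind such methods rely on can be sketched as follows: the gradient is approximated purely from function evaluations, so it applies when first-order (backprop) information is unavailable. The objective, dimensions, and step sizes below are illustrative assumptions.
```python
import numpy as np

def zo_gradient(f, w, mu=1e-3, n_dirs=20, rng=None):
    """Two-point zeroth-order estimate of grad f(w) from function values only."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        u = rng.normal(size=w.shape)                       # random direction
        g += (f(w + mu * u) - f(w - mu * u)) / (2 * mu) * u
    return g / n_dirs

# Usage: minimize a black-box quadratic with gradient descent on the estimate.
rng = np.random.default_rng(0)
target = rng.normal(size=8)
f = lambda w: np.sum((w - target) ** 2)
w = np.zeros(8)
for _ in range(300):
    w -= 0.05 * zo_gradient(f, w, rng=rng)
print("distance to optimum:", np.linalg.norm(w - target))
```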
arXiv Detail & Related papers (2022-01-24T08:56:06Z)
- Optimization-Based GenQSGD for Federated Edge Learning [12.371264770814097]
We present a generalized parallel mini-batch stochastic gradient descent (SGD) algorithm for federated learning (FL).
We optimize the algorithm parameters to minimize the energy cost under time and convergence error constraints.
Results demonstrate significant gains over existing FL algorithms.
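A QSGD-style stochastic quantizer, the kind of building block on which quantized parallel SGD schemes of this type rest, can be sketched as follows; the level count and the unbiasedness check are illustrative assumptions, not the GenQSGD algorithm itself.
```python
import numpy as np

def stochastic_quantize(v, levels=4, rng=None):
    """Quantize v to `levels` uniform magnitude levels per coordinate, unbiasedly."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    prob = scaled - lower                          # probability of rounding up
    q = lower + (rng.random(v.shape) < prob)       # stochastic rounding
    return np.sign(v) * q * norm / levels

# Quick unbiasedness check: the average of many quantizations approaches v.
rng = np.random.default_rng(0)
v = rng.normal(size=6)
est = np.mean([stochastic_quantize(v, rng=rng) for _ in range(20000)], axis=0)
print("max deviation from v:", np.abs(est - v).max())
```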
arXiv Detail & Related papers (2021-10-25T14:25:11Z)
- Distributed Learning and Democratic Embeddings: Polynomial-Time Source
Coding Schemes Can Achieve Minimax Lower Bounds for Distributed Gradient
Descent under Communication Constraints [46.17631511884969]
We consider the problem of compressing a vector in the n-dimensional Euclidean space, subject to a bit-budget of R-bits per dimension.
We show that Democratic and Near-Democratic source-coding schemes are (near) optimal in the sense that the covering efficiency of the resulting quantizer is either dimension independent, or has a very weak logarithmic dependence.
We propose a distributed optimization algorithm: DGD-DEF, which employs our proposed coding strategy, and achieves the minimax optimal convergence rate to within (near) constant factors.
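As a naive reference point for compression under a per-dimension bit budget, the sketch below implements a dithered uniform scalar quantizer with R bits per coordinate; it is not the democratic or near-democratic embedding construction, and the bounded-range assumption and helper name are illustrative.
```python
import numpy as np

def quantize_R_bits(x, R=4, x_max=1.0, rng=None):
    """Dithered uniform scalar quantizer: each coordinate of x (assumed to lie in
    [-x_max, x_max]) is mapped to one of 2**R levels, i.e. R bits per dimension."""
    rng = np.random.default_rng() if rng is None else rng
    n_levels = 2 ** R
    step = 2.0 * x_max / (n_levels - 1)
    dither = rng.uniform(-0.5, 0.5, size=x.shape)        # randomized rounding offset
    idx = np.clip(np.round((x + x_max) / step + dither), 0, n_levels - 1)
    return idx.astype(int), idx * step - x_max           # (indices to send, reconstruction)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=10)
idx, x_hat = quantize_R_bits(x, R=4, rng=rng)
print("max per-coordinate error (at most one quantization step):",
      np.abs(x - x_hat).max())
```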
arXiv Detail & Related papers (2021-03-13T00:04:11Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs)
A variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
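A minimal sketch of mirror descent with the entropy mirror map (exponentiated gradient), the standard choice when iterates live on a probability simplex as MDP policies do, is given below; the toy linear objective stands in for the actual policy-optimization problem and is an assumption.
```python
import numpy as np

def mirror_descent_simplex(grad_fn, x0, eta=0.1, steps=500):
    """Mirror descent over the probability simplex with the entropy mirror map."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)
        x = x * np.exp(-eta * g)       # multiplicative (mirror) update
        x /= x.sum()                   # Bregman projection back onto the simplex
    return x

# Minimize <c, x> over the simplex: mass concentrates on the smallest c_i.
c = np.array([0.9, 0.2, 0.7, 0.4])
x = mirror_descent_simplex(lambda x: c, np.ones(4) / 4)
print(np.round(x, 3))   # approximately [0, 1, 0, 0]
```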
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found is used as a starting point (origin) for the next subspace.
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
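The subspace-sequencing mechanic can be sketched as below, with the GP-based subspace construction and the Bayesian-optimisation inner loop replaced by random directions and plain random sampling; all function names and the toy objective are illustrative assumptions.
```python
import numpy as np

def sequential_subspace_search(f, dim, rounds=10, sub_dim=2, samples=200, rng=None):
    """Each round searches a random low-dimensional subspace anchored at the current
    best point and uses the best solution found as the origin of the next subspace."""
    rng = np.random.default_rng() if rng is None else rng
    origin = np.zeros(dim)
    best_val = f(origin)
    for _ in range(rounds):
        A = rng.normal(size=(dim, sub_dim))            # random subspace basis
        Z = rng.uniform(-1.0, 1.0, size=(samples, sub_dim))
        cand = origin + Z @ A.T                        # candidates in the subspace
        vals = np.apply_along_axis(f, 1, cand)
        i = vals.argmin()
        if vals[i] < best_val:
            best_val, origin = vals[i], cand[i]        # new origin for next subspace
    return origin, best_val

# Toy usage: minimize a shifted quadratic in 20 dimensions.
rng = np.random.default_rng(0)
shift = rng.normal(size=20)
x_best, v_best = sequential_subspace_search(lambda x: np.sum((x - shift) ** 2),
                                            dim=20, rounds=30, rng=rng)
print("best value found:", v_best)
```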
arXiv Detail & Related papers (2020-09-08T06:54:11Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
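A minimal sketch of one CTA round in the FedAvg style is shown below: clients run several local SGD steps from the current global model, and the server averages the resulting models weighted by local data sizes. The linear least-squares clients and helper names are illustrative assumptions, not the FedPD framework itself.
```python
import numpy as np

def fedavg_round(global_w, clients, local_steps=5, lr=0.1):
    """One 'computation then aggregation' round over a list of (X, y) clients."""
    new_ws, sizes = [], []
    for X, y in clients:
        w = global_w.copy()
        for _ in range(local_steps):                   # local computation
            w -= lr * X.T @ (X @ w - y) / len(y)
        new_ws.append(w)
        sizes.append(len(y))
    return np.average(new_ws, axis=0, weights=sizes)   # aggregation at the server

rng = np.random.default_rng(0)
true_w = rng.normal(size=3)
def make_client(n):                                    # each client holds private data
    X = rng.normal(size=(n, 3))
    return X, X @ true_w + 0.01 * rng.normal(size=n)
clients = [make_client(n) for n in (30, 50, 80)]
w = np.zeros(3)
for _ in range(50):
    w = fedavg_round(w, clients)
print("error vs. true model:", np.linalg.norm(w - true_w))
```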
arXiv Detail & Related papers (2020-05-22T23:07:42Z)