Decentralized and Model-Free Federated Learning: Consensus-Based
Distillation in Function Space
- URL: http://arxiv.org/abs/2104.00352v2
- Date: Fri, 2 Apr 2021 09:32:12 GMT
- Title: Decentralized and Model-Free Federated Learning: Consensus-Based
Distillation in Function Space
- Authors: Akihito Taya, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto
- Abstract summary: This paper proposes a decentralized FL scheme for IoE devices connected via multi-hop networks.
It shows that CMFD achieves higher stability than parameter aggregation methods.
- Score: 7.627597166844701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a decentralized FL scheme for IoE devices connected via
multi-hop networks. FL has gained attention as an enabler of privacy-preserving
algorithms, but, because of non-convexity, FL algorithms that rely on
decentralized parameter-averaging schemes are not guaranteed to converge to the
optimal point. Therefore, a distributed algorithm that converges to the optimal
solution should be developed. The key idea of the proposed algorithm is to
aggregate the local prediction functions, not in a parameter space but in a
function space. Since machine learning tasks can be regarded as convex
functional optimization problems, a consensus-based optimization algorithm
achieves the global optimum if it is tailored to work in a function space. This
paper first analyzes the convergence of the proposed algorithm in a function
space, which is referred to as a meta-algorithm. It is shown that spectral
graph theory can be applied to the function space in a similar manner as that
of numerical vectors. Then, CMFD is developed for neural networks (NNs) as an implementation of
the meta-algorithm. CMFD leverages knowledge distillation to realize function
aggregation among adjacent devices without parameter averaging. One of the
advantages of CMFD is that it works even when NN models are different among the
distributed learners. This paper shows that CMFD achieves higher accuracy than
parameter aggregation under weakly-connected networks. The stability of CMFD is
also higher than that of parameter aggregation methods.
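To make the function-space aggregation idea concrete, the following is a minimal NumPy sketch of consensus-based distillation on a small device graph: each learner takes a gradient step on its private data and then distills toward the average of its neighbours' predictions on a shared unlabeled set, so only predictions (never parameters) are exchanged. The toy softmax learners, the ring topology, and all helper names are illustrative assumptions, not the authors' implementation, and the sketch does not exercise the heterogeneous-architecture case.
```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class SoftmaxDevice:
    """A simple softmax learner standing in for one device's local model."""
    def __init__(self, dim, n_classes, lr=0.5):
        self.W = rng.normal(scale=0.01, size=(dim, n_classes))
        self.lr = lr

    def predict(self, X):
        return softmax(X @ self.W)

    def local_step(self, X, Y):
        # Gradient step of the cross-entropy loss on the device's private data.
        self.W -= self.lr * X.T @ (self.predict(X) - Y) / len(X)

    def distill_step(self, X_shared, target_probs):
        # Pull the local prediction function toward the neighbours' average
        # predictions on a shared unlabeled set: consensus in function space.
        self.W -= self.lr * X_shared.T @ (self.predict(X_shared) - target_probs) / len(X_shared)

# Toy setup: 4 devices on a ring (multi-hop) graph, 3-class problem.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
dim, n_classes = 5, 3
true_W = rng.normal(size=(dim, n_classes))

def make_data(n):
    X = rng.normal(size=(n, dim))
    y = (X @ true_W).argmax(axis=1)
    return X, np.eye(n_classes)[y]

devices = [SoftmaxDevice(dim, n_classes) for _ in neighbors]
local_data = [make_data(64) for _ in neighbors]
X_shared, _ = make_data(128)          # unlabeled pool used only for distillation

for _ in range(200):
    # Local training on private data.
    for dev, (X, Y) in zip(devices, local_data):
        dev.local_step(X, Y)
    # Function aggregation: distill toward neighbours' average predictions.
    targets = [np.mean([devices[j].predict(X_shared) for j in neighbors[i]], axis=0)
               for i in range(len(devices))]
    for dev, t in zip(devices, targets):
        dev.distill_step(X_shared, t)

# Devices agree in function space even though parameters were never exchanged.
preds = [d.predict(X_shared) for d in devices]
print("max pairwise prediction gap:",
      max(np.abs(p - q).max() for p in preds for q in preds))
```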
Related papers
- Convergence Visualizer of Decentralized Federated Distillation with
Reduced Communication Costs [3.2098126952615442]
Federated learning (FL) achieves collaborative learning without the need for data sharing, thus preventing privacy leakage.
This study solves two unresolved challenges of CMFD: (1) communication cost reduction and (2) visualization of model convergence.
arXiv Detail & Related papers (2023-12-19T07:23:49Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for federated conditional stochastic optimization.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in the cross-silo FL setting.
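As a rough illustration of the momentum-based variance-reduction building block mentioned above, the sketch below runs a STORM-style estimator with an AdaGrad-like step size on a toy least-squares problem; it is not the FAFED algorithm, and the objective and constants are assumptions for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
b = rng.normal(size=200)

def stoch_grad(w, idx):
    """Mini-batch gradient of 0.5*||A w - b||^2 / n restricted to rows `idx`."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) / len(idx)

w = np.zeros(10)
w_prev = w.copy()
d = np.zeros(10)                     # variance-reduced momentum direction
v = np.zeros(10)                     # accumulator for the adaptive step size
eta, beta, eps = 0.5, 0.9, 1e-8

for t in range(500):
    idx = rng.choice(len(b), size=20, replace=False)
    g_new = stoch_grad(w, idx)
    g_old = stoch_grad(w_prev, idx)          # same batch, previous iterate
    d = g_new + (1.0 - beta) * (d - g_old)   # STORM recursion
    v += d ** 2                              # AdaGrad-like accumulation
    w_prev = w.copy()
    w = w - eta * d / (np.sqrt(v) + eps)

print("final mean loss:", 0.5 * np.linalg.norm(A @ w - b) ** 2 / len(b))
```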
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve distributed minimax problems.
It builds on momentum-based variance-reduction and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
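The sketch below illustrates the underlying adaptive minimax building block: stochastic gradient descent-ascent with AdaGrad-style learning rates on a toy strongly-convex-strongly-concave saddle problem. It is not AdaFGDA; the objective and all constants are illustrative assumptions.
```python
import numpy as np

# Toy saddle problem:  min_x max_y  0.5||x||^2 + x^T B y - 0.5||y||^2,
# whose unique saddle point is (0, 0).
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
x, y = rng.normal(size=5), rng.normal(size=5)
vx, vy = np.zeros(5), np.zeros(5)
eta, eps = 0.1, 1e-8

for t in range(2000):
    noise = rng.normal(scale=0.1, size=(2, 5))   # stochastic gradient noise
    gx = x + B @ y + noise[0]                    # gradient in x
    gy = B.T @ x - y + noise[1]                  # gradient in y
    vx += gx ** 2
    vy += gy ** 2
    x -= eta * gx / (np.sqrt(vx) + eps)          # adaptive descent step on x
    y += eta * gy / (np.sqrt(vy) + eps)          # adaptive ascent step on y

print("distance to the saddle point (0, 0):",
      np.linalg.norm(x), np.linalg.norm(y))
```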
arXiv Detail & Related papers (2022-11-14T12:32:18Z)
- Communication-Efficient Stochastic Zeroth-Order Optimization for
Federated Learning [28.65635956111857]
Federated learning (FL) enables edge devices to collaboratively train a global model without sharing their private data.
To enhance the training efficiency of FL, various algorithms have been proposed, ranging from first-order to second-order methods.
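For context, a two-point zeroth-order gradient estimator of the kind such methods rely on can be sketched as follows: the gradient is approximated purely from function evaluations, so it applies when first-order (backprop) information is unavailable. The objective, dimensions, and step sizes below are illustrative assumptions.
```python
import numpy as np

def zo_gradient(f, w, mu=1e-3, n_dirs=20, rng=None):
    """Two-point zeroth-order estimate of grad f(w) from function values only."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.zeros_like(w)
    for _ in range(n_dirs):
        u = rng.normal(size=w.shape)                       # random direction
        g += (f(w + mu * u) - f(w - mu * u)) / (2 * mu) * u
    return g / n_dirs

# Usage: minimize a black-box quadratic with gradient descent on the estimate.
rng = np.random.default_rng(0)
target = rng.normal(size=8)
f = lambda w: np.sum((w - target) ** 2)
w = np.zeros(8)
for _ in range(300):
    w -= 0.05 * zo_gradient(f, w, rng=rng)
print("distance to optimum:", np.linalg.norm(w - target))
```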
arXiv Detail & Related papers (2022-01-24T08:56:06Z)
- Optimization-Based GenQSGD for Federated Edge Learning [12.371264770814097]
We present a generalized parallel mini-batch stochastic gradient descent (SGD) algorithm for federated learning (FL).
We optimize the algorithm parameters to minimize the energy cost under time and convergence error constraints.
Results demonstrate significant gains over existing FL algorithms.
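A QSGD-style stochastic quantizer, the kind of building block on which quantized parallel SGD schemes of this type rest, can be sketched as follows; the level count and the unbiasedness check are illustrative assumptions, not the GenQSGD algorithm itself.
```python
import numpy as np

def stochastic_quantize(v, levels=4, rng=None):
    """Quantize v to `levels` uniform magnitude levels per coordinate, unbiasedly."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    prob = scaled - lower                          # probability of rounding up
    q = lower + (rng.random(v.shape) < prob)       # stochastic rounding
    return np.sign(v) * q * norm / levels

# Quick unbiasedness check: the average of many quantizations approaches v.
rng = np.random.default_rng(0)
v = rng.normal(size=6)
est = np.mean([stochastic_quantize(v, rng=rng) for _ in range(20000)], axis=0)
print("max deviation from v:", np.abs(est - v).max())
```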
arXiv Detail & Related papers (2021-10-25T14:25:11Z)
- Distributed Learning and Democratic Embeddings: Polynomial-Time Source
Coding Schemes Can Achieve Minimax Lower Bounds for Distributed Gradient
Descent under Communication Constraints [46.17631511884969]
We consider the problem of compressing a vector in the n-dimensional Euclidean space, subject to a bit-budget of R-bits per dimension.
We show that Democratic and Near-Democratic source-coding schemes are (near) optimal in the sense that the covering efficiency of the resulting quantizer is either dimension independent, or has a very weak logarithmic dependence.
We propose a distributed optimization algorithm: DGD-DEF, which employs our proposed coding strategy, and achieves the minimax optimal convergence rate to within (near) constant factors.
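As a naive reference point for compression under a per-dimension bit budget, the sketch below implements a dithered uniform scalar quantizer with R bits per coordinate; it is not the democratic or near-democratic embedding construction, and the bounded-range assumption and helper name are illustrative.
```python
import numpy as np

def quantize_R_bits(x, R=4, x_max=1.0, rng=None):
    """Dithered uniform scalar quantizer: each coordinate of x (assumed to lie in
    [-x_max, x_max]) is mapped to one of 2**R levels, i.e. R bits per dimension."""
    rng = np.random.default_rng() if rng is None else rng
    n_levels = 2 ** R
    step = 2.0 * x_max / (n_levels - 1)
    dither = rng.uniform(-0.5, 0.5, size=x.shape)        # randomized rounding offset
    idx = np.clip(np.round((x + x_max) / step + dither), 0, n_levels - 1)
    return idx.astype(int), idx * step - x_max           # (indices to send, reconstruction)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=10)
idx, x_hat = quantize_R_bits(x, R=4, rng=rng)
print("max per-coordinate error (at most one quantization step):",
      np.abs(x - x_hat).max())
```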
arXiv Detail & Related papers (2021-03-13T00:04:11Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs)
A variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
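A minimal sketch of mirror descent with the entropy mirror map (exponentiated gradient), the standard choice when iterates live on a probability simplex as MDP policies do, is given below; the toy linear objective stands in for the actual policy-optimization problem and is an assumption.
```python
import numpy as np

def mirror_descent_simplex(grad_fn, x0, eta=0.1, steps=500):
    """Mirror descent over the probability simplex with the entropy mirror map."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_fn(x)
        x = x * np.exp(-eta * g)       # multiplicative (mirror) update
        x /= x.sum()                   # Bregman projection back onto the simplex
    return x

# Minimize <c, x> over the simplex: mass concentrates on the smallest c_i.
c = np.array([0.9, 0.2, 0.7, 0.4])
x = mirror_descent_simplex(lambda x: c, np.ones(4) / 4)
print(np.round(x, 3))   # approximately [0, 1, 0, 0]
```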
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found is used as a starting point (origin) for the next subspace.
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
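The subspace-sequencing mechanic can be sketched as below, with the GP-based subspace construction and the Bayesian-optimisation inner loop replaced by random directions and plain random sampling; all function names and the toy objective are illustrative assumptions.
```python
import numpy as np

def sequential_subspace_search(f, dim, rounds=10, sub_dim=2, samples=200, rng=None):
    """Each round searches a random low-dimensional subspace anchored at the current
    best point and uses the best solution found as the origin of the next subspace."""
    rng = np.random.default_rng() if rng is None else rng
    origin = np.zeros(dim)
    best_val = f(origin)
    for _ in range(rounds):
        A = rng.normal(size=(dim, sub_dim))            # random subspace basis
        Z = rng.uniform(-1.0, 1.0, size=(samples, sub_dim))
        cand = origin + Z @ A.T                        # candidates in the subspace
        vals = np.apply_along_axis(f, 1, cand)
        i = vals.argmin()
        if vals[i] < best_val:
            best_val, origin = vals[i], cand[i]        # new origin for next subspace
    return origin, best_val

# Toy usage: minimize a shifted quadratic in 20 dimensions.
rng = np.random.default_rng(0)
shift = rng.normal(size=20)
x_best, v_best = sequential_subspace_search(lambda x: np.sum((x - shift) ** 2),
                                            dim=20, rounds=30, rng=rng)
print("best value found:", v_best)
```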
arXiv Detail & Related papers (2020-09-08T06:54:11Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
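A minimal sketch of one CTA round in the FedAvg style is shown below: clients run several local SGD steps from the current global model, and the server averages the resulting models weighted by local data sizes. The linear least-squares clients and helper names are illustrative assumptions, not the FedPD framework itself.
```python
import numpy as np

def fedavg_round(global_w, clients, local_steps=5, lr=0.1):
    """One 'computation then aggregation' round over a list of (X, y) clients."""
    new_ws, sizes = [], []
    for X, y in clients:
        w = global_w.copy()
        for _ in range(local_steps):                   # local computation
            w -= lr * X.T @ (X @ w - y) / len(y)
        new_ws.append(w)
        sizes.append(len(y))
    return np.average(new_ws, axis=0, weights=sizes)   # aggregation at the server

rng = np.random.default_rng(0)
true_w = rng.normal(size=3)
def make_client(n):                                    # each client holds private data
    X = rng.normal(size=(n, 3))
    return X, X @ true_w + 0.01 * rng.normal(size=n)
clients = [make_client(n) for n in (30, 50, 80)]
w = np.zeros(3)
for _ in range(50):
    w = fedavg_round(w, clients)
print("error vs. true model:", np.linalg.norm(w - true_w))
```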
arXiv Detail & Related papers (2020-05-22T23:07:42Z)