Towards Model Agnostic Federated Learning Using Knowledge Distillation
- URL: http://arxiv.org/abs/2110.15210v1
- Date: Thu, 28 Oct 2021 15:27:51 GMT
- Title: Towards Model Agnostic Federated Learning Using Knowledge Distillation
- Authors: Andrei Afonin, Sai Praneeth Karimireddy
- Abstract summary: In this work, we initiate a theoretical study of model agnostic communication protocols.
We focus on the setting where the two agents are attempting to perform kernel regression using different kernels.
Our study yields a surprising result -- the most natural algorithm of using alternating knowledge distillation (AKD) imposes overly strong regularization.
- Score: 9.947968358822951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An often unquestioned assumption underlying most current federated learning
algorithms is that all the participants use identical model architectures. In
this work, we initiate a theoretical study of model agnostic communication
protocols which would allow data holders (agents) using different models to
collaborate with each other and perform federated learning. We focus on the
setting where the two agents are attempting to perform kernel regression using
different kernels (and hence have different models). Our study yields a
surprising result -- the most natural algorithm of using alternating knowledge
distillation (AKD) imposes overly strong regularization and may lead to severe
under-fitting. Our theory also shows an interesting connection between AKD and
the alternating projection algorithm for finding the intersection of sets.
Leveraging this connection, we propose new algorithms which improve upon AKD.
Our theoretical predictions also closely match real world experiments using
neural networks. Thus, our work proposes a rich yet tractable framework for
analyzing and developing new practical model agnostic federated learning
algorithms.
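The alternating scheme described in the abstract can be sketched for two agents running kernel ridge regression with different kernels. This is a minimal illustration, not the authors' implementation: the RBF kernel, the bandwidths `gamma1` and `gamma2`, the ridge parameter `reg`, and every function name are assumptions introduced here. It also assumes that agent 1 starts from its true labels and that each later fit uses only the other agent's predictions.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def krr_fit_predict(X_train, y_train, X_query, gamma, reg):
    """Fit kernel ridge regression on (X_train, y_train) and predict at X_query."""
    K = rbf_kernel(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_query, X_train, gamma) @ alpha


def alternating_distillation(X1, y1, X2, gamma1, gamma2, reg, rounds):
    """Return the pseudo-label vectors exchanged in each round.

    Hypothetical protocol for illustration: agent 1 starts from its true
    labels y1; afterwards each agent trains only on the other's predictions.
    """
    history = []
    labels_on_X1 = y1
    for _ in range(rounds):
        # Agent 1 fits on its own inputs and labels the inputs of agent 2.
        labels_on_X2 = krr_fit_predict(X1, labels_on_X1, X2, gamma1, reg)
        # Agent 2 fits on those pseudo-labels and labels the inputs of agent 1.
        labels_on_X1 = krr_fit_predict(X2, labels_on_X2, X1, gamma2, reg)
        history.append((labels_on_X2, labels_on_X1))
    return history
```

Tracking the norm of the vectors in `history` across rounds gives a rough view of how much the repeated regularized fits shrink the exchanged predictions. In the special case where both agents share the same inputs and kernel, each step multiplies the labels by `K (K + reg * I)^{-1}`, whose eigenvalues are all below 1, so the norm strictly decreases.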
Related papers
- A Kernel Perspective on Distillation-based Collaborative Learning [8.971234046933349]
We propose a nonparametric collaborative learning algorithm that does not directly share local data or models in statistically heterogeneous environments.
Inspired by our theoretical results, we also propose a practical distillation-based collaborative learning algorithm based on neural network architecture.
arXiv Detail & Related papers (2024-10-23T06:40:13Z)
- Discrete Neural Algorithmic Reasoning [18.497863598167257]
We propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states.
Trained with supervision on the algorithm's state transitions, such models are able to perfectly align with the original algorithm.
arXiv Detail & Related papers (2024-02-18T16:03:04Z)
- Algorithmic Collective Action in Machine Learning [35.91866986642348]
We study algorithmic collective action on digital platforms that deploy machine learning algorithms.
We propose a simple theoretical model of a collective interacting with a firm's learning algorithm.
We conduct systematic experiments on a skill classification task involving tens of thousands of resumes from a gig platform for freelancers.
arXiv Detail & Related papers (2023-02-08T18:55:49Z)
- Proof of Swarm Based Ensemble Learning for Federated Learning Applications [3.2536767864585663]
In federated learning it is not feasible to apply centralised ensemble learning directly due to privacy concerns.
Most distributed consensus algorithms, such as Byzantine fault tolerance (BFT), do not normally perform well in such applications.
We propose PoSw, a novel distributed consensus algorithm for ensemble learning in a federated setting.
arXiv Detail & Related papers (2022-12-28T13:53:34Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduced technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- On the Convergence of Distributed Stochastic Bilevel Optimization Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting, so they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradients.
arXiv Detail & Related papers (2022-06-30T05:29:52Z)
- Federated Learning Aggregation: New Robust Algorithms with Guarantees [63.96013144017572]
Federated learning has been recently proposed for distributed model training at the edge.
This paper presents a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework.
We derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.
arXiv Detail & Related papers (2022-05-22T16:37:53Z)
- Network Gradient Descent Algorithm for Decentralized Federated Learning [0.2867517731896504]
We study a fully decentralized federated learning algorithm, which is a novel gradient descent algorithm executed on a communication-based network.
In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, minimizing the risk of privacy leakage.
We find that both the learning rate and the network structure play significant roles in determining the NGD estimator's statistical efficiency.
arXiv Detail & Related papers (2022-05-06T02:53:31Z)
- Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks [62.48782506095565]
We show that due to the greedy nature of learning in deep neural networks, models tend to rely on just one modality while under-fitting the other modalities.
We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning.
arXiv Detail & Related papers (2022-02-10T20:11:21Z)
- Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning [93.18238573921629]
We study how an ensemble of deep learning models can improve test accuracy, and how the superior performance of the ensemble can be distilled into a single model.
We show that ensemble/knowledge distillation in deep learning works very differently from traditional learning theory.
We prove that self-distillation can also be viewed as implicitly combining ensemble and knowledge distillation to improve test accuracy.
arXiv Detail & Related papers (2020-12-17T18:34:45Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.