Compare Where It Matters: Using Layer-Wise Regularization To Improve
Federated Learning on Heterogeneous Data
- URL: http://arxiv.org/abs/2112.00407v1
- Date: Wed, 1 Dec 2021 10:46:13 GMT
- Title: Compare Where It Matters: Using Layer-Wise Regularization To Improve
Federated Learning on Heterogeneous Data
- Authors: Ha Min Son, Moon Hyun Kim, Tai-Myoung Chung
- Abstract summary: Federated Learning is a widely adopted method to train neural networks over distributed data.
One main limitation is the performance degradation that occurs when data is heterogeneously distributed.
We present FedCKA: a framework that outperforms previous state-of-the-art methods on various deep learning tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Learning is a widely adopted method to train neural networks over
distributed data. One main limitation is the performance degradation that
occurs when data is heterogeneously distributed. While many works have
attempted to address this problem, these methods underperform because they are
founded on a limited understanding of neural networks. In this work, we verify
that only certain important layers in a neural network require regularization
for effective training. We additionally verify that Centered Kernel Alignment
(CKA) most accurately calculates similarity between layers of neural networks
trained on different data. By applying CKA-based regularization to important
layers during training, we significantly improve performance in heterogeneous
settings. We present FedCKA: a simple framework that outperforms previous
state-of-the-art methods on various deep learning tasks while also improving
efficiency and scalability.
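For intuition, the following is a minimal sketch (not the authors' released code) of how linear CKA between corresponding layers of a client's local model and the server's global model might be computed and used to regularize only the layers selected as important during local training; the helper names, the regularization weight mu, and the layer-selection interface are illustrative assumptions.

```python
import torch

def linear_cka(x, y):
    """Linear Centered Kernel Alignment between two activation matrices.

    x, y: (n_samples, n_features) activations from corresponding layers of the
    local and global models, computed on the same mini-batch.
    """
    x = x - x.mean(dim=0, keepdim=True)  # center each feature dimension
    y = y - y.mean(dim=0, keepdim=True)
    cross = torch.linalg.norm(y.t() @ x) ** 2   # ||Y^T X||_F^2
    norm_x = torch.linalg.norm(x.t() @ x)       # ||X^T X||_F
    norm_y = torch.linalg.norm(y.t() @ y)       # ||Y^T Y||_F
    return cross / (norm_x * norm_y + 1e-12)

def regularized_local_loss(task_loss, local_feats, global_feats, mu=0.1):
    """Add a CKA penalty pulling the selected local layers toward the global model.

    local_feats / global_feats: lists of activations from the layers chosen
    for regularization; mu is an illustrative weight, not the paper's value.
    """
    penalty = sum(1.0 - linear_cka(l, g) for l, g in zip(local_feats, global_feats))
    return task_loss + mu * penalty
```

In a FedAvg-style loop, each client would minimize this regularized loss locally before sending its updated weights to the server for averaging.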
Related papers
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - A Bootstrap Algorithm for Fast Supervised Learning [0.0]
Training a neural network (NN) typically relies on some type of curve-following method, such as gradient descent (or stochastic gradient descent, SGD), ADADELTA, ADAM, or limited-memory algorithms.
Convergence for these algorithms usually relies on having access to a large quantity of observations to achieve high accuracy and, for certain classes of functions, they can require multiple epochs over the data to converge.
Herein, a different technique with the potential for dramatically faster convergence is explored: it does not curve-follow but instead relies on 'decoupling' the hidden layers.
arXiv Detail & Related papers (2023-05-04T18:28:18Z) - Kernel function impact on convolutional neural networks [10.98068123467568]
We study the usage of kernel functions at the different layers in a convolutional neural network.
We show how one can effectively leverage kernel functions by introducing more distortion-aware pooling layers.
We propose Kernelized Dense Layers (KDL), which replace fully-connected layers.
arXiv Detail & Related papers (2023-02-20T19:57:01Z) - On the effectiveness of partial variance reduction in federated learning
with heterogeneous data [27.527995694042506]
We show that the diversity of the final classification layers across clients impedes the performance of the FedAvg algorithm.
Motivated by this, we propose to correct the model drift via variance reduction applied only to the final layers; a minimal sketch of this idea appears after this list.
We demonstrate that this significantly outperforms existing benchmarks at a similar or lower communication cost.
arXiv Detail & Related papers (2022-12-05T11:56:35Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent
Kernels [141.29156234353133]
State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions.
We show this disparity can largely be attributed to optimization challenges presented by non-convexity.
We propose a Train-Convexify-Train (TCT) procedure to sidestep this issue.
arXiv Detail & Related papers (2022-07-13T16:58:22Z) - Network Gradient Descent Algorithm for Decentralized Federated Learning [0.2867517731896504]
We study network gradient descent (NGD), a fully decentralized federated learning algorithm in which a novel gradient descent procedure is executed over a communication-based network.
In the NGD method, only statistics (e.g., parameter estimates) need to be communicated, minimizing the risk of privacy leakage.
We find that both the learning rate and the network structure play significant roles in determining the NGD estimator's statistical efficiency.
arXiv Detail & Related papers (2022-05-06T02:53:31Z) - Quasi-Global Momentum: Accelerating Decentralized Deep Learning on
Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale stochastic AUC maximization with a deep neural network as the predictive model.
Our algorithm requires a much smaller number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our algorithm and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with non-convexity renders learning susceptible to the choice of initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.