Unlocking the Potential of Federated Learning for Deeper Models
- URL: http://arxiv.org/abs/2306.02701v1
- Date: Mon, 5 Jun 2023 08:45:44 GMT
- Title: Unlocking the Potential of Federated Learning for Deeper Models
- Authors: Haolin Wang, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Jiaxing Shen
- Abstract summary: Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients.
We propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field.
These approaches can greatly improve the accuracy of FL on deeper models.
- Score: 24.875271131226707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is a new paradigm for distributed machine learning
that allows a global model to be trained across multiple clients without
compromising their privacy. Although FL has demonstrated remarkable success in
various scenarios, recent studies mainly utilize shallow and small neural
networks. In our research, we discover a significant performance decline when
applying the existing FL framework to deeper neural networks, even when client
data are independently and identically distributed (i.i.d.). Our further
investigation shows that the decline is due to the continuous accumulation of
dissimilarities among client models during the layer-by-layer back-propagation
process, which we refer to as "divergence accumulation." As deeper models
involve a longer chain of divergence accumulation, they tend to manifest
greater divergence, subsequently leading to performance decline. Both
theoretical derivations and empirical evidence are proposed to support the
existence of divergence accumulation and its amplified effects in deeper
models. To address this issue, we propose several technical guidelines based on
reducing divergence, such as using wider models and reducing the receptive
field. These approaches can greatly improve the accuracy of FL on deeper
models. For example, the application of these guidelines can boost the
ResNet101 model's performance by as much as 43% on the Tiny-ImageNet dataset.
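The divergence the abstract describes builds up between client copies of the same layer before each aggregation round. As an illustration only (this is not the paper's code; function names and the toy setup are hypothetical), a minimal NumPy sketch of standard FedAvg aggregation together with a per-layer divergence measure:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Size-weighted average of client parameters (standard FedAvg).

    client_weights: one model per client, each a list of per-layer arrays.
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum((n / total) * w[layer] for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

def layer_divergence(client_weights, layer):
    """Mean pairwise L2 distance between client copies of one layer."""
    mats = [w[layer] for w in client_weights]
    dists = [np.linalg.norm(a - b)
             for i, a in enumerate(mats) for b in mats[i + 1:]]
    return float(np.mean(dists))
```

Tracking `layer_divergence` per layer over rounds is one simple way to observe whether dissimilarity grows along the back-propagation chain, as the abstract argues it does in deeper models.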
Related papers
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [67.25782152459851]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- NeFL: Nested Federated Learning for Heterogeneous Clients [48.160716521203256]
Federated learning (FL) is a promising approach to distributed learning that preserves privacy.
During the training pipeline of FL, slow or incapable clients (i.e., stragglers) slow down the total training time and degrade performance.
We propose nested federated learning (NeFL), a framework that efficiently divides a model into submodels using both depthwise and widthwise scaling.
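As a rough illustration of combining depthwise and widthwise scaling (a toy sketch under simplifying assumptions, not NeFL's actual algorithm; the function name is hypothetical), a submodel can be carved out of a shared stack of weight matrices so that smaller submodels reuse a prefix of the larger model's parameters:

```python
import numpy as np

def nested_submodel(layers, depth_ratio, width_ratio):
    """Slice a toy model (a list of square weight matrices) into a submodel.

    depth_ratio keeps a prefix of the layers (depthwise scaling);
    width_ratio keeps a prefix of each layer's rows and columns
    (widthwise scaling), so submodels share parameters with the
    full model rather than being trained separately.
    """
    n_keep = max(1, int(len(layers) * depth_ratio))
    sub = []
    for w in layers[:n_keep]:
        rows = max(1, int(w.shape[0] * width_ratio))
        cols = max(1, int(w.shape[1] * width_ratio))
        sub.append(w[:rows, :cols])
    return sub
```

Because every submodel is a slice of the same arrays, updates from stragglers training small submodels can still contribute to the shared parameters of the full model.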
arXiv Detail & Related papers (2023-08-15T13:29:14Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout [16.250862114257277]
We introduce Ordered Dropout, a mechanism that achieves an ordered, nested representation of knowledge in Neural Networks.
We employ this technique, along with a self-distillation methodology, in the realm of Federated Learning in a framework called FjORD.
FjORD consistently leads to significant performance gains over state-of-the-art baselines, while maintaining its nested structure.
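The core idea of Ordered Dropout, as summarized above, is that the kept units always form a contiguous prefix, so narrower submodels nest inside wider ones. A minimal sketch of that idea (illustrative only; this is not FjORD's actual implementation, and the names are hypothetical):

```python
import numpy as np

def ordered_dropout(activations, p):
    """Keep only the first ceil(p * width) channels, zeroing the rest.

    Unlike standard dropout, which zeroes a random subset, the surviving
    units here are always a contiguous prefix, so a submodel of width p1
    is nested inside every submodel of width p2 >= p1.
    """
    width = activations.shape[-1]
    keep = int(np.ceil(p * width))
    mask = np.zeros(width)
    mask[:keep] = 1.0
    return activations * mask
```

Sampling `p` per client or per step lets weaker devices train only the prefix they can afford while still updating weights shared with the full-width model.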
arXiv Detail & Related papers (2021-02-26T13:07:43Z)
- The Self-Simplifying Machine: Exploiting the Structure of Piecewise Linear Neural Networks to Create Interpretable Models [0.0]
We introduce novel methodology toward simplification and increased interpretability of Piecewise Linear Neural Networks for classification tasks.
Our methods include the use of a trained, deep network to produce a well-performing, single-hidden-layer network without further training.
On these methods, we conduct preliminary studies of model performance, as well as a case study on Wells Fargo's Home Lending dataset.
arXiv Detail & Related papers (2020-12-02T16:02:14Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [55.28436972267793]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves distributional-shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.