Unlocking the Potential of Federated Learning for Deeper Models
- URL: http://arxiv.org/abs/2306.02701v1
- Date: Mon, 5 Jun 2023 08:45:44 GMT
- Title: Unlocking the Potential of Federated Learning for Deeper Models
- Authors: Haolin Wang, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Jiaxing Shen
- Abstract summary: Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients.
We propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field.
These approaches can greatly improve the accuracy of FL on deeper models.
- Score: 24.875271131226707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is a new paradigm for distributed machine learning
that allows a global model to be trained across multiple clients without
compromising their privacy. Although FL has demonstrated remarkable success in
various scenarios, recent studies mainly utilize shallow and small neural
networks. In our research, we discover a significant performance decline when
applying the existing FL framework to deeper neural networks, even when client
data are independently and identically distributed (i.i.d.). Our further
investigation shows that the decline is due to the continuous accumulation of
dissimilarities among client models during the layer-by-layer back-propagation
process, which we refer to as "divergence accumulation." As deeper models
involve a longer chain of divergence accumulation, they tend to manifest
greater divergence, subsequently leading to performance decline. Both
theoretical derivations and empirical evidence are proposed to support the
existence of divergence accumulation and its amplified effects in deeper
models. To address this issue, we propose several technical guidelines based on
reducing divergence, such as using wider models and reducing the receptive
field. These approaches can greatly improve the accuracy of FL on deeper
models. For example, the application of these guidelines can boost the
ResNet101 model's performance by as much as 43% on the Tiny-ImageNet dataset.
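To make the "divergence accumulation" diagnosis and the width/receptive-field guidelines concrete, below is a minimal PyTorch sketch, not the authors' code, that performs one FedAvg-style local step on several i.i.d. clients and reports each layer's average distance from the mean client model. The helper names (`make_client_model`, `local_update`, `layerwise_divergence`), the width parameter, and the 3x3 kernels are illustrative assumptions that merely echo the "wider model, smaller receptive field" guideline.

```python
# Minimal sketch (not the paper's implementation) of measuring per-layer
# divergence among client models after one round of local training.
import copy
import torch
import torch.nn as nn


def make_client_model(width: int = 64) -> nn.Module:
    # A deliberately simple CNN; a larger `width` follows the "use wider
    # models" guideline, and 3x3 kernels keep the receptive field small.
    return nn.Sequential(
        nn.Conv2d(3, width, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(width, 10),
    )


def local_update(model: nn.Module, data, targets, lr: float = 0.01) -> nn.Module:
    # One local SGD step on a client's private batch.
    model = copy.deepcopy(model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = nn.functional.cross_entropy(model(data), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return model


def layerwise_divergence(clients):
    # Average distance of each named parameter from the cross-client mean,
    # a simple proxy for the "divergence" discussed in the abstract.
    names = [n for n, _ in clients[0].named_parameters()]
    stats = {}
    for name in names:
        params = [dict(c.named_parameters())[name].detach() for c in clients]
        mean = torch.stack(params).mean(dim=0)
        stats[name] = torch.stack([(p - mean).norm() for p in params]).mean().item()
    return stats


if __name__ == "__main__":
    torch.manual_seed(0)
    global_model = make_client_model(width=64)
    # Synthetic i.i.d. client batches, purely for illustration.
    clients = [
        local_update(global_model, torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
        for _ in range(4)
    ]
    for layer, div in layerwise_divergence(clients).items():
        print(f"{layer}: {div:.4f}")
```

Under the abstract's argument, divergence measured this way accumulates layer by layer during back-propagation, which is why the proposed guidelines add capacity through width rather than depth.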
Related papers
- Strong Model Collapse [16.071600606637908]
We consider a supervised regression setting and establish the existence of a strong form of the model collapse phenomenon.
Our results show that even the smallest fraction of synthetic data can lead to model collapse.
We investigate whether increasing model size, an approach aligned with current trends in training large language models, exacerbates or mitigates model collapse.
arXiv Detail & Related papers (2024-10-07T08:54:23Z)
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then improves model learning by paying closer attention to training samples whose explanations differ the most.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm, FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters compared to the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- FjORD: Fair and Accurate Federated Learning under heterogeneous targets with Ordered Dropout [16.250862114257277]
We introduce Ordered Dropout, a mechanism that achieves an ordered, nested representation of knowledge in neural networks (a minimal sketch of this mechanism appears after this list).
We employ this technique, along with a self-distillation methodology, in the realm of Federated Learning in a framework called FjORD.
FjORD consistently leads to significant performance gains over state-of-the-art baselines, while maintaining its nested structure.
arXiv Detail & Related papers (2021-02-26T13:07:43Z)
- The Self-Simplifying Machine: Exploiting the Structure of Piecewise Linear Neural Networks to Create Interpretable Models [0.0]
We introduce a novel methodology for simplifying and improving the interpretability of Piecewise Linear Neural Networks for classification tasks.
Our methods include the use of a trained, deep network to produce a well-performing, single-hidden-layer network without further training.
On these methods, we conduct preliminary studies of model performance, as well as a case study on Wells Fargo's Home Lending dataset.
arXiv Detail & Related papers (2020-12-02T16:02:14Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
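As a companion to the FjORD entry above, here is a minimal PyTorch sketch, not the FjORD implementation, of the ordered-dropout idea: only the first fraction p of a layer's output units are kept, so lower-capacity sub-models are nested prefixes of the full model. The class name `OrderedDropoutLinear` and the way p is passed are illustrative assumptions.

```python
# Minimal sketch (not the FjORD code) of ordered dropout on a linear layer:
# keep a prefix of the output units so that sub-models are nested.
import torch
import torch.nn as nn


class OrderedDropoutLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor, p: float = 1.0) -> torch.Tensor:
        out = self.linear(x)
        keep = max(1, int(p * out.shape[-1]))
        # Zero out all but the first `keep` output units; because the kept
        # units always form a prefix, sub-models are nested by construction.
        mask = torch.zeros_like(out)
        mask[..., :keep] = 1.0
        return out * mask


if __name__ == "__main__":
    layer = OrderedDropoutLinear(16, 8)
    x = torch.randn(2, 16)
    # A low-capacity client might run with p=0.25, a stronger one with p=1.0.
    print(layer(x, p=0.25))
    print(layer(x, p=1.0))
```

Because every smaller sub-model is a prefix of the full model, heterogeneous clients can train at different capacities while still contributing updates to the same shared parameters.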