Related papers: FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

URL: http://arxiv.org/abs/2407.19389v2
Date: Thu, 31 Oct 2024 18:54:34 GMT
Title: FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction
Authors: Feijie Wu, Xingchen Wang, Yaqing Wang, Tianci Liu, Lu Su, Jing Gao,
Abstract summary: Federated Importance-Aware Submodel Extraction (FIARSE) is a novel approach that dynamically adjusts submodels based on the importance of model parameters. Compared to existing works, the proposed method offers a theoretical foundation for the submodel extraction. Extensive experiments are conducted on various datasets to showcase the superior performance of the proposed FIARSE.
Score: 26.26211464623954
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In federated learning (FL), accommodating clients' varied computational capacities poses a challenge, often limiting the participation of those with constrained resources in global model training. To address this issue, the concept of model heterogeneity through submodel extraction has emerged, offering a tailored solution that aligns the model's complexity with each client's computational capacity. In this work, we propose Federated Importance-Aware Submodel Extraction (FIARSE), a novel approach that dynamically adjusts submodels based on the importance of model parameters, thereby overcoming the limitations of previous static and dynamic submodel extraction methods. Compared to existing works, the proposed method offers a theoretical foundation for the submodel extraction and eliminates the need for additional information beyond the model parameters themselves to determine parameter importance, significantly reducing the overhead on clients. Extensive experiments are conducted on various datasets to showcase the superior performance of the proposed FIARSE.

Related papers

LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging [10.33844295243509]
We propose a unified framework for model merging based on low-rank estimation of task vectors without the need for access to the base model, named textscLoRE-Merging. Our approach is motivated by the observation that task vectors from fine-tuned models frequently exhibit a limited number of dominant singular values, making low-rank estimations less prone to interference.
arXiv Detail & Related papers (2025-02-15T10:18:46Z)
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. Our research explores task-specific model pruning to inform decisions about designing SMoE architectures. We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods. EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system. We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL) Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Distributional Depth-Based Estimation of Object Articulation Models [21.046351215949525]
We propose a method that efficiently learns distributions over articulation model parameters directly from depth images. Our core contributions include a novel representation for distributions over rigid body transformations. We introduce a novel deep learning based approach, DUST-net, that performs category-independent articulation model estimation.
arXiv Detail & Related papers (2021-08-12T17:44:51Z)
Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation [11.219641045667055]
We propose an end-to-end approach for model learning which directly optimize the expected returns using implicit differentiation. We provide theoretical and empirical evidence highlighting the benefits of our approach in the model misspecification regime compared to likelihood-based methods.
arXiv Detail & Related papers (2021-06-06T23:15:49Z)
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors. We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method. Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
A Perspective on Machine Learning Methods in Turbulence Modelling [0.0]
This work presents a review of the current state of research in data-driven turbulence closure modeling. We stress that consistency of the training data, the model, the underlying physics and the discretization is a key issue that needs to be considered for a successful ML-augmented modeling strategy.
arXiv Detail & Related papers (2020-10-23T08:19:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.