Related papers: Model Fusion with Kullback--Leibler Divergence

Model Fusion with Kullback--Leibler Divergence

URL: http://arxiv.org/abs/2007.06168v1
Date: Mon, 13 Jul 2020 03:27:45 GMT
Title: Model Fusion with Kullback--Leibler Divergence
Authors: Sebastian Claici, Mikhail Yurochkin, Soumya Ghosh and Justin Solomon
Abstract summary: We propose a method to fuse posterior distributions learned from heterogeneous datasets. Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
Score: 58.20269014662046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a method to fuse posterior distributions learned from heterogeneous datasets. Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors and proceeds using a simple assign-and-average approach. The components of the dataset posteriors are assigned to the proposed global model components by solving a regularized variant of the assignment problem. The global components are then updated based on these assignments by their mean under a KL divergence. For exponential family variational distributions, our formulation leads to an efficient non-parametric algorithm for computing the fused model. Our algorithm is easy to describe and implement, efficient, and competitive with state-of-the-art on motion capture analysis, topic modeling, and federated learning of Bayesian neural networks.

Related papers

Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs.<n>We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
Model Fusion via Neuron Interpolation [25.022849817421964]
We present a novel family of model fusion algorithms designed to integrate multiple trained neural networks into a single network.<n>Our algorithms group intermediate neurons of parent models to create target representations that the fused model approximates with its corresponding sub-network.
arXiv Detail & Related papers (2025-06-18T23:31:05Z)
Communication-Efficient Personalized Distributed Learning with Data and Node Heterogeneity [40.64395367773766]
We propose a distributed strong lottery ticket hypothesis (TH), based on which a communication-efficient personalized learning algorithm is developed. In the proposed method, each local model is represented as the Hadamard product of global real-valued parameters and a personalized binary mask for pruning. We provide a theoretical proof for the DSLTH, establishing it as the foundation of the proposed method.
arXiv Detail & Related papers (2025-04-24T13:02:54Z)
Mixture models for data with unknown distributions [0.6345523830122168]
We describe and analyze a broad class of mixture models for real-valued multivariate data. We return both a division of the data and an estimate of the distributions, effectively performing clustering and density estimation within each cluster at the same time. We demonstrate our methods with a selection of illustrative applications and give code implementing both algorithms.
arXiv Detail & Related papers (2025-02-26T22:42:40Z)
Vanishing Variance Problem in Fully Decentralized Neural-Network Systems [0.8212195887472242]
Federated learning and gossip learning are emerging methodologies designed to mitigate data privacy concerns. Our research introduces a variance-corrected model averaging algorithm. Our simulation results demonstrate that our approach enables gossip learning to achieve convergence efficiency comparable to that of federated learning.
arXiv Detail & Related papers (2024-04-06T12:49:20Z)
Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution. We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z)
Aggregation Weighting of Federated Learning via Generalization Bound Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions. We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z)
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately. We study the currently popular data partitioning techniques and visualize their main disadvantages. We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data. Proposed aggregation algorithms are extensively analyzed from a theoretical, and an experimental prospective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
Bayesian Federated Neural Matching that Completes Full Information [2.6566593102111473]
Federated learning is a machine learning paradigm where locally trained models are distilled into a global model. We propose a novel approach that overcomes this flaw by introducing a Kullback-Leibler divergence penalty at each iteration.
arXiv Detail & Related papers (2022-11-15T09:47:56Z)
Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis. We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
Federated Learning Aggregation: New Robust Algorithms with Guarantees [63.96013144017572]
Federated learning has been recently proposed for distributed model training at the edge. This paper presents a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework. We derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.
arXiv Detail & Related papers (2022-05-22T16:37:53Z)
Learning to Solve Routing Problems via Distributionally Robust Optimization [14.506553345693536]
Recent deep models for solving routing problems assume a single distribution of nodes for training, which severely impairs their cross-distribution generalization ability. We exploit group distributionally robust optimization (group DRO) to tackle this issue, where we jointly optimize the weights for different groups of distributions and the parameters for the deep model in an interleaved manner during training. We also design a module based on convolutional neural network, which allows the deep model to learn more informative latent pattern among the nodes.
arXiv Detail & Related papers (2022-02-15T08:06:44Z)
Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimize a feature decomposition network and the target image classification model. The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.