Sharing Knowledge without Sharing Data: Stitches can improve ensembles of disjointly trained models
- URL: http://arxiv.org/abs/2512.17592v1
- Date: Fri, 19 Dec 2025 13:59:46 GMT
- Title: Sharing Knowledge without Sharing Data: Stitches can improve ensembles of disjointly trained models
- Authors: Arthur Guijt, Dirk Thierens, Ellen Kerkhof, Jan Wiersma, Tanja Alderliesten, Peter A. N. Bosman
- Abstract summary: In some settings, like in the medical domain, data is often fragmented across parties and cannot be readily shared. We investigate how asynchronous collaboration affects performance, and propose to use stitching as a method for combining models. We find that combining intermediate representations of individually trained models with a well-placed pair of stitching layers allows performance to recover to a competitive degree.
- Score: 0.9851812512860351
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has been shown to be very capable of performing many real-world tasks. However, this performance often depends on the availability of large and varied datasets. In some settings, such as the medical domain, data is often fragmented across parties and cannot be readily shared. While federated learning addresses this situation, it requires the parties to synchronously train a single model together, exchanging information about model weights. We investigate how asynchronous collaboration, where only already trained models are shared (e.g. as part of a publication), affects performance, and propose stitching as a method for combining models. Taking a multi-objective perspective, in which performance on each party's data is viewed independently, we find that training solely on a single party's data yields performance on that party's own data similar to training on its data merged with another party's, while performance on other parties' data is notably worse. Moreover, while an ensemble of such individually trained networks generalizes better, performance on each party's own dataset suffers. We find that combining intermediate representations of individually trained models with a well-placed pair of stitching layers allows this performance to recover to a competitive degree while maintaining the improved generalization, showing that asynchronous collaboration can yield competitive results.
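Below is a minimal sketch of the kind of stitching described in the abstract, assuming two small PyTorch MLPs standing in for the parties' disjointly trained models. The architectures, stitch placement, and the paper's exact use of a pair of stitching layers are not reproduced here; the sketch only illustrates the general idea that the front of one frozen model is connected to the back of another frozen model through a small trainable stitching layer, which is the only part that needs local training.

```python
import torch
import torch.nn as nn

# Hypothetical small MLPs standing in for two parties' disjointly trained
# models; the paper's actual architectures and datasets are not used here.
def make_mlp(in_dim=32, hidden=64, out_dim=10):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

model_a = make_mlp()  # assumed trained on party A's data only
model_b = make_mlp()  # assumed trained on party B's data only

class StitchedModel(nn.Module):
    """Front of one frozen model -> trainable stitching layer -> back of the
    other frozen model. Only the stitch is trained, on locally available data."""

    def __init__(self, front, back, stitch_in, stitch_out):
        super().__init__()
        self.front, self.back = front, back
        self.stitch = nn.Linear(stitch_in, stitch_out)  # the only trainable part
        for p in list(self.front.parameters()) + list(self.back.parameters()):
            p.requires_grad = False

    def forward(self, x):
        h = self.front(x)   # party A's intermediate representation
        h = self.stitch(h)  # map it into party B's feature space
        return self.back(h) # finish the forward pass in party B's model

# Stitch after A's second ReLU into the last two layers of B; where to place
# the stitch is exactly the kind of choice the paper investigates.
stitched = StitchedModel(model_a[:4], model_b[2:], stitch_in=64, stitch_out=64)
optimizer = torch.optim.Adam(stitched.stitch.parameters(), lr=1e-3)

# For comparison, a plain ensemble of the two frozen models just averages
# their outputs, with no trainable glue at all:
def ensemble(x):
    return 0.5 * (model_a(x) + model_b(x))
```

Only the stitching layer's parameters are passed to the optimizer, so a party can build such a combined model from published weights using nothing but its own local data.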
Related papers
- Scalable Data Ablation Approximations for Language Models through Modular Training and Merging [27.445079398772904]
We propose an efficient method for approximating data ablations which trains individual models on subsets of a training corpus.
We find that, given an arbitrary evaluation set, the perplexity score of a single model trained on a candidate set of data is strongly correlated with perplexity scores of parameter averages of models trained on distinct partitions of that data.
arXiv Detail & Related papers (2024-10-21T06:03:49Z) - How to Collaborate: Towards Maximizing the Generalization Performance in Cross-Silo Federated Learning [11.442808208742758]
Federated learning (FL) has attracted vivid attention as a privacy-preserving distributed learning framework. In this work, we focus on cross-silo FL, where clients become the model owners after FL training. We find that the performance of a client can be improved only by collaborating with other clients that have more training data.
arXiv Detail & Related papers (2024-01-24T05:41:34Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
We observe that directly minimizing the loss of the ensemble is rarely applied in practice.
arXiv Detail & Related papers (2023-01-26T18:58:07Z) - Striving for data-model efficiency: Identifying data externalities on
group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z) - Combining Data-driven Supervision with Human-in-the-loop Feedback for
Entity Resolution [47.90125404360125]
We build a model that identifies and consolidates data points that represent the same person.
In this case study, we discuss our human-in-the-loop enabled, data-centric solution to closing the training-production performance divergence.
arXiv Detail & Related papers (2021-11-20T02:22:12Z) - Personalised Federated Learning: A Combinational Approach [10.204907134342637]
Federated learning (FL) is a distributed machine learning approach involving multiple clients collaboratively training a shared model.
Privacy and integrity preserving features such as differential privacy (DP) and robust aggregation (RA) are commonly used in FL.
In this work, we show that on common deep learning tasks, the performance of FL models differs amongst clients and situations.
arXiv Detail & Related papers (2021-08-22T02:11:20Z) - Federated Mixture of Experts [94.25278695272874]
FedMix is a framework that allows us to train an ensemble of specialized models.
We show that users with similar data characteristics select the same members and therefore share statistical strength.
arXiv Detail & Related papers (2021-07-14T14:15:24Z) - Federated Residual Learning [53.77128418049985]
We study a new form of federated learning where the clients train personalized local models and make predictions jointly with the server-side shared model.
Using this new federated learning framework, the complexity of the central shared model can be minimized while still gaining all the performance benefits that joint training provides.
arXiv Detail & Related papers (2020-03-28T19:55:24Z)
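To illustrate the joint prediction described in the Federated Residual Learning entry above, here is a minimal sketch assuming an additive combination of a server-side shared model and a client-side residual model; the model shapes and the absence of a training loop are placeholders, not the paper's actual setup.

```python
import torch
import torch.nn as nn

# Placeholder models; the actual architectures and federated training
# procedure of the paper are not reproduced here.
shared_model = nn.Linear(32, 10)  # server-side model shared by all clients
local_model = nn.Linear(32, 10)   # one client's personalized residual model

def joint_prediction(x):
    # The client's prediction is the shared model's output plus a locally
    # learned correction, so the central model can stay simple while each
    # client adapts to its own data distribution.
    return shared_model(x) + local_model(x)

logits = joint_prediction(torch.randn(8, 32))  # batch of 8 hypothetical inputs
```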
This list is automatically generated from the titles and abstracts of the papers on this site.