Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model
- URL: http://arxiv.org/abs/2310.17653v2
- Date: Mon, 26 Feb 2024 18:58:43 GMT
- Title: Fantastic Gains and Where to Find Them: On the Existence and Prospect of
General Knowledge Transfer between Any Pretrained Model
- Authors: Karsten Roth, Lukas Thede, Almut Sophia Koepke, Oriol Vinyals, Olivier Hénaff, Zeynep Akata
- Abstract summary: We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
- Score: 74.62272538148245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep networks requires various design decisions regarding, for
instance, their architecture, data augmentation, or optimization. In this work,
we find these training variations to result in networks learning unique feature
sets from the data. Using public model libraries comprising thousands of models
trained on canonical datasets like ImageNet, we observe that for arbitrary
pairings of pretrained models, one model extracts significant data context
unavailable in the other -- independent of overall performance. Given any
arbitrary pairing of pretrained models and no external rankings (such as
separate test sets, e.g. due to data privacy), we investigate if it is possible
to transfer such "complementary" knowledge from one model to another without
performance degradation -- a task made particularly difficult as additional
knowledge can be contained in stronger, equiperformant or weaker models. Yet
facilitating robust transfer in scenarios agnostic to pretrained model pairings
would unlock auxiliary gains and knowledge fusion from any model repository
without restrictions on model and problem specifics -- including from weaker,
lower-performance models. This work therefore provides an initial, in-depth
exploration on the viability of such general-purpose knowledge transfer. Across
large-scale experiments, we first reveal the shortcomings of standard knowledge
distillation techniques, and then propose a much more general extension through
data partitioning for successful transfer between nearly all pretrained models,
which we show can also be done unsupervised. Finally, we assess both the
scalability and impact of fundamental model properties on successful
model-agnostic knowledge transfer.
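The transfer recipe described in the abstract (standard knowledge distillation extended with a data partition that decides, per sample, which model should act as the teacher) can be illustrated with a minimal sketch. This is a hedged illustration only: the function name partitioned_distillation_step, the max-softmax confidence rule used to split the batch, and the use of a frozen copy of the student as the fallback target are assumptions for exposition, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def partitioned_distillation_step(student, teacher, frozen_student, x, optimizer, tau=2.0):
    """One illustrative transfer step: distill from the teacher only on samples
    where it looks more confident than the (frozen) original student, and keep
    the original student's predictions as targets everywhere else."""
    teacher.eval()
    frozen_student.eval()
    with torch.no_grad():
        t_logits = teacher(x)
        s_ref_logits = frozen_student(x)
        # Assumed partition rule: compare max softmax confidence per sample.
        teacher_leads = (t_logits.softmax(-1).amax(-1)
                         > s_ref_logits.softmax(-1).amax(-1))

    s_logits = student(x)
    log_p_student = F.log_softmax(s_logits / tau, dim=-1)
    # Soft targets: teacher on its partition, frozen student on the rest.
    soft_targets = torch.where(teacher_leads.unsqueeze(-1),
                               F.softmax(t_logits / tau, dim=-1),
                               F.softmax(s_ref_logits / tau, dim=-1))
    loss = F.kl_div(log_p_student, soft_targets, reduction="batchmean") * tau ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the partition keeps the student's own predictions as targets wherever the teacher appears to add nothing, which is one plausible way to avoid the performance degradation the abstract warns about when transferring from equiperformant or weaker models.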
Related papers
- Complementary Learning for Real-World Model Failure Detection [15.779651238128562]
We introduce complementary learning, where we use learned characteristics from different training paradigms to detect model errors.
We demonstrate our approach by learning semantic and predictive motion labels in point clouds in a supervised and self-supervised manner.
We perform a large-scale qualitative analysis and present LidarCODA, the first dataset with labeled anomalies in lidar point clouds.
arXiv Detail & Related papers (2024-07-19T13:36:35Z)
- Encapsulating Knowledge in One Prompt [56.31088116526825]
KiOP encapsulates knowledge from various models into a single prompt without altering the original models or requiring access to the training data.
From a practicality standpoint, this paradigm demonstrates the effectiveness of Visual Prompts in settings where the training data is inaccessible.
Experiments across various datasets and models demonstrate the efficacy of the proposed KiOP knowledge transfer paradigm.
arXiv Detail & Related papers (2024-07-16T16:35:23Z)
- Pre-trained Recommender Systems: A Causal Debiasing Perspective [19.712997823535066]
We develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains.
Our empirical studies show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings.
arXiv Detail & Related papers (2023-10-30T03:37:32Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
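As a rough illustration of what "merging models in their parameter space" can mean, the sketch below simply averages the parameters of several checkpoints that share an architecture. Plain averaging and the helper name merge_state_dicts are assumptions for exposition; the cited paper proposes a more sophisticated dataless merging scheme.

```python
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Merge same-architecture checkpoints by weighted parameter averaging.
    Simplified illustration of parameter-space fusion, not the cited method."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key, ref in state_dicts[0].items():
        if torch.is_floating_point(ref):
            merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) cannot be
            # meaningfully averaged; keep the first checkpoint's copy.
            merged[key] = ref.clone()
    return merged

# Hypothetical usage: fuse several fine-tuned checkpoints into one model.
# checkpoints = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
# model.load_state_dict(merge_state_dicts(checkpoints))
```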
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, you are given access to a set of expert models and their predictions, alongside some limited information about the datasets used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models [89.44031286278347]
We propose a Hub-Pathway framework to enable knowledge transfer from a model hub.
The proposed framework can be trained end-to-end with the target task-specific loss.
Experiment results on computer vision and reinforcement learning tasks demonstrate that the framework achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-06-08T08:00:12Z)
- Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models [62.28551903638434]
We measure the impact of three different adaptation methods on the generalization and accuracy of models.
Experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers.
We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
arXiv Detail & Related papers (2021-09-07T03:13:06Z)
- End-to-End Weak Supervision [15.125993628007972]
We propose an end-to-end approach for directly learning the downstream model.
We show improved end-model performance on downstream test sets compared to prior work.
arXiv Detail & Related papers (2021-07-05T19:10:11Z)