Building Accurate Simple Models with Multihop
- URL: http://arxiv.org/abs/2109.06961v1
- Date: Tue, 14 Sep 2021 20:39:11 GMT
- Title: Building Accurate Simple Models with Multihop
- Authors: Amit Dhurandhar and Tejaswini Pedapati
- Abstract summary: We propose a meta-approach where we transfer information from the complex model to the simple model.
Our approach can transfer information between consecutive models in the sequence using any of the previously mentioned approaches.
In experiments on real data, we observe consistent gains over 1-hop for different choices of models.
- Score: 13.182955266765653
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Knowledge transfer from a complex, high-performing model to a simpler and potentially low-performing one in order to enhance its performance has been of
great interest over the last few years as it finds applications in important
problems such as explainable artificial intelligence, model compression, robust
model building and learning from small data. Known approaches to this problem
(viz. Knowledge Distillation, model compression, ProfWeight, etc.) typically
transfer information directly (i.e., in a single hop) from the complex model
to the chosen simple model through schemes that modify the targets or reweight
the training examples on which the simple model is trained. In this paper, we
propose a meta-approach where we transfer information from the complex model to
the simple model by dynamically selecting and/or constructing a sequence of
intermediate models of decreasing complexity that are less intricate than the
original complex model. Our approach can transfer information between
consecutive models in the sequence using any of the previously mentioned
approaches as well as work in 1-hop fashion, thus generalizing these
approaches. In experiments on real data, we observe consistent gains over
1-hop for different choices of models, averaging more than 2% and reaching up
to 8% in a particular case. We also empirically analyze
conditions under which the multi-hop approach is likely to be beneficial over
the traditional 1-hop approach, and report other interesting insights. To the
best of our knowledge, this is the first work that proposes such a multi-hop
approach to perform knowledge transfer given a single high-performing complex
model, making it, in our opinion, an important methodological contribution.
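To make the multi-hop idea concrete, below is a minimal, hypothetical sketch in PyTorch that chains standard knowledge distillation between consecutive models in a sequence of decreasing complexity. The models, data loader, temperature, and loss weighting are illustrative assumptions rather than the authors' exact procedure; any 1-hop transfer scheme (e.g. ProfWeight) could replace the `distill_one_hop` step.

```python
# Hypothetical sketch of multi-hop knowledge transfer: plain knowledge
# distillation (KD) is applied between consecutive models in a sequence
# of decreasing complexity. Not the paper's exact algorithm.
import torch
import torch.nn.functional as F


def distill_one_hop(teacher, student, loader, epochs=5, T=2.0, alpha=0.5, lr=1e-3):
    """Transfer knowledge from `teacher` to `student` in a single hop (plain KD)."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            # Soft-target KD loss plus the usual hard-label cross-entropy.
            kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                          F.softmax(t_logits / T, dim=1),
                          reduction="batchmean") * (T * T)
            ce = F.cross_entropy(s_logits, y)
            loss = alpha * kd + (1 - alpha) * ce
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student


def multihop_transfer(complex_model, intermediate_models, simple_model, loader):
    """Chain 1-hop transfers through intermediate models of decreasing complexity."""
    teacher = complex_model
    for student in list(intermediate_models) + [simple_model]:
        student = distill_one_hop(teacher, student, loader)
        teacher = student  # the freshly trained model teaches the next hop
    return simple_model
```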
Related papers
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Understanding Parameter Sharing in Transformers [53.75988363281843]
Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model depth.
We show that the success of this approach can be largely attributed to better convergence, with only a small part due to the increased model complexity.
Experiments on 8 machine translation tasks show that our model achieves competitive performance with only half the model complexity of parameter sharing models.
arXiv Detail & Related papers (2023-06-15T10:48:59Z)
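As a rough illustration of the cross-layer parameter sharing described in the entry above (a generic sketch, not that paper's specific sharing scheme), a single Transformer encoder layer can be reused across several "virtual" layers so that effective depth grows while the parameter count stays fixed:

```python
# Illustrative only: reuse one Transformer encoder layer across N "virtual"
# layers, increasing depth without adding parameters.
import torch.nn as nn


class SharedLayerEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_virtual_layers=6):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.num_virtual_layers = num_virtual_layers

    def forward(self, x):
        for _ in range(self.num_virtual_layers):
            x = self.shared_layer(x)  # the same weights are applied at every depth
        return x
```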
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
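A much-simplified sketch of merging models in parameter space is shown below. It uses a plain uniform average over matching parameters, whereas the paper above proposes a more careful weighting, so treat it purely as an illustration of the general idea; the helper name is hypothetical.

```python
# Simplified sketch of parameter-space merging: uniformly average the
# parameters of several fine-tuned models that share one architecture.
import copy
import torch


def merge_models_uniform(models):
    state_dicts = [m.state_dict() for m in models]
    merged = copy.deepcopy(state_dicts[0])
    for name in merged:
        stacked = torch.stack([sd[name].float() for sd in state_dicts])
        merged[name] = stacked.mean(dim=0).to(merged[name].dtype)
    fused = copy.deepcopy(models[0])
    fused.load_state_dict(merged)
    return fused
```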
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
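A per-sample weighted ensemble of source model outputs, of the kind described in the entry above, can be sketched roughly as follows. The small gating network, its input features, and the shapes are assumptions made only for illustration, not SESoM's actual architecture.

```python
# Rough sketch of a sample-specific ensemble: a gating network produces
# per-sample weights over K source models, and the output is the weighted
# combination of their logits. Not SESoM's exact design.
import torch
import torch.nn as nn


class SampleSpecificEnsemble(nn.Module):
    def __init__(self, source_models, feature_dim):
        super().__init__()
        self.source_models = nn.ModuleList(source_models)
        self.gate = nn.Linear(feature_dim, len(source_models))  # per-sample mixing weights

    def forward(self, features, inputs):
        weights = torch.softmax(self.gate(features), dim=-1)                    # (batch, K)
        outputs = torch.stack([m(inputs) for m in self.source_models], dim=1)   # (batch, K, C)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)                     # (batch, C)
```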
- Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one.
Our approach effectively involves computing a weighted average of the models' parameters.
We show that our merging procedure makes it possible to combine models in previously unexplored ways.
arXiv Detail & Related papers (2021-11-18T17:59:35Z)
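The merging described above amounts, roughly, to a per-parameter weighted average in which each model's parameters are weighted by a diagonal Fisher information estimate. The sketch below assumes those Fisher estimates are precomputed and is only an approximation of the paper's method.

```python
# Sketch of Fisher-weighted parameter averaging. `state_dicts` and `fishers`
# are lists of dicts mapping parameter name -> tensor; how the diagonal
# Fisher estimates are computed is omitted here.
import torch


def fisher_weighted_merge(state_dicts, fishers, eps=1e-8):
    merged = {}
    for name in state_dicts[0]:
        num = sum(f[name] * sd[name] for sd, f in zip(state_dicts, fishers))
        den = sum(f[name] for f in fishers) + eps
        merged[name] = num / den  # per-parameter Fisher-weighted average
    return merged
```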
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more important to understand the properties of a model and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
- When Ensembling Smaller Models is More Efficient than Single Large Models [52.38997176317532]
We show that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute.
This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
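As a small illustration of the ensembling referred to above, the sketch below averages the output distributions of several smaller models; whether such an ensemble beats a single large model in accuracy per FLOP is the empirical question that paper studies, and the models here are placeholders.

```python
# Illustrative sketch: combine an ensemble of smaller models by averaging
# their softmax outputs. The constituent models are placeholders.
import torch
import torch.nn.functional as F


def ensemble_predict(models, x):
    probs = [F.softmax(m(x), dim=-1) for m in models]  # each small model's prediction
    return torch.stack(probs).mean(dim=0)              # average of output distributions
```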