Representing Model Weights with Language using Tree Experts
- URL: http://arxiv.org/abs/2410.13569v1
- Date: Thu, 17 Oct 2024 17:17:09 GMT
- Title: Representing Model Weights with Language using Tree Experts
- Authors: Eliahu Horwitz, Bar Cavia, Jonathan Kahana, Yedid Hoshen,
- Abstract summary: This paper learns to represent models within a joint space that embeds both model weights and language.
We introduce Probing Experts (ProbeX), a theoretically motivated, lightweight probing method.
Our results show that ProbeX can effectively map the weights of large models into a shared weight-language embedding space.
- Score: 39.90685550999956
- License:
- Abstract: The increasing availability of public models begs the question: can we train neural networks that use other networks as input? This paper learns to represent models within a joint space that embeds both model weights and language. However, machine learning on model weights is challenging as model weights often exhibit significant variation unrelated to the models' semantic properties (nuisance variation). We identify a key property of real-world models: most public models belong to a small set of Model Trees, where all models within a tree are fine-tuned from a common ancestor (e.g., a foundation model). Importantly, we find that within each tree there is less nuisance variation between models. For example, while classifying models according to their training dataset generally requires complex architectures, in our case, even a linear classifier trained on a single layer is often effective. While effective, linear layers are computationally expensive as model weights are very high dimensional. To address this, we introduce Probing Experts (ProbeX), a theoretically motivated, lightweight probing method. Notably, ProbeX is the first probing method designed to learn from the weights of just a single model layer. We also construct and release a dataset that simulates the structure of public model repositories. Our results show that ProbeX can effectively map the weights of large models into a shared weight-language embedding space. Furthermore, we demonstrate the impressive generalization of our method, achieving zero-shot model classification and retrieval.
Related papers
- On the Origin of Llamas: Model Tree Heritage Recovery [39.08927346274156]
We introduce the task of Model Tree Heritage Recovery (MoTHer Recovery) for discovering Model Trees in neural networks.
Our hypothesis is that model weights encode this information, the challenge is to decode the underlying tree structure given the weights.
MoTHer recovery holds exciting long-term applications akin to indexing the internet by search engines.
arXiv Detail & Related papers (2024-05-28T17:59:51Z) - Initializing Models with Larger Ones [76.41561758293055]
We introduce weight selection, a method for initializing smaller models by selecting a subset of weights from a pretrained larger model.
Our experiments demonstrate that weight selection can significantly enhance the performance of small models and reduce their training time.
arXiv Detail & Related papers (2023-11-30T18:58:26Z) - Knowledge is a Region in Weight Space for Fine-tuned Language Models [48.589822853418404]
We study how the weight space and the underlying loss landscape of different models are interconnected.
We show that language models that have been finetuned on the same dataset form a tight cluster in the weight space, while models finetuned on different datasets from the same underlying task form a looser cluster.
arXiv Detail & Related papers (2023-02-09T18:59:18Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Revealing Secrets From Pre-trained Models [2.0249686991196123]
Transfer-learning has been widely adopted in many emerging deep learning algorithms.
We show that pre-trained models and fine-tuned models have significantly high similarities in weight values.
We propose a new model extraction attack that reveals the model architecture and the pre-trained model used by the black-box victim model.
arXiv Detail & Related papers (2022-07-19T20:19:03Z) - Neural Basis Models for Interpretability [33.51591891812176]
Generalized Additive Models (GAMs) are an inherently interpretable class of models.
We propose an entirely new subfamily of GAMs that utilize basis decomposition of shape functions.
A small number of basis functions are shared among all features, and are learned jointly for a given task.
arXiv Detail & Related papers (2022-05-27T17:31:19Z) - Transfer training from smaller language model [6.982133308738434]
We find a method to save training time and resource cost by changing the small well-trained model to large model.
We test the target model on several data sets and find it is still comparable with the source model.
arXiv Detail & Related papers (2021-04-23T02:56:02Z) - When Ensembling Smaller Models is More Efficient than Single Large
Models [52.38997176317532]
We show that ensembles can outperform single models with both higher accuracy and requiring fewer total FLOPs to compute.
This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z) - BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage
Models [59.95091850331499]
We propose BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies.
Our discovered model family, BigNASModels, achieve top-1 accuracies ranging from 76.5% to 80.9%.
arXiv Detail & Related papers (2020-03-24T23:00:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.