On the Origin of Llamas: Model Tree Heritage Recovery
- URL: http://arxiv.org/abs/2405.18432v1
- Date: Tue, 28 May 2024 17:59:51 GMT
- Title: On the Origin of Llamas: Model Tree Heritage Recovery
- Authors: Eliahu Horwitz, Asaf Shul, Yedid Hoshen
- Abstract summary: We introduce the task of Model Tree Heritage Recovery (MoTHer Recovery) for discovering Model Trees in neural networks.
Our hypothesis is that model weights encode this information; the challenge is to decode the underlying tree structure given the weights.
MoTHer recovery holds exciting long-term applications akin to indexing the internet by search engines.
- Score: 39.08927346274156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid growth of neural network models shared on the internet has made model weights an important data modality. However, this information is underutilized as the weights are uninterpretable, and publicly available models are disorganized. Inspired by Darwin's tree of life, we define the Model Tree, which describes the origin of models, i.e., the parent model that was used to fine-tune the target model. Similarly to the natural world, the tree structure is unknown. In this paper, we introduce the task of Model Tree Heritage Recovery (MoTHer Recovery) for discovering Model Trees in the ever-growing universe of neural networks. Our hypothesis is that model weights encode this information; the challenge is to decode the underlying tree structure given the weights. Beyond the immediate application of model authorship attribution, MoTHer recovery holds exciting long-term applications akin to indexing the internet by search engines. Practically, for each pair of models, this task requires: i) determining if they are related, and ii) establishing the direction of the relationship. We find that certain distributional properties of the weights evolve monotonically during training, which enables us to classify the relationship between two given models. MoTHer recovery reconstructs entire model hierarchies, represented by a directed tree, where a parent model gives rise to multiple child models through additional training. Our approach successfully reconstructs complex Model Trees, as well as the structure of "in-the-wild" model families such as Llama 2 and Stable Diffusion.
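To make the two pairwise decisions concrete, here is a minimal toy sketch. It assumes, purely for illustration, that fine-tuned weights stay close to their parent in L2 distance and that weight kurtosis is one of the distributional statistics that evolves monotonically during training; the threshold `tau`, the kurtosis direction, and the greedy parent selection are all assumptions of this sketch, not the paper's implementation.

```python
import numpy as np
from scipy.stats import kurtosis

def related(w_a: np.ndarray, w_b: np.ndarray, tau: float = 1.0) -> bool:
    # Decision i): call two models related if their weights are unusually close.
    return np.linalg.norm(w_a - w_b) < tau

def looks_like_parent(w_p: np.ndarray, w_c: np.ndarray) -> bool:
    # Decision ii): orient the edge with a statistic assumed monotone in
    # training; here the candidate parent is taken to have higher kurtosis
    # (an assumption for this sketch).
    return kurtosis(w_p) > kurtosis(w_c)

def recover_tree(models: dict[str, np.ndarray], tau: float = 1.0) -> dict[str, str]:
    # Greedy Model Tree recovery: each model's parent is its nearest related
    # model that looks earlier in training; models without one are roots.
    parents: dict[str, str] = {}
    for child, w_c in models.items():
        candidates = [
            (np.linalg.norm(w_c - w_p), parent)
            for parent, w_p in models.items()
            if parent != child and related(w_c, w_p, tau)
            and looks_like_parent(w_p, w_c)
        ]
        if candidates:
            parents[child] = min(candidates)[1]
    return parents
```

A fuller implementation would calibrate the relatedness threshold per architecture and solve for a minimum directed spanning tree rather than using the greedy parent choice above.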
Related papers
- Representing Model Weights with Language using Tree Experts [39.90685550999956]
This paper learns to represent models within a joint space that embeds both model weights and language.
We introduce Probing Experts (ProbeX), a theoretically motivated, lightweight probing method.
Our results show that ProbeX can effectively map the weights of large models into a shared weight-language embedding space.
arXiv Detail & Related papers (2024-10-17T17:17:09Z)
- Forecasting with Hyper-Trees [50.72190208487953]
Hyper-Trees are designed to learn the parameters of time series models.
By relating the parameters of a target time series model to features, Hyper-Trees also address the issue of parameter non-stationarity.
In this novel approach, the trees first generate informative representations from the input features, which a shallow network then maps to the target model parameters.
arXiv Detail & Related papers (2024-05-13T15:22:15Z)
- Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models.
The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers.
We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
arXiv Detail & Related papers (2024-03-28T15:22:29Z)
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
- DeepTree: Modeling Trees with Situated Latents [8.372189962601073]
We propose a novel method for modeling trees based on learning developmental rules for branching structures instead of manually defining them.
We call our deep neural model situated latent because its behavior is determined by its intrinsic state.
Our method enables generating a wide variety of tree shapes without the need to define intricate parameters.
arXiv Detail & Related papers (2023-05-09T03:33:14Z)
- Robust estimation of tree structured models [0.0]
We show that it is possible to recover trees from noisy binary data up to a small equivalence class of possible trees.
We also provide a characterisation of when the Chow-Liu algorithm consistently learns the underlying tree from the noisy data.
arXiv Detail & Related papers (2021-02-10T14:58:40Z)
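For context on the entry above: the classical Chow-Liu algorithm fits the maximum-weight spanning tree over pairwise mutual information estimates. A minimal sketch for binary data follows; this is the textbook algorithm, not the paper's robust variant, and the plug-in MI estimator is a simplification.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(x: np.ndarray, y: np.ndarray) -> float:
    # Plug-in MI estimate for two binary variables.
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                p_a, p_b = np.mean(x == a), np.mean(y == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu(data: np.ndarray) -> list[tuple[int, int]]:
    # data: (n_samples, n_vars) binary array -> edges of the fitted tree.
    n = data.shape[1]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mutual_information(data[:, i], data[:, j])
    # scipy only offers a *minimum* spanning tree, so negate the weights;
    # pairs with exactly zero MI are treated as unconnected, which is
    # acceptable for a sketch.
    mst = minimum_spanning_tree(-mi)
    return list(zip(*mst.nonzero()))
```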
- Attentive Tree-structured Network for Monotonicity Reasoning [2.4366811507669124]
We develop an attentive tree-structured neural network for monotonicity reasoning.
It is designed to model the syntactic parse tree information from the sentence pair of a reasoning task.
A self-attentive aggregator is used for aligning the representations of the premise and the hypothesis.
arXiv Detail & Related papers (2021-01-03T01:29:48Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests achieve performance better than or comparable to [1] and [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
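The soft routing mentioned above replaces a node's hard left/right split with a sigmoid gate, so every sample reaches every leaf with some probability. A minimal single-tree sketch follows; it is illustrative only, since the paper's deep forests stack many such trees and also learn their connectivity.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, weights, biases, leaf_values):
    # Full binary tree of depth d: 2**d - 1 internal nodes, 2**d leaves.
    # weights: (2**d - 1, dim), biases: (2**d - 1,), leaf_values: (2**d,).
    depth = int(np.log2(len(leaf_values)))
    probs = np.ones(len(leaf_values))
    for leaf in range(len(leaf_values)):
        node = 0
        for level in range(depth):
            go_right = (leaf >> (depth - 1 - level)) & 1
            p_right = sigmoid(weights[node] @ x + biases[node])
            # multiply in the probability of taking this branch
            probs[leaf] *= p_right if go_right else 1.0 - p_right
            node = 2 * node + 1 + go_right  # heap-style child index
    # the prediction is the mixture of leaf values under the routing probs
    return probs @ leaf_values

# depth-2 example: 3 internal nodes, 4 leaves
rng = np.random.default_rng(0)
print(soft_tree_predict(rng.standard_normal(5),
                        rng.standard_normal((3, 5)),
                        rng.standard_normal(3),
                        rng.standard_normal(4)))
```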
- Tensor Decompositions in Recursive Neural Networks for Tree-Structured Data [12.069862650316262]
We introduce two new aggregation functions to encode structural knowledge from tree-structured data.
We test them on two tree classification tasks, showing the advantage of the proposed models as tree outdegree increases.
arXiv Detail & Related papers (2020-06-18T15:40:32Z)
- Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground-truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
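As a rough illustration of the entry above, the multi-task objective can be written as a language-modeling loss plus a loss on per-token syntactic distances; the parse tree is then recoverable by recursively splitting the sentence at the largest predicted distance. The sketch below uses MSE for the distance term as a stand-in for the paper's actual loss, and the weighting `alpha` is an assumption.

```python
import torch
import torch.nn.functional as F

def joint_loss(word_logits, target_words, pred_distances, gold_distances,
               alpha: float = 1.0):
    # word_logits: (batch, seq, vocab); target_words: (batch, seq) long
    # pred/gold_distances: (batch, seq), one scalar distance per token gap
    lm_loss = F.cross_entropy(word_logits.flatten(0, 1), target_words.flatten())
    syn_loss = F.mse_loss(pred_distances, gold_distances)  # stand-in loss
    return lm_loss + alpha * syn_loss
```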