On the Origin of Llamas: Model Tree Heritage Recovery
- URL: http://arxiv.org/abs/2405.18432v1
- Date: Tue, 28 May 2024 17:59:51 GMT
- Title: On the Origin of Llamas: Model Tree Heritage Recovery
- Authors: Eliahu Horwitz, Asaf Shul, Yedid Hoshen
- Abstract summary: We introduce the task of Model Tree Heritage Recovery (MoTHer Recovery) for discovering Model Trees in neural networks.
Our hypothesis is that model weights encode this information; the challenge is to decode the underlying tree structure given the weights.
MoTHer recovery holds exciting long-term applications akin to indexing the internet by search engines.
- Score: 39.08927346274156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid growth of neural network models shared on the internet has made model weights an important data modality. However, this information is underutilized as the weights are uninterpretable, and publicly available models are disorganized. Inspired by Darwin's tree of life, we define the Model Tree, which describes the origin of models, i.e., the parent model that was used to fine-tune the target model. Similarly to the natural world, the tree structure is unknown. In this paper, we introduce the task of Model Tree Heritage Recovery (MoTHer Recovery) for discovering Model Trees in the ever-growing universe of neural networks. Our hypothesis is that model weights encode this information; the challenge is to decode the underlying tree structure given the weights. Beyond the immediate application of model authorship attribution, MoTHer recovery holds exciting long-term applications akin to indexing the internet by search engines. Practically, for each pair of models, this task requires: i) determining if they are related, and ii) establishing the direction of the relationship. We find that certain distributional properties of the weights evolve monotonically during training, which enables us to classify the relationship between two given models. MoTHer recovery reconstructs entire model hierarchies, represented by a directed tree, where a parent model gives rise to multiple child models through additional training. Our approach successfully reconstructs complex Model Trees, as well as the structure of "in-the-wild" model families such as Llama 2 and Stable Diffusion.
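To make the two pairwise decisions concrete, here is a minimal toy sketch. It assumes, purely for illustration, that fine-tuned weights stay close to their parent in L2 distance and that weight kurtosis is one of the distributional statistics that evolves monotonically during training; the threshold `tau`, the kurtosis direction, and the greedy parent selection are all assumptions of this sketch, not the paper's implementation.

```python
import numpy as np
from scipy.stats import kurtosis

def related(w_a: np.ndarray, w_b: np.ndarray, tau: float = 1.0) -> bool:
    # Decision i): call two models related if their weights are unusually close.
    return np.linalg.norm(w_a - w_b) < tau

def looks_like_parent(w_p: np.ndarray, w_c: np.ndarray) -> bool:
    # Decision ii): orient the edge with a statistic assumed monotone in
    # training; here the candidate parent is taken to have higher kurtosis
    # (an assumption for this sketch).
    return kurtosis(w_p) > kurtosis(w_c)

def recover_tree(models: dict[str, np.ndarray], tau: float = 1.0) -> dict[str, str]:
    # Greedy Model Tree recovery: each model's parent is its nearest related
    # model that looks earlier in training; models without one are roots.
    parents: dict[str, str] = {}
    for child, w_c in models.items():
        candidates = [
            (np.linalg.norm(w_c - w_p), parent)
            for parent, w_p in models.items()
            if parent != child and related(w_c, w_p, tau)
            and looks_like_parent(w_p, w_c)
        ]
        if candidates:
            parents[child] = min(candidates)[1]
    return parents
```

A fuller implementation would calibrate the relatedness threshold per architecture and solve for a minimum directed spanning tree rather than using the greedy parent choice above.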
Related papers
- Representing Model Weights with Language using Tree Experts [39.90685550999956]
This paper learns to represent models within a joint space that embeds both model weights and language.
We introduce Probing Experts (ProbeX), a theoretically motivated, lightweight probing method.
Our results show that ProbeX can effectively map the weights of large models into a shared weight-language embedding space.
arXiv Detail & Related papers (2024-10-17T17:17:09Z)
- Forecasting with Hyper-Trees [50.72190208487953]
Hyper-Trees are designed to learn the parameters of time series models.
By relating the parameters of a target time series model to features, Hyper-Trees also address the issue of parameter non-stationarity.
In this novel approach, the trees first generate informative representations from the input features, which a shallow network then maps to the target model parameters.
arXiv Detail & Related papers (2024-05-13T15:22:15Z)
- Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models.
The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers.
We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
arXiv Detail & Related papers (2024-03-28T15:22:29Z)
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
- DeepTree: Modeling Trees with Situated Latents [8.372189962601073]
We propose a novel method for modeling trees based on learning developmental rules for branching structures instead of manually defining them.
We call our deep neural model situated latent because its behavior is determined by its intrinsic state.
Our method enables generating a wide variety of tree shapes without the need to define intricate parameters.
arXiv Detail & Related papers (2023-05-09T03:33:14Z)
- Robust estimation of tree structured models [0.0]
We show that it is possible to recover trees from noisy binary data up to a small equivalence class of possible trees.
We also provide a characterisation of when the Chow-Liu algorithm consistently learns the underlying tree from the noisy data.
arXiv Detail & Related papers (2021-02-10T14:58:40Z)
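For context on the entry above: the classical Chow-Liu algorithm fits the maximum-weight spanning tree over pairwise mutual information estimates. A minimal sketch for binary data follows; this is the textbook algorithm, not the paper's robust variant, and the plug-in MI estimator is a simplification.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(x: np.ndarray, y: np.ndarray) -> float:
    # Plug-in MI estimate for two binary variables.
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                p_a, p_b = np.mean(x == a), np.mean(y == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu(data: np.ndarray) -> list[tuple[int, int]]:
    # data: (n_samples, n_vars) binary array -> edges of the fitted tree.
    n = data.shape[1]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mutual_information(data[:, i], data[:, j])
    # scipy only offers a *minimum* spanning tree, so negate the weights;
    # pairs with exactly zero MI are treated as unconnected, which is
    # acceptable for a sketch.
    mst = minimum_spanning_tree(-mi)
    return list(zip(*mst.nonzero()))
```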
- Attentive Tree-structured Network for Monotonicity Reasoning [2.4366811507669124]
We develop an attentive tree-structured neural network for monotonicity reasoning.
It is designed to model the syntactic parse tree information from the sentence pair of a reasoning task.
A self-attentive aggregator is used for aligning the representations of the premise and the hypothesis.
arXiv Detail & Related papers (2021-01-03T01:29:48Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests achieve performance better than or comparable to [1] and [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
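The soft routing mentioned above replaces a node's hard left/right split with a sigmoid gate, so every sample reaches every leaf with some probability. A minimal single-tree sketch follows; it is illustrative only, since the paper's deep forests stack many such trees and also learn their connectivity.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def soft_tree_predict(x, weights, biases, leaf_values):
    # Full binary tree of depth d: 2**d - 1 internal nodes, 2**d leaves.
    # weights: (2**d - 1, dim), biases: (2**d - 1,), leaf_values: (2**d,).
    depth = int(np.log2(len(leaf_values)))
    probs = np.ones(len(leaf_values))
    for leaf in range(len(leaf_values)):
        node = 0
        for level in range(depth):
            go_right = (leaf >> (depth - 1 - level)) & 1
            p_right = sigmoid(weights[node] @ x + biases[node])
            # multiply in the probability of taking this branch
            probs[leaf] *= p_right if go_right else 1.0 - p_right
            node = 2 * node + 1 + go_right  # heap-style child index
    # the prediction is the mixture of leaf values under the routing probs
    return probs @ leaf_values

# depth-2 example: 3 internal nodes, 4 leaves
rng = np.random.default_rng(0)
print(soft_tree_predict(rng.standard_normal(5),
                        rng.standard_normal((3, 5)),
                        rng.standard_normal(3),
                        rng.standard_normal(4)))
```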
- Tensor Decompositions in Recursive Neural Networks for Tree-Structured Data [12.069862650316262]
We introduce two new aggregation functions to encode structural knowledge from tree-structured data.
We test them on two tree classification tasks, showing the advantage of the proposed models as tree outdegree increases.
arXiv Detail & Related papers (2020-06-18T15:40:32Z)
- Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground-truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
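As a rough illustration of the entry above, the multi-task objective can be written as a language-modeling loss plus a loss on per-token syntactic distances; the parse tree is then recoverable by recursively splitting the sentence at the largest predicted distance. The sketch below uses MSE for the distance term as a stand-in for the paper's actual loss, and the weighting `alpha` is an assumption.

```python
import torch
import torch.nn.functional as F

def joint_loss(word_logits, target_words, pred_distances, gold_distances,
               alpha: float = 1.0):
    # word_logits: (batch, seq, vocab); target_words: (batch, seq) long
    # pred/gold_distances: (batch, seq), one scalar distance per token gap
    lm_loss = F.cross_entropy(word_logits.flatten(0, 1), target_words.flatten())
    syn_loss = F.mse_loss(pred_distances, gold_distances)  # stand-in loss
    return lm_loss + alpha * syn_loss
```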