ModelGiF: Gradient Fields for Model Functional Distance
- URL: http://arxiv.org/abs/2309.11013v1
- Date: Wed, 20 Sep 2023 02:27:40 GMT
- Title: ModelGiF: Gradient Fields for Model Functional Distance
- Authors: Jie Song, Zhengqi Xu, Sai Wu, Gang Chen, Mingli Song
- Abstract summary: We introduce Model Gradient Field (abbr. ModelGiF) to extract homogeneous representations from pre-trained models.
Our main assumption is that each pre-trained deep model uniquely determines a ModelGiF over the input space.
We validate the effectiveness of the proposed ModelGiF with a suite of testbeds, including task relatedness estimation, intellectual property protection, and model unlearning verification.
- Score: 45.183991610710045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The last decade has witnessed the success of deep learning and the surge of
publicly released trained models, which necessitates the quantification of the
model functional distance for various purposes. However, quantifying the model
functional distance is always challenging due to the opacity in inner workings
and the heterogeneity in architectures or tasks. Inspired by the concept of
"field" in physics, in this work we introduce Model Gradient Field (abbr.
ModelGiF) to extract homogeneous representations from the heterogeneous
pre-trained models. Our main assumption underlying ModelGiF is that each
pre-trained deep model uniquely determines a ModelGiF over the input space. The
distance between models can thus be measured by the similarity between their
ModelGiFs. We validate the effectiveness of the proposed ModelGiF with a suite
of testbeds, including task relatedness estimation, intellectual property
protection, and model unlearning verification. Experimental results demonstrate
the versatility of the proposed ModelGiF on these tasks, with significantly
superior performance over state-of-the-art competitors. Code is available at
https://github.com/zju-vipa/modelgif.
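As a rough illustration of the idea (not the authors' implementation; see the repository above for that), one can approximate each model's gradient field by its input gradients at a shared set of probe inputs and score functional closeness as the average cosine similarity between the two fields. The probe loader, the logit-norm scalarization, and the helper names below are assumptions made for this sketch; the paper's field construction is more elaborate, but the comparison-by-field-similarity structure matches the abstract above.

```python
# Minimal sketch (not the official ModelGiF code): compare two models by the
# similarity of their input-gradient "fields" on shared probe inputs.
import torch
import torch.nn.functional as F

def gradient_field(model, inputs):
    """Input gradients of a scalar summary of the model output, one vector per probe input."""
    inputs = inputs.clone().requires_grad_(True)
    out = model(inputs).norm(dim=1).sum()  # scalarize the logits so a gradient exists (assumed choice)
    grads, = torch.autograd.grad(out, inputs)
    return grads.flatten(1)

def modelgif_similarity(model_a, model_b, probe_loader, device="cpu"):
    """Average cosine similarity between the two models' gradient fields (higher = closer)."""
    model_a, model_b = model_a.to(device).eval(), model_b.to(device).eval()
    sims = []
    for x, _ in probe_loader:  # assumed (input, label) batches; labels unused
        x = x.to(device)
        sims.append(F.cosine_similarity(
            gradient_field(model_a, x), gradient_field(model_b, x), dim=1).mean())
    return torch.stack(sims).mean().item()
```

Because the comparison happens in input space rather than parameter space, the two models may have entirely different architectures, which is the point of extracting a homogeneous representation.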
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- MGE: A Training-Free and Efficient Model Generation and Enhancement Scheme [10.48591131837771]
This paper proposes a Training-Free and Efficient Model Generation and Enhancement Scheme (MGE).
It considers two aspects during the model generation process: the distribution of model parameters and model performance.
Experimental results show that the generated models are comparable to models obtained through normal training, and even superior in some cases.
arXiv Detail & Related papers (2024-02-27T13:12:00Z)
- Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
arXiv Detail & Related papers (2024-01-02T17:08:26Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Model Provenance via Model DNA [23.885185988451667]
We introduce a novel concept of Model DNA which represents the unique characteristics of a machine learning model.
We develop an efficient framework for model provenance identification, which enables us to identify whether a source model is a pre-training model of a target model.
arXiv Detail & Related papers (2023-08-04T03:46:41Z)
- Revisiting Implicit Models: Sparsity Trade-offs Capability in Weight-tied Model for Vision Tasks [4.872984658007499]
Implicit models such as Deep Equilibrium Models (DEQs) have garnered significant attention in the community for their ability to train infinite layer models.
We revisit the line of implicit models and trace them back to the original weight-tied models.
Surprisingly, we observe that weight-tied models are more effective, stable, and efficient on vision tasks than the DEQ variants.
arXiv Detail & Related papers (2023-07-16T11:45:35Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Interpretable ODE-style Generative Diffusion Model via Force Field Construction [0.0]
This paper aims to identify various physical models that are suitable for constructing ODE-style generative diffusion models accurately from a mathematical perspective.
We perform a case study where we use the theoretical model identified by our method to develop a range of new diffusion model methods.
arXiv Detail & Related papers (2023-03-14T16:58:11Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (see the sketch after this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
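For the parameter-space merging idea in the last entry, a minimal sketch is plain weighted averaging of state dicts, which is a common baseline rather than the specific fusion rule that paper proposes; the function name, arguments, and default weight below are assumptions.

```python
# Minimal sketch of parameter-space merging: convex combination of two state dicts.
# Not the merging rule from the paper above; illustrative baseline only.
import copy
import torch

def merge_in_parameter_space(model_a, model_b, alpha=0.5):
    """Return a new model whose parameters are alpha * model_a + (1 - alpha) * model_b.

    Assumes both models share the same architecture and state-dict keys.
    """
    merged = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    merged_state = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        if torch.is_floating_point(tensor_a):
            merged_state[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        else:
            # Integer buffers (e.g. BatchNorm batch counters) are copied from model_a.
            merged_state[name] = tensor_a.clone()
    merged.load_state_dict(merged_state)
    return merged
```

Note that this only applies when the models share one architecture; comparing or relating heterogeneous models is exactly the setting where an input-space representation such as ModelGiF is needed.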