Learn From Model Beyond Fine-Tuning: A Survey
- URL: http://arxiv.org/abs/2310.08184v1
- Date: Thu, 12 Oct 2023 10:20:36 GMT
- Title: Learn From Model Beyond Fine-Tuning: A Survey
- Authors: Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, Dacheng Tao
- Abstract summary: Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FM) based on the model interface.
The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing.
This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM.
- Score: 78.80920533793595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models (FM) have demonstrated remarkable performance across a wide
range of tasks (especially in the fields of natural language processing and
computer vision), primarily attributed to their ability to comprehend
instructions and access extensive, high-quality data. This not only showcases
their current effectiveness but also sets a promising trajectory towards the
development of artificial general intelligence. Unfortunately, due to multiple
constraints, the raw data of the model used for large model training are often
inaccessible, so the use of end-to-end models for downstream tasks has become a
new research trend, which we call Learn From Model (LFM) in this article. LFM
focuses on the research, modification, and design of FM based on the model
interface, so as to better understand the model structure and weights (in a
black box environment), and to generalize the model to downstream tasks. The
study of LFM techniques can be broadly categorized into five major areas: model
tuning, model distillation, model reuse, meta learning and model editing. Each
category encompasses a repertoire of methods and strategies that aim to enhance
the capabilities and performance of FM. This paper gives a comprehensive review
of the current methods based on FM from the perspective of LFM, in order to
help readers better understand the current research status and ideas. To
conclude, we summarize the survey by highlighting several critical areas for
future exploration and addressing open issues that require further attention
from the research community. The relevant papers we investigated in this
article can be accessed at
<https://github.com/ruthless-man/Awesome-Learn-from-Model>.
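Since LFM assumes only interface-level (black-box) access to the foundation model, one concrete way to picture the setting is gradient-free prompt adaptation through a query API. The sketch below is a minimal illustration under that assumption; the `query_model` stub, the candidate prompts, and the exact-match scoring are hypothetical placeholders, not methods from the survey.

```python
import random

def query_model(prompt: str, example: str) -> str:
    # Stand-in for a hosted foundation-model interface (e.g. a REST endpoint).
    # Only inputs and outputs are observable; weights and gradients stay hidden.
    return ""  # replace with a real API call

def exact_match(prediction: str, label: str) -> float:
    # Task-specific score computed purely from interface outputs.
    return float(prediction.strip() == label.strip())

def black_box_prompt_search(candidate_prompts, dev_set, trials=20, seed=0):
    """Pick the prompt that scores best on a small dev set, using only
    query access to the model -- a simple instance of the LFM setting."""
    rng = random.Random(seed)
    best_prompt, best_acc = None, -1.0
    for _ in range(trials):
        prompt = rng.choice(candidate_prompts)
        acc = sum(exact_match(query_model(prompt, x), y)
                  for x, y in dev_set) / len(dev_set)
        if acc > best_acc:
            best_prompt, best_acc = prompt, acc
    return best_prompt, best_acc
```

Each of the five LFM categories surveyed works with the model through such interfaces rather than with its original training data, differing mainly in what is extracted from, or adapted around, the model.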
Related papers
- Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow.
We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
- Synergizing Foundation Models and Federated Learning: A Survey [23.416321895575507]
This paper discusses the potential and challenges of synergizing Federated Learning (FL) and Foundation Models (FM).
FL is a collaborative learning paradigm that breaks the barrier of data availability from different participants.
It provides a promising solution to customize and adapt FMs to a wide range of domain-specific tasks using distributed datasets whilst preserving privacy.
arXiv Detail & Related papers (2024-06-18T17:58:09Z)
- A Survey on Efficient Federated Learning Methods for Foundation Model Training [62.473245910234304]
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients.
In the wake of Foundation Models (FM), the reality is different for many deep learning applications.
We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications; a minimal PEFT sketch appears after this list.
arXiv Detail & Related papers (2024-01-09T10:22:23Z)
- Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
arXiv Detail & Related papers (2024-01-02T17:08:26Z)
- MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs).
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-off between compute cost and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
- Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey [66.18478838828231]
Multi-modal pre-trained big models have drawn more and more attention in recent years.
This paper introduces the background of multi-modal pre-training by reviewing conventional deep pre-training work in natural language processing, computer vision, and speech.
Then, we introduce the task definition, key challenges, and advantages of multi-modal pre-training models (MM-PTMs), and discuss the MM-PTMs with a focus on data, objectives, network, and knowledge enhanced pre-training.
arXiv Detail & Related papers (2023-02-20T15:34:03Z)
- Foundation models in brief: A historical, socio-technical focus [2.5991265608180396]
Foundation models can be disruptive for future AI development by scaling up deep learning.
Models achieve state-of-the-art performance on a variety of tasks in domains such as natural language processing and computer vision.
arXiv Detail & Related papers (2022-12-17T22:11:33Z)
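The efficient-federated-learning survey above weighs parameter-efficient fine-tuning (PEFT) against full fine-tuning of foundation models. As a rough illustration of why PEFT shrinks the trainable footprint, the PyTorch sketch below wraps a frozen linear layer with a low-rank LoRA-style adapter; the layer sizes, rank, and scaling are arbitrary assumptions rather than values taken from any of the listed papers.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base projection plus a trainable low-rank update (LoRA-style PEFT).
    Only A and B are trained, so the trainable count is r*(in+out) instead of in*out."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_features, r))        # up-projection, zero-init
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable} of {total}")  # roughly 2% of the layer
```

In a federated setting, only the adapter parameters would need to be exchanged between clients and the server, which is the kind of communication and memory saving such surveys point to.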