Multi-Stage Influence Function
- URL: http://arxiv.org/abs/2007.09081v1
- Date: Fri, 17 Jul 2020 16:03:11 GMT
- Title: Multi-Stage Influence Function
- Authors: Hongge Chen, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh
- Abstract summary: We develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data.
We study two different scenarios with the pretrained embeddings fixed or updated in the finetuning tasks.
- Score: 97.19210942277354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-stage training and knowledge transfer, from a large-scale pretraining
task to various finetuning tasks, have revolutionized natural language
processing and computer vision, resulting in state-of-the-art performance
improvements. In this paper, we develop a multi-stage influence function score
to track predictions from a finetuned model all the way back to the pretraining
data. With this score, we can identify the pretraining examples in the
pretraining task that contribute most to a prediction in the finetuning task.
The proposed multi-stage influence function generalizes the original influence
function for a single model in (Koh & Liang, 2017), thereby enabling influence
computation through both pretrained and finetuned models. We study two
different scenarios with the pretrained embeddings fixed or updated in the
finetuning tasks. We test our proposed method in various experiments to show
its effectiveness and potential applications.
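The multi-stage score builds on the single-model influence function of Koh & Liang (2017), I(z, z_test) = -∇_θ L(z_test, θ̂)ᵀ H_θ̂⁻¹ ∇_θ L(z, θ̂). Below is a minimal sketch of that single-model baseline on a toy logistic regression; the data, the exact Hessian solve, and all hyperparameters are illustrative assumptions, and the paper's multi-stage extension through the pretrained and finetuned models is not reproduced here. At realistic scale the inverse-Hessian-vector product is approximated implicitly (e.g., LiSSA or conjugate gradient) rather than solved exactly.

```python
# Minimal sketch of the single-model influence function (Koh & Liang, 2017),
# which the paper above generalizes to the pretrain -> finetune setting.
# Toy data, model, and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 32, 5
X = torch.randn(n, d)
y = (X[:, 0] > 0).float()
w = torch.zeros(d, requires_grad=True)

def loss_fn(w, X, y, l2=1e-2):
    # mean logistic loss plus an L2 term to keep the Hessian invertible
    return F.binary_cross_entropy_with_logits(X @ w, y) + l2 * w.dot(w)

# Train to an approximate optimum, as the influence function assumes.
opt = torch.optim.LBFGS([w], max_iter=200)
def closure():
    opt.zero_grad()
    loss = loss_fn(w, X, y)
    loss.backward()
    return loss
opt.step(closure)

# I(z, z_test) = -grad_w L(z_test)^T  H^{-1}  grad_w L(z)
H = torch.autograd.functional.hessian(lambda v: loss_fn(v, X, y), w.detach())
x_test, y_test = torch.randn(1, d), torch.ones(1)
g_test = torch.autograd.grad(loss_fn(w, x_test, y_test, l2=0.0), w)[0]
ihvp = torch.linalg.solve(H, g_test)          # H^{-1} grad L(z_test)
for i in range(3):                            # score a few training examples
    g_i = torch.autograd.grad(loss_fn(w, X[i:i+1], y[i:i+1], l2=0.0), w)[0]
    print(f"influence of training point {i}: {-ihvp.dot(g_i).item():+.5f}")
```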
Related papers
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models in improving model performance. A simplified, ablation-style attribution sketch (not DETAIL's influence-function estimator) follows this entry.
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
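Since DETAIL's exact estimator is not described in the summary above, here is only a simplified leave-one-out stand-in for demonstration attribution; the demo format and `toy_loss` are hypothetical, and DETAIL's influence-function approach exists precisely to avoid this kind of repeated re-evaluation.

```python
# Hypothetical leave-one-out (LOO) demonstration attribution for ICL.
# NOT the DETAIL estimator: just the naive ablation baseline it improves on.
from typing import Callable, List, Tuple

def loo_attribution(
    demos: List[Tuple[str, str]],
    query: Tuple[str, str],
    loss: Callable[[List[Tuple[str, str]], Tuple[str, str]], float],
) -> List[float]:
    """Positive score means removing the demonstration raises the query loss."""
    full = loss(demos, query)
    return [loss(demos[:i] + demos[i + 1:], query) - full
            for i in range(len(demos))]

# Toy stand-in loss: demos resembling the arithmetic query lower it (assumption).
demos = [("2+2", "4"), ("capital of France", "Paris"), ("3+3", "6")]
query = ("5+5", "10")
toy_loss = lambda ds, q: 1.0 - 0.2 * sum("+" in d[0] for d in ds)
print(loo_attribution(demos, query, toy_loss))  # ~[0.2, 0.0, 0.2]
```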
- LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views [28.081794908107604]
Fine-tuning is used to leverage the power of pre-trained foundation models in new downstream tasks.
Recent studies have observed challenges in the generalization of fine-tuned models to unseen distributions.
We propose LEVI, a novel generalizable fine-tuning method in which the pre-trained model is adaptively ensembled layer-wise with a small task-specific model.
arXiv Detail & Related papers (2024-02-07T08:16:40Z)
- An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models; a rough sketch of this log-probability ensemble follows the entry.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
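A hedged sketch of the up-scaling ensemble summarized above: combine a large pretrained model's log-probabilities with the fine-tuning shift of a small model pair. The three logit tensors are random stand-ins for real LM forward passes, and details such as per-token normalization are assumptions.

```python
# EFT "LM up-scaling" sketch: emulate fine-tuning the large model by adding
# the small pair's fine-tuning shift in log-probability space.
import torch

torch.manual_seed(0)
vocab = 10
base_large = torch.randn(vocab)  # logits of large pretrained model (stand-in)
base_small = torch.randn(vocab)  # logits of small pretrained model (stand-in)
ft_small = torch.randn(vocab)    # logits of small fine-tuned model (stand-in)

log_p = torch.log_softmax(base_large, -1) \
      + torch.log_softmax(ft_small, -1) - torch.log_softmax(base_small, -1)
probs = torch.softmax(log_p, -1)              # renormalize before sampling
next_token = torch.multinomial(probs, num_samples=1)
print(next_token.item())
```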
- Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids degradation of learned skills by modulating the information flow of the frozen pre-trained model. One possible form of such modulation is sketched after this entry.
arXiv Detail & Related papers (2023-06-26T17:53:05Z)
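A hypothetical illustration of "modulating a frozen pretrained model": a small trainable gate rescales a frozen layer's activations, so only the gate adapts to the new task. This is one common modulation pattern, not necessarily the L2M mechanism.

```python
# Hypothetical gated modulation of a frozen pretrained layer (not L2M itself).
import torch
import torch.nn as nn

class ModulatedBlock(nn.Module):
    def __init__(self, frozen_layer: nn.Module, dim: int):
        super().__init__()
        self.frozen = frozen_layer
        for p in self.frozen.parameters():
            p.requires_grad = False      # pretrained weights never change
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frozen(x) * self.gate(x)  # gate rescales the frozen output

block = ModulatedBlock(nn.Linear(16, 16), dim=16)
out = block(torch.randn(4, 16))              # only gate params get gradients
```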
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between intra-class diversity (the number of samples per class) and inter-class diversity (the number of classes) in a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes from balancing intra- and inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different task formulations and settings affect continual-learning performance on Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Exploring Example Influence in Continual Learning [26.85320841575249]
Continual Learning (CL) learns new tasks sequentially, as humans do, with the goal of achieving better Stability (S) and Plasticity (P).
It is valuable to explore how individual training examples influence S and P differently, which may improve the learning pattern toward a better S-P trade-off.
We propose a simple yet effective algorithm, MetaSP, which simulates the two key steps of the influence function (IF) perturbation and obtains S- and P-aware example influence.
arXiv Detail & Related papers (2022-09-25T15:17:37Z)
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies with the number of fine-tuning trials. A minimal sketch of this seed-sweep protocol follows the entry.
arXiv Detail & Related papers (2020-02-15T02:40:10Z)
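A minimal sketch of the seed-sweep protocol described above: fine-tune the same pretrained model repeatedly, varying only the random seed, and track the best validation score as a function of the number of trials. `finetune_and_eval` is a hypothetical stand-in for a real BERT fine-tuning run on a GLUE task.

```python
# Hypothetical seed sweep; the eval function is a random stand-in, not BERT.
import random

def finetune_and_eval(seed: int) -> float:
    random.seed(seed)                # seed controls new-layer init and data order
    return random.gauss(0.88, 0.02)  # stand-in for a real validation metric

best = float("-inf")
for trial, seed in enumerate(range(100), start=1):  # the paper runs hundreds
    best = max(best, finetune_and_eval(seed))
    # best-found performance as a function of number of fine-tuning trials
    print(f"trials={trial:3d}  best_so_far={best:.4f}")
```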
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and accepts no responsibility for any consequences arising from its use.