A Closer Look at How Fine-tuning Changes BERT
- URL: http://arxiv.org/abs/2106.14282v1
- Date: Sun, 27 Jun 2021 17:01:43 GMT
- Title: A Closer Look at How Fine-tuning Changes BERT
- Authors: Yichu Zhou and Vivek Srikumar
- Abstract summary: We study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space.
Our experiments reveal that fine-tuning improves performance because it pushes points associated with a label away from other labels.
By comparing the representations before and after fine-tuning, we also discover that fine-tuning does not change the representations arbitrarily; instead, it adjusts the representations to downstream tasks while preserving the original structure.
- Score: 21.23284793831221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the prevalence of pre-trained contextualized representations in today's
NLP, there have been several efforts to understand what information such
representations contain. A common strategy to use such representations is to
fine-tune them for an end task. However, how fine-tuning for a task changes the
underlying space is less studied. In this work, we study the English BERT
family and use two probing techniques to analyze how fine-tuning changes the
space. Our experiments reveal that fine-tuning improves performance because it
pushes points associated with a label away from other labels. By comparing the
representations before and after fine-tuning, we also discover that fine-tuning
does not change the representations arbitrarily; instead, it adjusts the
representations to downstream tasks while preserving the original structure.
Finally, using carefully constructed experiments, we show that fine-tuning can
encode training sets in a representation, suggesting an overfitting problem of
a new kind.
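The central claim above is geometric: fine-tuning increases the separation between points carrying different labels. A minimal sketch of one way to quantify this, assuming you have already extracted [CLS] embeddings for the same examples from the pre-trained and the fine-tuned model (the arrays `pre_emb`, `post_emb`, and `y` below are hypothetical inputs, not part of the paper's code):

```python
import numpy as np

def label_centroid_separation(embeddings: np.ndarray, labels: np.ndarray) -> float:
    """Average pairwise distance between per-label centroids.

    embeddings: (n_examples, hidden_dim) array of [CLS] vectors.
    labels:     (n_examples,) array of integer task labels.
    """
    centroids = np.stack([embeddings[labels == c].mean(axis=0)
                          for c in np.unique(labels)])
    # Pairwise Euclidean distances between label centroids.
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Average over the distinct (i != j) pairs; the diagonal is zero.
    n = len(centroids)
    return dists.sum() / (n * (n - 1))

# Hypothetical usage: pre_emb / post_emb hold embeddings of the same examples
# from the pre-trained and fine-tuned model. A larger value after fine-tuning
# is consistent with labels being pushed apart in representation space.
# print(label_centroid_separation(pre_emb, y), label_centroid_separation(post_emb, y))
```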
Related papers
- PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer [94.23904400441957]
We introduce perturbation-based regularizers, which can smooth the loss landscape, into prompt tuning.
We design two kinds of perturbation-based regularizers, including random-noise-based and adversarial-based.
Our new algorithms improve the state-of-the-art prompt tuning methods by 1.94% and 2.34% on SuperGLUE and FewGLUE benchmarks, respectively.
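As a rough illustration of the random-noise-based variant mentioned above, the sketch below adds Gaussian noise to a soft-prompt embedding and sums the clean and perturbed task losses; the `prompt_embeds` keyword and noise scale `sigma` are illustrative assumptions, not the paper's actual interface or objective:

```python
import torch

def noise_regularized_loss(model, prompt_embeds, batch, sigma=0.01):
    """Task loss plus a random-noise perturbation term on the soft prompt.

    Assumes a model whose forward pass accepts the soft prompt via a
    `prompt_embeds` keyword and returns an object with a `.loss` field
    (an illustrative interface, not PTP's code).
    """
    clean_loss = model(prompt_embeds=prompt_embeds, **batch).loss
    noise = sigma * torch.randn_like(prompt_embeds)
    perturbed_loss = model(prompt_embeds=prompt_embeds + noise, **batch).loss
    # Also penalizing the perturbed loss encourages a flatter, smoother
    # loss landscape around the current prompt.
    return clean_loss + perturbed_loss
```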
arXiv Detail & Related papers (2023-05-03T20:30:51Z)
- Improved Visual Fine-tuning with Natural Language Supervision [36.250244364023665]
Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data.
The problem of catastrophic forgetting in the pre-trained backbone has been extensively studied in the context of fine-tuning.
We introduce a reference distribution obtained from a fixed text classifier, which can help regularize the learned vision classifier.
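A hedged sketch of the regularization idea described above: the vision classifier's predictive distribution is pulled toward a reference distribution produced by a frozen text classifier via a KL term. The specific loss combination and the weight `alpha` are assumptions for illustration, not the paper's exact formulation:

```python
import torch.nn.functional as F

def regularized_loss(vision_logits, text_logits, targets, alpha=0.5):
    """Cross-entropy on the vision classifier plus a KL term that pulls its
    predictive distribution toward the frozen text classifier's reference
    distribution (text_logits are detached: the text side stays fixed)."""
    ce = F.cross_entropy(vision_logits, targets)
    reference = F.softmax(text_logits.detach(), dim=-1)
    kl = F.kl_div(F.log_softmax(vision_logits, dim=-1), reference,
                  reduction="batchmean")
    return ce + alpha * kl
```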
arXiv Detail & Related papers (2023-04-04T03:08:02Z)
- Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [108.13378788663196]
We propose Subspace Prompt Tuning (SubPT) to project the gradients in back-propagation onto the low-rank subspace spanned by the early-stage gradient flow eigenvectors during the entire training process.
We equip CoOp with Novel Learner Feature (NFL) to enhance the generalization ability of the learned prompts onto novel categories beyond the training set.
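A minimal sketch of the gradient-projection step that the SubPT summary describes, assuming the low-rank subspace is obtained from an SVD of gradients stored early in training (the basis construction here is an illustrative simplification, not the paper's exact procedure):

```python
import torch

def project_gradient(grad: torch.Tensor, early_grads: torch.Tensor, rank: int = 4):
    """Project a flattened gradient onto the top-`rank` principal directions
    of gradients collected early in training.

    grad:        (d,) current gradient.
    early_grads: (m, d) stack of early-stage gradients.
    """
    # Orthonormal basis of the leading subspace via SVD of the early gradients.
    _, _, vh = torch.linalg.svd(early_grads, full_matrices=False)
    basis = vh[:rank]                 # (rank, d)
    return basis.T @ (basis @ grad)   # projection onto span(basis)
```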
arXiv Detail & Related papers (2022-11-04T02:06:22Z)
- Alleviating Representational Shift for Continual Fine-tuning [13.335957004592407]
We study a practical setting of continual learning: fine-tuning on a pre-trained model continually.
We propose ConFiT, a fine-tuning method incorporating two components, cross-convolution batch normalization (Xconv BN) and hierarchical fine-tuning.
Xconv BN maintains pre-convolution running means instead of post-convolution ones, and recovers the post-convolution ones before testing, which corrects the inaccurate estimates of means under IRS.
arXiv Detail & Related papers (2022-04-22T06:58:20Z)
- Fair Interpretable Learning via Correction Vectors [68.29997072804537]
We propose a new framework for fair representation learning centered around the learning of "correction vectors".
The corrections are then simply added to the original features, and can therefore be analyzed as an explicit penalty or bonus for each feature.
We show experimentally that a fair representation learning problem constrained in such a way does not impact performance.
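A minimal sketch of the correction-vector idea in PyTorch: a small network produces a correction that is added to the original features, so each component can be read off as a per-feature penalty or bonus. The two-layer architecture is an assumption for illustration:

```python
import torch
import torch.nn as nn

class CorrectionVectorEncoder(nn.Module):
    """Adds a learned correction vector to the original features.

    Because the output is `x + correction`, each component of `correction`
    can be inspected directly as a penalty (negative) or bonus (positive)
    applied to that feature.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.correction = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                        nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.correction(x)
```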
arXiv Detail & Related papers (2022-01-17T10:59:33Z)
- Probing as Quantifying the Inductive Bias of Pre-trained Representations [99.93552997506438]
We present a novel framework for probing where the goal is to evaluate the inductive bias of representations for a particular task.
We apply our framework to a series of token-, arc-, and sentence-level tasks.
arXiv Detail & Related papers (2021-10-15T22:01:16Z)
- On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers [24.858283637038422]
We study three different pre-trained models: BERT, RoBERTa, and ALBERT.
We find that for some probing tasks fine-tuning leads to substantial changes in accuracy.
While fine-tuning indeed changes the representations of a pre-trained model, it has a positive effect on probing accuracy in only very few cases.
arXiv Detail & Related papers (2020-10-06T10:54:00Z)
- Visually Grounded Compound PCFGs [65.04669567781634]
Exploiting visual groundings for language understanding has recently been drawing much attention.
We study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual captions.
arXiv Detail & Related papers (2020-09-25T19:07:00Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that the linear layer yields small approximation error even for complex ground-truth function classes.
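A toy end-to-end sketch of this recipe with scikit-learn: a reconstruction-style pretext task (predicting one half of the input from the other) is solved without downstream labels, and a linear classifier is then fit on the learned hidden representation. The synthetic data and the choice of pretext task are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X1 = rng.standard_normal((1000, 10))                              # observed view
X2 = X1 @ rng.standard_normal((10, 10)) + 0.1 * rng.standard_normal((1000, 10))
y = (X1[:, 0] + X1[:, 1] > 0).astype(int)                         # hypothetical downstream label

# Pretext task: reconstruct the second view from the first (no labels used).
pretext = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
pretext.fit(X1, X2)

# The hidden-layer (ReLU) activations serve as the learned representation.
hidden = np.maximum(0.0, X1 @ pretext.coefs_[0] + pretext.intercepts_[0])

# A simple linear (logistic) layer on top of the frozen representation.
clf = LogisticRegression(max_iter=1000).fit(hidden, y)
print("downstream accuracy:", clf.score(hidden, y))
```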
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- What Happens To BERT Embeddings During Fine-tuning? [19.016185902256826]
We investigate how fine-tuning affects the representations of the BERT model.
We find that fine-tuning primarily affects the top layers of BERT, but with noteworthy variation across tasks.
In particular, dependency parsing reconfigures most of the model, whereas SQuAD and MNLI appear to involve much shallower processing.
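A hedged sketch of one way to observe this layer-wise pattern with Hugging Face transformers: compare per-layer hidden states of the pre-trained checkpoint and a fine-tuned one on the same input using cosine similarity. The fine-tuned checkpoint name below is a placeholder example, not one used in the paper:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
pretrained = AutoModel.from_pretrained("bert-base-uncased")
# Placeholder fine-tuned checkpoint; any BERT-base model fine-tuned on a task works.
finetuned = AutoModel.from_pretrained("textattack/bert-base-uncased-SST-2")

inputs = tok("A sentence to compare layer by layer.", return_tensors="pt")
with torch.no_grad():
    h_pre = pretrained(**inputs, output_hidden_states=True).hidden_states
    h_fin = finetuned(**inputs, output_hidden_states=True).hidden_states

# Lower cosine similarity indicates layers that fine-tuning changed more
# (typically the top layers, per the finding above).
for layer, (a, b) in enumerate(zip(h_pre, h_fin)):
    sim = torch.nn.functional.cosine_similarity(a.flatten(), b.flatten(), dim=0)
    print(f"layer {layer}: cosine similarity {sim.item():.3f}")
```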
arXiv Detail & Related papers (2020-04-29T19:46:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.