Representation Projection Invariance Mitigates Representation Collapse
- URL: http://arxiv.org/abs/2205.11603v3
- Date: Tue, 21 Nov 2023 22:23:43 GMT
- Title: Representation Projection Invariance Mitigates Representation Collapse
- Authors: Anastasia Razdaibiedina, Ashish Khetan, Zohar Karnin, Daniel Khashabi,
Vishaal Kapoor, Vivek Madan
- Abstract summary: Fine-tuning contextualized representations learned by pre-trained language models can lead to representation collapse.
We propose Representation Projection Invariance (REPINA), a novel regularization method that maintains the information content of representations.
Our empirical findings show that REPINA is significantly more effective at mitigating representation collapse.
- Score: 27.485606184566386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning contextualized representations learned by pre-trained language
models remains a prevalent practice in NLP. However, fine-tuning can lead to
representation degradation (also known as representation collapse), which may
result in instability, sub-optimal performance, and weak generalization.
In this paper, we propose Representation Projection Invariance (REPINA), a
novel regularization method to maintain the information content of
representations and reduce representation collapse during fine-tuning by
discouraging undesirable changes in the representations. We study the empirical
behavior of the proposed regularization in comparison to five baselines
across 13 language understanding tasks (GLUE benchmark and six additional
datasets). When evaluating in-domain performance, REPINA consistently
outperforms other baselines on most tasks (10 out of 13). We also demonstrate
its effectiveness in few-shot settings and robustness to label perturbation. As
a by-product, we extend previous studies of representation collapse and propose
several metrics to quantify it. Our empirical findings show that our approach
is significantly more effective at mitigating representation collapse.
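As a rough illustration of the idea described above (not the paper's exact objective), a projection-invariance regularizer can be sketched as an auxiliary loss that penalizes the distance between a projection of the fine-tuned representations and the frozen pre-trained representations. The function name `repina_penalty`, the identity projection default, and the toy data below are illustrative assumptions based only on the abstract's description:

```python
import numpy as np

def repina_penalty(finetuned_reps, pretrained_reps, projection=None):
    """Illustrative projection-invariance penalty (an assumption, not the
    paper's implementation): mean squared distance between a projection of
    the fine-tuned representations and the frozen pre-trained ones."""
    if projection is None:
        projection = lambda h: h  # identity projection as the simplest case
    diff = projection(finetuned_reps) - pretrained_reps
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

# Toy usage: a batch of 4 examples with 8-dimensional representations.
rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 8))             # pre-trained representations (frozen)
h = h0 + 0.1 * rng.normal(size=(4, 8))   # slightly drifted fine-tuned reps
penalty = repina_penalty(h, h0)
# In training, this would be added to the task loss, e.g.
# total_loss = task_loss + lam * penalty, with lam a tuning hyperparameter.
```

The penalty is zero when representations are unchanged and grows as they drift, which is the sense in which it discourages undesirable changes during fine-tuning.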
Related papers
- A Mutually Reinforced Framework for Pretrained Sentence Embeddings [49.297766436632685]
InfoCSE is a novel framework for learning high-quality sentence embeddings.
It exploits the sentence representation model itself and realizes an iterative self-supervision process.
In other words, the representation learning and data annotation become mutually reinforced, where a strong self-supervision effect can be derived.
arXiv Detail & Related papers (2022-02-28T14:00:16Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Probing as Quantifying the Inductive Bias of Pre-trained Representations [99.93552997506438]
We present a novel framework for probing where the goal is to evaluate the inductive bias of representations for a particular task.
We apply our framework to a series of token-, arc-, and sentence-level tasks.
arXiv Detail & Related papers (2021-10-15T22:01:16Z)
- Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning [57.4036085386653]
We show that prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap.
We then show that adding a regularization that preserves pre-training weights is effective in mitigating this destructive tendency of few-shot fine-tuning.
arXiv Detail & Related papers (2021-09-09T10:10:29Z)
- Disentangled Contrastive Learning for Learning Robust Textual Representations [13.880693856907037]
We introduce the concept of momentum representation consistency to align features and leverage power normalization while conforming to uniformity.
Our experimental results on NLP benchmarks demonstrate that our approach obtains better results than the baselines.
arXiv Detail & Related papers (2021-04-11T03:32:49Z)
- Better Fine-Tuning by Reducing Representational Collapse [77.44854918334232]
Existing approaches for fine-tuning pre-trained language models have been shown to be unstable.
We present a method rooted in trust-region theory that replaces previously used adversarial objectives with parametric noise.
We show it is less prone to representation collapse: the pre-trained models maintain more generalizable representations every time they are fine-tuned.
arXiv Detail & Related papers (2020-08-06T02:13:16Z)
- Fairness by Learning Orthogonal Disentangled Representations [50.82638766862974]
We propose a novel disentanglement approach to the invariant representation problem.
We enforce the meaningful representation to be agnostic to sensitive information by entropy.
The proposed approach is evaluated on five publicly available datasets.
arXiv Detail & Related papers (2020-03-12T11:09:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.