FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing
- URL: http://arxiv.org/abs/2301.05487v2
- Date: Thu, 4 Apr 2024 07:43:16 GMT
- Title: FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing
- Authors: Chen Cecilia Liu, Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych
- Abstract summary: We investigate scheduled unfreezing algorithms for fine-tuning task adapters.
Experiments show scheduled unfreezing methods close the gap to full fine-tuning and achieve stronger cross-lingual transfer performance.
We propose a general scheduled unfreezing algorithm that achieves an average improvement of 2 points across four datasets.
- Score: 60.629222280633606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard fine-tuning of language models typically performs well on in-distribution data, but struggles to generalize under distribution shifts. In this work, we aim to improve the generalization of adapter-based cross-lingual task transfer where such cross-language distribution shifts are imminent. We investigate scheduled unfreezing algorithms -- originally proposed to mitigate catastrophic forgetting in transfer learning -- for fine-tuning task adapters. Our experiments show that scheduled unfreezing methods close the gap to full fine-tuning and achieve stronger cross-lingual transfer performance, suggesting that these methods can go beyond just mitigating catastrophic forgetting. Next, aiming to understand these empirical findings, we investigate the learning dynamics of scheduled unfreezing using Fisher Information. Our experiments reveal that scheduled unfreezing induces different learning dynamics compared to standard fine-tuning, and provide evidence that the dynamics of Fisher Information during training correlate with cross-lingual generalization performance. We additionally propose a general scheduled unfreezing algorithm that achieves an average improvement of 2 points across four datasets compared to standard fine-tuning, and provides empirical evidence for a theory-based justification of the heuristic unfreezing schedule for adapter training.
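The abstract combines two ideas that a short illustration can make concrete: unfreezing adapter layers gradually on a schedule during fine-tuning, and tracking Fisher Information dynamics while training. The sketch below is not the authors' FUN algorithm; the toy adapter model, the top-down unfreezing order, the `unfreeze_every` interval, and the summed squared-gradient trace used as an empirical Fisher proxy are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer layer with a residual bottleneck adapter;
# names and sizes are illustrative only, not the architecture from the paper.
class AdapterLayer(nn.Module):
    def __init__(self, dim=64, bottleneck=16):
        super().__init__()
        self.backbone = nn.Linear(dim, dim)   # stands in for frozen pretrained weights
        self.adapter = nn.Sequential(         # trainable task adapter
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim)
        )

    def forward(self, x):
        h = self.backbone(x)
        return h + self.adapter(h)            # residual adapter connection

class ToyModel(nn.Module):
    def __init__(self, num_layers=4, dim=64):
        super().__init__()
        self.layers = nn.ModuleList([AdapterLayer(dim) for _ in range(num_layers)])
        self.head = nn.Linear(dim, 2)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.head(x)

def set_unfrozen(model, num_unfrozen):
    """Freeze everything, then unfreeze the head and the top `num_unfrozen` adapters."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.head.parameters():
        p.requires_grad = True
    for layer in list(model.layers)[::-1][:num_unfrozen]:   # top-down order (assumed)
        for p in layer.adapter.parameters():
            p.requires_grad = True

def empirical_fisher_trace(model):
    """Sum of squared gradients of trainable parameters after a backward pass:
    a crude proxy for the trace of the empirical Fisher Information."""
    return sum((p.grad ** 2).sum().item()
               for p in model.parameters()
               if p.requires_grad and p.grad is not None)

model = ToyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
unfreeze_every = 50                           # heuristic unfreezing interval (assumed)

for step in range(200):
    # Unfreeze one more adapter every `unfreeze_every` steps.
    set_unfrozen(model, num_unfrozen=1 + step // unfreeze_every)
    x, y = torch.randn(32, 64), torch.randint(0, 2, (32,))   # dummy batch
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    fisher = empirical_fisher_trace(model)    # track Fisher dynamics during training
    opt.step()
    if step % unfreeze_every == 0:
        print(f"step {step}: loss={loss.item():.3f}, fisher_trace={fisher:.3f}")
```

In this toy setup the Fisher trace is only logged as a diagnostic; the paper argues that such Fisher Information dynamics correlate with cross-lingual generalization and uses them to justify the heuristic unfreezing schedule for adapters.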
Related papers
- Information Guided Regularization for Fine-tuning Language Models [11.831883526217942]
We argue that a more surgical approach to regularization is needed for smoother transfer learning.
We devise a novel approach to dropout for improved model regularization and better downstream generalization.
arXiv Detail & Related papers (2024-06-20T05:18:37Z) - Headless Language Models: Learning without Predicting with Contrastive Weight Tying [0.11510009152620666]
Self-supervised pre-training of language models usually consists of predicting probability distributions over extensive token vocabularies.
We propose an innovative method that shifts away from probability prediction and instead focuses on reconstructing input embeddings in a contrastive fashion via Contrastive Weight Tying (CWT).
We observe a significant +1.6 GLUE score increase and a notable +2.7 LAMBADA accuracy improvement compared to classical LMs within similar compute budgets.
arXiv Detail & Related papers (2023-09-15T12:20:00Z) - Unsupervised Unlearning of Concept Drift with Autoencoders [5.41354952642957]
Concept drift refers to a change in the data distribution affecting the data stream of future samples.
This paper proposes an unsupervised and model-agnostic concept drift adaptation method at the global level.
arXiv Detail & Related papers (2022-11-23T14:52:49Z) - Anticipating the Unseen Discrepancy for Vision and Language Navigation [63.399180481818405]
Vision-Language Navigation requires the agent to follow natural language instructions to reach a specific target.
The large discrepancy between seen and unseen environments makes it challenging for the agent to generalize well.
We propose Unseen Discrepancy Anticipating Vision and Language Navigation (DAVIS) that learns to generalize to unseen environments via encouraging test-time visual consistency.
arXiv Detail & Related papers (2022-09-10T19:04:40Z) - Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z) - Consistency Regularization for Cross-Lingual Fine-Tuning [61.08704789561351]
We propose to improve cross-lingual fine-tuning with consistency regularization.
Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations.
Experimental results on the XTREME benchmark show that our method significantly improves cross-lingual fine-tuning across various tasks.
arXiv Detail & Related papers (2021-06-15T15:35:44Z) - Unsupervised neural adaptation model based on optimal transport for spoken language identification [54.96267179988487]
Due to the mismatch of statistical distributions of acoustic speech between training and testing sets, the performance of spoken language identification (SLID) could be drastically degraded.
We propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID.
arXiv Detail & Related papers (2020-12-24T07:37:19Z) - Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of works: zero-shot approach and translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)