Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models
- URL: http://arxiv.org/abs/2104.08410v1
- Date: Sat, 17 Apr 2021 00:14:39 GMT
- Title: Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models
- Authors: Zhengxuan Wu, Nelson F. Liu, Christopher Potts
- Abstract summary: We explore how much transfer occurs when models are denied any information about word identity via random scrambling.
We find that only BERT shows high rates of transfer into our scrambled domains, and for classification but not sequence labeling tasks.
Our analyses seek to explain why transfer succeeds for some tasks but not others, to isolate the separate contributions of pretraining versus fine-tuning, and to quantify the role of word frequency.
- Score: 9.359514457957799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is growing evidence that pretrained language models improve
task-specific fine-tuning not just for the languages seen in pretraining, but
also for new languages and even non-linguistic data. What is the nature of this
surprising cross-domain transfer? We offer a partial answer via a systematic
exploration of how much transfer occurs when models are denied any information
about word identity via random scrambling. In four classification tasks and two
sequence labeling tasks, we evaluate baseline models, LSTMs using GloVe
embeddings, and BERT. We find that only BERT shows high rates of transfer into
our scrambled domains, and for classification but not sequence labeling tasks.
Our analyses seek to explain why transfer succeeds for some tasks but not
others, to isolate the separate contributions of pretraining versus
fine-tuning, and to quantify the role of word frequency. These findings help
explain where and why cross-domain transfer occurs, which can guide future
studies and practical fine-tuning efforts.
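To make the scrambling manipulation concrete, below is a minimal sketch of one way to deny a model information about word identity: fix a random bijection over the task vocabulary and rewrite every example through it, so corpus structure is preserved while every word's surface form becomes unfamiliar to the pretrained model. This is an illustrative reconstruction, not the authors' released code; details such as how subword tokenization is handled, and whether scrambling is applied at the type or token level, may differ in the paper.
```python
# Illustrative sketch only (not the authors' released code): scramble word
# identity by mapping every word type in the task vocabulary to a distinct,
# randomly chosen word type via a fixed bijection. The corpus keeps its
# structure, but each token's identity is new to the pretrained model.
import random

def build_scrambler(vocab, seed=0):
    """Return a type-level mapping: each word type -> a distinct random word type."""
    rng = random.Random(seed)
    shuffled = list(vocab)
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def scramble(tokens, mapping):
    """Apply the fixed type-level mapping to one tokenized example."""
    return [mapping[tok] for tok in tokens]

# Tiny hypothetical classification corpus of (token list, label) pairs.
corpus = [
    (["the", "movie", "was", "great"], 1),
    (["the", "plot", "was", "dull"], 0),
]
vocab = sorted({tok for toks, _ in corpus for tok in toks})
mapping = build_scrambler(vocab, seed=13)

scrambled = [(scramble(toks, mapping), label) for toks, label in corpus]
for (orig, _), (scr, label) in zip(corpus, scrambled):
    print(orig, "->", scr, "| label:", label)
```
Because the mapping is a bijection, each (renamed) word type keeps its corpus frequency and the labels are untouched; only the link between surface forms and anything learned in pretraining is broken, which is what such a setup isolates.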
Related papers
- Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques [5.735035463793008]
We show that for Argument Mining, data transfer obtains better results than model transfer.
For few-shot learning, the type of task (length and complexity of the sequence spans) and the sampling method prove to be crucial.
arXiv Detail & Related papers (2024-07-04T08:59:17Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Domain Mismatch Doesn't Always Prevent Cross-Lingual Transfer Learning [51.232774288403114]
Cross-lingual transfer learning has been surprisingly effective in zero-shot cross-lingual classification.
We show that a simple regimen can overcome much of the effect of domain mismatch in cross-lingual transfer.
arXiv Detail & Related papers (2022-11-30T01:24:33Z)
- Characterization of effects of transfer learning across domains and languages [0.0]
Transfer learning (TL) from pre-trained neural language models has emerged as a powerful technique over the years.
We investigate how TL affects the performance of popular pre-trained models over three natural language processing (NLP) tasks.
arXiv Detail & Related papers (2022-10-03T17:17:07Z)
- Task Transfer and Domain Adaptation for Zero-Shot Question Answering [18.188082154309175]
We use supervised pretraining on source-domain data to reduce sample complexity on domain-specific downstream tasks.
We evaluate zero-shot performance on domain-specific reading comprehension tasks by combining task transfer with domain adaptation.
arXiv Detail & Related papers (2022-06-14T09:10:48Z)
- FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations annotated with 10 and 7 tasks respectively, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs (consistent with every ordered pair of distinct tasks within each dataset: 10·9 + 7·6 = 132).
arXiv Detail & Related papers (2022-05-12T17:59:00Z)
- Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning [61.29879000628815]
We show that aligning gradients between tasks is crucial for maximizing knowledge transfer.
We propose a simple yet effective method that efficiently aligns gradients between tasks.
We extensively validate our method on various multi-task learning and zero-shot cross-lingual transfer tasks (an illustrative gradient-alignment check appears after this list).
arXiv Detail & Related papers (2021-10-06T09:10:10Z)
- First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT [2.2931318723689276]
Cross-lingual transfer emerges from fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning.
We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor.
While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance for the transfer and can be reinitialized during fine-tuning.
arXiv Detail & Related papers (2021-01-26T22:12:38Z)
- Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks [6.7155846430379285]
In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training.
Recently introduced cross-lingual language model (XLM) pretraining brings out neural parameter sharing in Transformer-style networks.
In this paper, we aim to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining.
arXiv Detail & Related papers (2021-01-26T09:21:25Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become the de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
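As a side note on the Sequential Reptile entry above: "aligning gradients between tasks" can be made concrete by measuring the cosine similarity between the gradients that two tasks induce on shared parameters. The sketch below only illustrates that measurement; it is not the Sequential Reptile algorithm itself, and the model, tasks, and data are toy placeholders.
```python
# Illustration only: quantify inter-task gradient alignment as the cosine
# similarity between per-task gradients on a shared model. This is NOT the
# Sequential Reptile method; model, tasks, and data below are placeholders.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)          # toy stand-in for a shared encoder
loss_fn = torch.nn.CrossEntropyLoss()

def flat_grad(loss):
    """Gradient of `loss` w.r.t. the shared parameters, flattened to one vector."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

# Two hypothetical tasks that share the same parameters.
x_a, y_a = torch.randn(16, 8), torch.randint(0, 2, (16,))
x_b, y_b = torch.randn(16, 8), torch.randint(0, 2, (16,))

g_a = flat_grad(loss_fn(model(x_a), y_a))
g_b = flat_grad(loss_fn(model(x_b), y_b))

# Near +1: the tasks pull the parameters in the same direction (aligned);
# near 0 or negative: the task updates interfere with each other.
alignment = torch.nn.functional.cosine_similarity(g_a, g_b, dim=0)
print(f"inter-task gradient cosine similarity: {alignment.item():+.3f}")
```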
This list is automatically generated from the titles and abstracts of the papers on this site.