Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models
- URL: http://arxiv.org/abs/2104.08410v1
- Date: Sat, 17 Apr 2021 00:14:39 GMT
- Title: Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models
- Authors: Zhengxuan Wu, Nelson F. Liu, Christopher Potts
- Abstract summary: We explore how much transfer occurs when models are denied any information about word identity via random scrambling.
We find that only BERT shows high rates of transfer into our scrambled domains, and for classification but not sequence labeling tasks.
Our analyses seek to explain why transfer succeeds for some tasks but not others, to isolate the separate contributions of pretraining versus fine-tuning, and to quantify the role of word frequency.
- Score: 9.359514457957799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is growing evidence that pretrained language models improve
task-specific fine-tuning not just for the languages seen in pretraining, but
also for new languages and even non-linguistic data. What is the nature of this
surprising cross-domain transfer? We offer a partial answer via a systematic
exploration of how much transfer occurs when models are denied any information
about word identity via random scrambling. In four classification tasks and two
sequence labeling tasks, we evaluate baseline models, LSTMs using GloVe
embeddings, and BERT. We find that only BERT shows high rates of transfer into
our scrambled domains, and for classification but not sequence labeling tasks.
Our analyses seek to explain why transfer succeeds for some tasks but not
others, to isolate the separate contributions of pretraining versus
fine-tuning, and to quantify the role of word frequency. These findings help
explain where and why cross-domain transfer occurs, which can guide future
studies and practical fine-tuning efforts.
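To make the scrambling manipulation concrete, below is a minimal sketch of one way to deny a model information about word identity: fix a random bijection over the task vocabulary and rewrite every example through it, so corpus structure is preserved while every word's surface form becomes unfamiliar to the pretrained model. This is an illustrative reconstruction, not the authors' released code; details such as how subword tokenization is handled, and whether scrambling is applied at the type or token level, may differ in the paper.
```python
# Illustrative sketch only (not the authors' released code): scramble word
# identity by mapping every word type in the task vocabulary to a distinct,
# randomly chosen word type via a fixed bijection. The corpus keeps its
# structure, but each token's identity is new to the pretrained model.
import random

def build_scrambler(vocab, seed=0):
    """Return a type-level mapping: each word type -> a distinct random word type."""
    rng = random.Random(seed)
    shuffled = list(vocab)
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def scramble(tokens, mapping):
    """Apply the fixed type-level mapping to one tokenized example."""
    return [mapping[tok] for tok in tokens]

# Tiny hypothetical classification corpus of (token list, label) pairs.
corpus = [
    (["the", "movie", "was", "great"], 1),
    (["the", "plot", "was", "dull"], 0),
]
vocab = sorted({tok for toks, _ in corpus for tok in toks})
mapping = build_scrambler(vocab, seed=13)

scrambled = [(scramble(toks, mapping), label) for toks, label in corpus]
for (orig, _), (scr, label) in zip(corpus, scrambled):
    print(orig, "->", scr, "| label:", label)
```
Because the mapping is a bijection, each (renamed) word type keeps its corpus frequency and the labels are untouched; only the link between surface forms and anything learned in pretraining is broken, which is what such a setup isolates.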
Related papers
- Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques [5.735035463793008]
We show that for Argument Mining, data transfer obtains better results than model transfer.
For few-shot learning, the type of task (length and complexity of the sequence spans) and the sampling method prove to be crucial.
arXiv Detail & Related papers (2024-07-04T08:59:17Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- Domain Mismatch Doesn't Always Prevent Cross-Lingual Transfer Learning [51.232774288403114]
Cross-lingual transfer learning has been surprisingly effective in zero-shot cross-lingual classification.
We show that a simple regimen can overcome much of the effect of domain mismatch in cross-lingual transfer.
arXiv Detail & Related papers (2022-11-30T01:24:33Z)
- Characterization of effects of transfer learning across domains and languages [0.0]
Transfer learning (TL) from pre-trained neural language models has emerged as a powerful technique over the years.
We investigate how TL affects the performance of popular pre-trained models over three natural language processing (NLP) tasks.
arXiv Detail & Related papers (2022-10-03T17:17:07Z)
- Task Transfer and Domain Adaptation for Zero-Shot Question Answering [18.188082154309175]
We use supervised pretraining on source-domain data to reduce sample complexity on domain-specific downstream tasks.
We evaluate zero-shot performance on domain-specific reading comprehension tasks by combining task transfer with domain adaptation.
arXiv Detail & Related papers (2022-06-14T09:10:48Z)
- FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue [70.65782786401257]
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations annotated with 10 and 7 tasks respectively, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs (consistent with every ordered pair of distinct tasks within each dataset: 10·9 + 7·6 = 132).
arXiv Detail & Related papers (2022-05-12T17:59:00Z)
- Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning [61.29879000628815]
We show that aligning gradients between tasks is crucial for maximizing knowledge transfer.
We propose a simple yet effective method that efficiently aligns gradients between tasks.
We extensively validate our method on various multi-task learning and zero-shot cross-lingual transfer tasks (an illustrative gradient-alignment check appears after this list).
arXiv Detail & Related papers (2021-10-06T09:10:10Z)
- First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT [2.2931318723689276]
Cross-lingual transfer emerges from fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning.
We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor.
While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance for the transfer and can be reinitialized during fine-tuning.
arXiv Detail & Related papers (2021-01-26T22:12:38Z)
- Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks [6.7155846430379285]
In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training.
Recently introduced cross-lingual language model (XLM) pretraining brings out neural parameter sharing in Transformer-style networks.
In this paper, we aim to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining.
arXiv Detail & Related papers (2021-01-26T09:21:25Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become the de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
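As a side note on the Sequential Reptile entry above: "aligning gradients between tasks" can be made concrete by measuring the cosine similarity between the gradients that two tasks induce on shared parameters. The sketch below only illustrates that measurement; it is not the Sequential Reptile algorithm itself, and the model, tasks, and data are toy placeholders.
```python
# Illustration only: quantify inter-task gradient alignment as the cosine
# similarity between per-task gradients on a shared model. This is NOT the
# Sequential Reptile method; model, tasks, and data below are placeholders.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)          # toy stand-in for a shared encoder
loss_fn = torch.nn.CrossEntropyLoss()

def flat_grad(loss):
    """Gradient of `loss` w.r.t. the shared parameters, flattened to one vector."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

# Two hypothetical tasks that share the same parameters.
x_a, y_a = torch.randn(16, 8), torch.randint(0, 2, (16,))
x_b, y_b = torch.randn(16, 8), torch.randint(0, 2, (16,))

g_a = flat_grad(loss_fn(model(x_a), y_a))
g_b = flat_grad(loss_fn(model(x_b), y_b))

# Near +1: the tasks pull the parameters in the same direction (aligned);
# near 0 or negative: the task updates interfere with each other.
alignment = torch.nn.functional.cosine_similarity(g_a, g_b, dim=0)
print(f"inter-task gradient cosine similarity: {alignment.item():+.3f}")
```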
This list is automatically generated from the titles and abstracts of the papers on this site.