Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning
- URL: http://arxiv.org/abs/2004.14218v2
- Date: Sun, 4 Oct 2020 08:43:24 GMT
- Title: Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning
- Authors: Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung
- Abstract summary: Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
- Score: 74.25168207651376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, fine-tuning pre-trained language models (e.g., multilingual BERT)
to downstream cross-lingual tasks has shown promising results. However, the
fine-tuning process inevitably changes the parameters of the pre-trained model
and weakens its cross-lingual ability, which leads to sub-optimal performance.
To alleviate this problem, we leverage continual learning to preserve the
original cross-lingual ability of the pre-trained model when we fine-tune it to
downstream tasks. The experimental result shows that our fine-tuning methods
can better preserve the cross-lingual ability of the pre-trained model in a
sentence retrieval task. Our methods also achieve better performance than other
fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and
named entity recognition tasks.
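The abstract does not specify the exact continual-learning algorithm, but the core idea, constraining fine-tuning so the model does not drift away from its pre-trained cross-lingual knowledge, can be illustrated with a short sketch. The snippet below uses an L2 anchor penalty toward the pre-trained weights, in the spirit of continual-learning regularizers such as L2-SP; it is a minimal illustration of the motivation, not the paper's exact method, and all dimensions, names, and hyperparameters are assumptions.

```python
# Illustrative sketch only: limit drift from the pre-trained parameters
# during fine-tuning by anchoring them with an L2 penalty (L2-SP-style).
# This is NOT necessarily the continual-learning method of the paper.
import torch
import torch.nn as nn

def anchored_finetune_loss(model: nn.Module,
                           anchor: dict,            # pre-trained snapshot
                           task_loss: torch.Tensor,
                           reg_weight: float = 0.01) -> torch.Tensor:
    """Task loss plus a penalty that keeps parameters near the
    pre-trained snapshot, preserving cross-lingual ability."""
    drift = sum(((p - anchor[name]) ** 2).sum()
                for name, p in model.named_parameters() if p.requires_grad)
    return task_loss + reg_weight * drift

# Usage sketch with a toy head standing in for multilingual BERT.
model = nn.Linear(768, 17)                    # e.g., 17 POS tags (assumed)
anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

features = torch.randn(8, 768)                # stand-in for encoder outputs
labels = torch.randint(0, 17, (8,))
task_loss = nn.functional.cross_entropy(model(features), labels)
loss = anchored_finetune_loss(model, anchor, task_loss)
loss.backward()
optimizer.step()
```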
Related papers
- Distilling Monolingual and Crosslingual Word-in-Context Representations [18.87665111304974]
We propose a method that distils representations of word meaning in context from a pre-trained language model in both monolingual and crosslingual settings.
Our method requires neither human-annotated corpora nor updates to the parameters of the pre-trained model.
Our method learns to combine the outputs of different hidden layers of the pre-trained model using self-attention.
arXiv Detail & Related papers (2024-09-13T11:10:16Z)
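A minimal sketch of the layer-combination idea in the summary above: attention over the stack of per-layer hidden states, with each token's top-layer state used as the query. The module names, the choice of the top layer as query, and the dimensions are assumptions for illustration, not the authors' implementation.

```python
# Sketch: combine hidden states from all encoder layers with attention
# over the layer axis, per token. Details are assumed, not from the paper.
import torch
import torch.nn as nn

class LayerAttentionCombiner(nn.Module):
    """For each token, attend over the stack of per-layer hidden states
    and return their weighted combination."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.scale = hidden_size ** 0.5

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        # layer_states: (num_layers, seq_len, hidden)
        q = self.query(layer_states[-1])              # top layer as query
        k = self.key(layer_states)                    # (layers, seq, hidden)
        scores = torch.einsum('sh,lsh->sl', q, k) / self.scale
        weights = scores.softmax(dim=-1)              # per-token layer weights
        return torch.einsum('sl,lsh->sh', weights, layer_states)

states = torch.randn(13, 32, 768)    # 12 layers + embeddings, toy values
combiner = LayerAttentionCombiner(768)
word_in_context = combiner(states)   # (32, 768)
```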
- Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin [3.2039731457723604]
We aim to improve upon both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus.
Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks, with improvements of up to 2.38 BLEU.
arXiv Detail & Related papers (2023-07-01T16:47:36Z)
- Pre-Trained Language-Meaning Models for Multilingual Parsing and Generation [14.309869321407522]
We introduce multilingual pre-trained language-meaning models based on Discourse Representation Structures (DRSs).
Since DRSs are language-neutral, cross-lingual transfer learning is adopted to further improve performance on non-English tasks.
Automatic evaluation results show that our approach achieves the best performance on both the multilingual DRS parsing and DRS-to-text generation tasks.
arXiv Detail & Related papers (2023-05-31T19:00:33Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
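The noise-stability idea above can be sketched directly: inject standard Gaussian noise into a hidden representation and penalize how far the layer's output moves. The noise scale, the stand-in layer, and the loss weighting are assumptions; this is not the authors' released implementation.

```python
# Hedged sketch of noise stability regularization: compare a layer's
# output on clean vs. Gaussian-noised input and penalize the change.
import torch
import torch.nn as nn

def noise_stability_loss(layer: nn.Module,
                         hidden: torch.Tensor,
                         noise_std: float = 0.1) -> torch.Tensor:
    """Penalize the change in the layer's output when standard
    Gaussian noise is added to its input representation."""
    clean_out = layer(hidden)
    noisy_out = layer(hidden + noise_std * torch.randn_like(hidden))
    return ((clean_out - noisy_out) ** 2).mean()

layer = nn.Sequential(nn.Linear(768, 768), nn.GELU())  # stand-in sub-layer
hidden = torch.randn(8, 128, 768)                      # toy hidden states
reg = noise_stability_loss(layer, hidden)
# total_loss = task_loss + lambda_reg * reg   # lambda_reg is a tuned weight
```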
- MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators [10.557167523009392]
We present Multi-Stage Prompting, a simple and lightweight approach for better adapting pre-trained language models to translation tasks.
To make pre-trained language models better translators, we divide the translation process performed by the pre-trained language model into three separate stages.
During each stage, we independently apply different continuous prompts, allowing the pre-trained language model to better adapt to translation tasks.
arXiv Detail & Related papers (2021-10-13T10:06:21Z)
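A hedged sketch of the per-stage continuous prompts described above: one trainable soft prompt per stage, prepended to the input embeddings before the frozen language model runs that stage. The stage count of three comes from the summary; the prompt length, dimensions, and prepending scheme are illustrative assumptions.

```python
# Sketch: separate trainable continuous prompts for each translation
# stage. Sizes and the prepending scheme are assumptions.
import torch
import torch.nn as nn

class StagePrompts(nn.Module):
    """Hold one trainable soft prompt per stage and prepend it to the
    input embeddings fed to the (frozen) pre-trained language model."""
    def __init__(self, num_stages: int = 3, prompt_len: int = 16,
                 hidden: int = 768):
        super().__init__()
        self.prompts = nn.Parameter(
            torch.randn(num_stages, prompt_len, hidden) * 0.02)

    def forward(self, stage: int, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq, hidden)
        batch = embeddings.size(0)
        prompt = self.prompts[stage].unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, embeddings], dim=1)

prompts = StagePrompts()
src_emb = torch.randn(4, 50, 768)      # toy source embeddings
stage0_input = prompts(0, src_emb)     # (4, 66, 768); feed to the frozen LM
```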
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data to improve performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
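The self-training step implied above is standard enough to sketch: label unlabeled target-language data with the current model, keep only confident predictions, and mix them back into the training set. The confidence threshold and the toy classification head are assumptions; the paper's reading-comprehension task would use span predictions instead.

```python
# Minimal self-training sketch: pseudo-label unlabeled target-language
# data and keep only confident examples. Threshold is an assumption.
import torch
import torch.nn as nn

def pseudo_label(model: nn.Module, unlabeled: torch.Tensor,
                 threshold: float = 0.9):
    """Return (inputs, labels) for examples the model is confident on."""
    model.eval()
    with torch.no_grad():
        probs = model(unlabeled).softmax(dim=-1)
        conf, labels = probs.max(dim=-1)
    keep = conf >= threshold
    return unlabeled[keep], labels[keep]

model = nn.Linear(768, 2)            # toy stand-in for a task head
unlabeled = torch.randn(256, 768)    # target-language representations
inputs, labels = pseudo_label(model, unlabeled)
# Next round: add (inputs, labels) to the training set and fine-tune again.
```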
- Pre-Training a Language Model Without Human Language [74.11825654535895]
We study how the intrinsic nature of pre-training data contributes to the fine-tuned downstream performance.
We find that models pre-trained on unstructured data beat those trained directly from scratch on downstream tasks.
Surprisingly, we find that pre-training on certain non-human-language data yields GLUE performance close to that obtained by pre-training on a non-English language.
arXiv Detail & Related papers (2020-12-22T13:38:06Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
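A minimal sketch of the decomposition described above: two projection heads split the entangled pretrained representation into domain-invariant and domain-specific parts. The mutual-information objectives that train these heads (estimated with a critic, per the summary) are only noted in comments, not implemented; all names and sizes are assumptions.

```python
# Sketch: decompose an entangled cross-lingual representation into
# domain-invariant and domain-specific parts via two projection heads.
import torch
import torch.nn as nn

class FeatureDecomposer(nn.Module):
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.invariant_head = nn.Linear(hidden, hidden // 2)
        self.specific_head = nn.Linear(hidden, hidden // 2)

    def forward(self, rep: torch.Tensor):
        # rep: entangled output of the pretrained cross-lingual encoder
        return self.invariant_head(rep), self.specific_head(rep)

decomposer = FeatureDecomposer()
rep = torch.randn(8, 768)
invariant, specific = decomposer(rep)
# Training would estimate mutual information between the two parts (to
# drive it down) and between each part and domain labels, via a critic.
```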
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly follows two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
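The summary does not spell out the consolidation framework, so the snippet below shows only the simplest possible stand-in: averaging the tag distributions of a zero-shot model and a translate-train model at inference time. This is an illustrative assumption, not the paper's proposed method.

```python
# Purely illustrative stand-in for "consolidating" the two approaches:
# average the tag distributions of a zero-shot model and a model
# trained on translated data. Not the paper's actual framework.
import torch
import torch.nn as nn

zero_shot = nn.Linear(768, 9)        # trained on source-language data only
translate_train = nn.Linear(768, 9)  # trained on translated target data

tokens = torch.randn(32, 768)        # target-language token representations
probs = (0.5 * zero_shot(tokens).softmax(-1)
         + 0.5 * translate_train(tokens).softmax(-1))
tags = probs.argmax(-1)              # consolidated sequence-tagging output
```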
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.