Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer
- URL: http://arxiv.org/abs/2204.13869v1
- Date: Fri, 29 Apr 2022 04:05:02 GMT
- Title: Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer
- Authors: Haoran Xu, Kenton Murray
- Abstract summary: We propose a one-step mixed training method that trains on both source and target data.
We use one model to handle all target languages simultaneously to avoid excessively language-specific models.
Our proposed method achieves state-of-the-art performance on all tasks and outperforms target-adapting by a large margin.
- Score: 2.7213511121305465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The current state-of-the-art for few-shot cross-lingual transfer learning
first trains on abundant labeled data in the source language and then
fine-tunes with a few examples on the target language, termed target-adapting.
Though this has been demonstrated to work on a variety of tasks, in this paper
we show some deficiencies of this approach and propose a one-step mixed
training method that trains on both source and target data with
stochastic gradient surgery, a novel gradient-level optimization.
Unlike the previous studies that focus on one language at a time when
target-adapting, we use one model to handle all target languages simultaneously
to avoid excessively language-specific models. Moreover, we discuss how
unrealistic it is to rely on large target-language development sets for model
selection, as is done in the previous literature. We further show that our
method needs no development set for the target languages and also avoids
overfitting. We
conduct a large-scale experiment on 4 diverse NLP tasks across up to 48
languages. Our proposed method achieves state-of-the-art performance on all
tasks and outperforms target-adapting by a large margin, especially for
languages that are linguistically distant from the source language, e.g., 7.36%
F1 absolute gain on average for the NER task, up to 17.60% on Punjabi.
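The abstract does not spell out the gradient-surgery step, so here is a minimal PyTorch sketch of a PCGrad-style projection, assuming each language's gradient has already been flattened into a single vector. The function name, the one-sided projection, and the toy usage are illustrative assumptions, not the authors' released implementation.

```python
import torch

def gradient_surgery(g_src: torch.Tensor, g_tgt: torch.Tensor) -> torch.Tensor:
    """Combine source- and target-language gradients (flattened into vectors).

    If the two gradients conflict (negative dot product), project the target
    gradient onto the normal plane of the source gradient before summing,
    in the spirit of PCGrad-style gradient surgery. Only the target gradient
    is projected here, for simplicity.
    """
    dot = torch.dot(g_src, g_tgt)
    if dot < 0:  # conflicting update directions
        g_tgt = g_tgt - dot / (g_src.norm() ** 2 + 1e-12) * g_src
    return g_src + g_tgt

# Toy usage with random vectors standing in for per-language gradients.
torch.manual_seed(0)
combined = gradient_surgery(torch.randn(10), torch.randn(10))
print(combined.shape)  # torch.Size([10])
```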
Related papers
- Zero-shot Cross-lingual Transfer Learning with Multiple Source and Target Languages for Information Extraction: Language Selection and Adversarial Training [38.19963761398705]
This paper provides a detailed analysis of Cross-Lingual Multi-Transferability (many-to-many transfer learning) on recent IE corpora.
We first determine the correlation between single-language performance and a wide range of linguistic-based distances.
Next, we investigate the more general zero-shot multi-lingual transfer settings where multiple languages are involved in the training and evaluation processes.
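As a rough illustration of correlating single-language transfer performance with linguistic distance, here is a small sketch using SciPy's pearsonr; the distance and F1 values are placeholders, not numbers from the paper.

```python
from scipy.stats import pearsonr

# Hypothetical per-language zero-shot F1 scores and linguistic distances
# from the source language (placeholder values, not results from the paper).
distances = [0.12, 0.35, 0.48, 0.61, 0.80]
f1_scores = [78.4, 71.2, 66.9, 60.3, 52.1]

r, p = pearsonr(distances, f1_scores)
print(f"Pearson r = {r:.3f} (p = {p:.3f})")  # expect a strong negative correlation
```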
arXiv Detail & Related papers (2024-11-13T17:13:25Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we observe empirically that there is a training-objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for cross-lingual sequence labeling (xSL), named Cross-lingual Language Informative Span Masking (CLISM), to eliminate this objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel sequences.
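A minimal sketch of a contrastive consistency objective over representations of parallel sentences, in the spirit of CACR; the InfoNCE-style formulation and the temperature value are assumptions, and the exact CACR loss may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_consistency_loss(src_repr: torch.Tensor,
                                 tgt_repr: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss pulling together representations of parallel
    sentences (row i of src_repr and tgt_repr) and pushing apart the rest."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature   # (B, B) similarity matrix
    labels = torch.arange(src.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage: a batch of 4 parallel sentence embeddings of dimension 8.
loss = contrastive_consistency_loss(torch.randn(4, 8), torch.randn(4, 8))
print(loss.item())
```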
arXiv Detail & Related papers (2022-04-11T15:55:20Z) - CrossAligner & Co: Zero-Shot Transfer Methods for Task-Oriented
Cross-lingual Natural Language Understanding [18.14437842819122]
CrossAligner is the principal method among a variety of effective approaches for zero-shot cross-lingual transfer.
We present a quantitative analysis of individual methods as well as their weighted combinations, several of which exceed state-of-the-art (SOTA) scores.
A detailed qualitative error analysis of the best methods shows that our fine-tuned language models can zero-shot transfer the task knowledge better than anticipated.
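One way to read the "weighted combinations" of methods above is as a weighted sum of auxiliary losses. The sketch below assumes scalar per-loss weights, which is a simplification of whatever weighting scheme the paper actually uses.

```python
import torch

def combine_losses(losses: dict[str, torch.Tensor],
                   weights: dict[str, float]) -> torch.Tensor:
    """Weighted sum of named auxiliary losses (weights tuned per task)."""
    return sum(weights[name] * value for name, value in losses.items())

# Toy usage with placeholder loss values and weights.
losses = {"task": torch.tensor(0.9), "align": torch.tensor(0.4), "contrast": torch.tensor(0.2)}
weights = {"task": 1.0, "align": 0.5, "contrast": 0.3}
print(combine_losses(losses, weights))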
arXiv Detail & Related papers (2022-03-18T14:18:12Z) - From Good to Best: Two-Stage Training for Cross-lingual Machine Reading
Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
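A hedged sketch of a recall-oriented "hard-learning" loss that stops penalizing examples whose gold answer is already inside the model's top-k candidates; this is one illustrative reading of the HL objective, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hard_learning_loss(logits: torch.Tensor, gold: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Only penalize examples whose gold answer has fallen out of the
    model's top-k candidates, encouraging top-k recall of the answer."""
    topk = logits.topk(k, dim=-1).indices                  # (B, k)
    in_topk = (topk == gold.unsqueeze(-1)).any(dim=-1)     # (B,)
    per_example = F.cross_entropy(logits, gold, reduction="none")
    # Zero out the loss where the gold answer is already in the top-k set.
    return (per_example * (~in_topk).float()).mean()

# Toy usage: 3 examples scored over 10 candidate answers.
print(hard_learning_loss(torch.randn(3, 10), torch.tensor([1, 4, 7])))
```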
arXiv Detail & Related papers (2021-12-09T07:31:15Z) - Few-Shot Cross-Lingual Stance Detection with Sentiment-Based
Pre-Training [32.800766653254634]
We present the most comprehensive study of cross-lingual stance detection to date.
We use 15 diverse datasets in 12 languages from 6 language families.
For our experiments, we build on pattern-exploiting training, proposing the addition of a novel label encoder.
arXiv Detail & Related papers (2021-09-13T15:20:06Z) - Nearest Neighbour Few-Shot Learning for Cross-lingual Classification [2.578242050187029]
We propose cross-lingual adaptation using a simple nearest neighbor few-shot (15 samples) inference technique for classification tasks.
Our approach consistently improves traditional fine-tuning using only a handful of labeled samples in target locales.
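A minimal sketch of the nearest-neighbour inference idea: label each target-language query with the label of its most similar example among the handful of labelled support samples. The 15-example support set follows the summary; cosine similarity and the embedding dimensions are illustrative assumptions.

```python
import numpy as np

def nearest_neighbour_predict(support_emb: np.ndarray,
                              support_labels: np.ndarray,
                              query_emb: np.ndarray) -> np.ndarray:
    """Label each query with the label of its cosine-nearest support example."""
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = q @ s.T                     # (num_queries, num_support)
    return support_labels[sims.argmax(axis=1)]

# Toy usage: 15 labelled target-language examples, 4 queries, 16-dim embeddings.
rng = np.random.default_rng(0)
support = rng.normal(size=(15, 16))
labels = rng.integers(0, 3, size=15)
queries = rng.normal(size=(4, 16))
print(nearest_neighbour_predict(support, labels, queries))
```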
arXiv Detail & Related papers (2021-09-06T03:18:23Z) - Multilingual Code-Switching for Zero-Shot Cross-Lingual Intent
Prediction and Slot Filling [29.17194639368877]
We propose a novel method to augment the monolingual source data using multilingual code-switching via random translations.
Experiments on the MultiATIS++ benchmark dataset yielded an average improvement of +4.2% in accuracy for the intent task and +1.8% in F1 for the slot task.
We present an application of our method for crisis informatics using a new human-annotated tweet dataset of slot filling in English and Haitian Creole.
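A minimal sketch of code-switching augmentation by random token-level translation; the toy bilingual lexicons and the switching probability here are placeholders, whereas the paper draws random translations over many languages.

```python
import random

# Placeholder bilingual lexicons; the actual method translates into many languages.
LEXICONS = {
    "es": {"show": "muestra", "flights": "vuelos", "to": "a"},
    "fr": {"show": "montre", "flights": "vols", "to": "à"},
}

def code_switch(tokens: list[str], p: float = 0.5, seed: int = 0) -> list[str]:
    """Randomly replace tokens with a translation from a randomly chosen language."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        lang = rng.choice(list(LEXICONS))
        if rng.random() < p and tok in LEXICONS[lang]:
            out.append(LEXICONS[lang][tok])
        else:
            out.append(tok)
    return out

print(code_switch(["show", "flights", "to", "boston"]))
```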
arXiv Detail & Related papers (2021-03-13T21:05:09Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language
Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
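A rough sketch of estimating the mutual information between the two parts of a decomposed representation with a small MINE-style critic (Donsker-Varadhan bound); the split-by-slicing, the critic architecture, and the class name are assumptions for illustration, not the paper's model.

```python
import torch
import torch.nn as nn

class MINECritic(nn.Module):
    """Tiny critic for a MINE-style mutual information estimate between two
    representation parts (Donsker-Varadhan lower bound)."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def mi_lower_bound(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        joint = self.net(torch.cat([a, b], dim=-1)).mean()
        # Shuffle b to approximate samples from the product of the marginals.
        marginal = self.net(torch.cat([a, b[torch.randperm(b.size(0))]], dim=-1))
        n = torch.tensor(float(b.size(0)))
        return joint - torch.logsumexp(marginal, dim=0).squeeze() + torch.log(n)

# Split a representation into two halves standing in for the domain-invariant
# and domain-specific parts; the adaptation objective would push this MI down.
h = torch.randn(32, 16)
invariant, specific = h[:, :8], h[:, 8:]
critic = MINECritic(8)
print(critic.mi_lower_bound(invariant, specific).item())
```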
arXiv Detail & Related papers (2020-11-23T16:00:42Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
Because gold labels for the translated target-language text are unavailable during training, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
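A minimal sketch of a KL-divergence self-teaching loss: the student's distribution on translated target-language text is pulled toward soft pseudo-labels produced by a teacher pass. The temperature and the teacher/student naming are assumptions, not FILTER's exact recipe.

```python
import torch
import torch.nn.functional as F

def self_teaching_kl(teacher_logits: torch.Tensor,
                     student_logits: torch.Tensor,
                     temperature: float = 1.0) -> torch.Tensor:
    """KL divergence from the student's predictions on translated target-language
    text to the teacher's auto-generated soft pseudo-labels."""
    teacher = F.softmax(teacher_logits / temperature, dim=-1)
    student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")

# Toy usage: 4 examples, 3 classes; logits are placeholders.
print(self_teaching_kl(torch.randn(4, 3), torch.randn(4, 3)).item())
```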
arXiv Detail & Related papers (2020-09-10T22:42:15Z)