Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging
- URL: http://arxiv.org/abs/2305.16834v1
- Date: Fri, 26 May 2023 11:24:32 GMT
- Title: Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging
- Authors: Fabian David Schmidt, Ivan Vulić, Goran Glavaš
- Abstract summary: Massively multilingual language models have displayed strong performance in zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer setups.
We propose a simple and effective method that averages different checkpoints (i.e., model snapshots) during task fine-tuning.
- Score: 60.79382212029304
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Massively multilingual language models have displayed strong performance in
zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer setups, where
models fine-tuned on task data in a source language are transferred without any
or with only a few annotated instances to the target language(s). However,
current work typically overestimates model performance as fine-tuned models are
frequently evaluated at model checkpoints that generalize best to validation
instances in the target languages. This effectively violates the main
assumptions of "true" ZS-XLT and FS-XLT. Such XLT setups require robust methods
that do not depend on labeled target language data for validation and model
selection. In this work, aiming to improve the robustness of "true" ZS-XLT and
FS-XLT, we propose a simple and effective method that averages different
checkpoints (i.e., model snapshots) during task fine-tuning. We conduct
exhaustive ZS-XLT and FS-XLT experiments across higher-level semantic tasks
(NLI, extractive QA) and lower-level token classification tasks (NER, POS). The
results indicate that averaging model checkpoints yields systematic and
consistent performance gains across diverse target languages in all tasks.
Importantly, it simultaneously and substantially desensitizes XLT to varying
hyperparameter choices in the absence of target language validation. We also
show that checkpoint averaging benefits performance when further combined with
run averaging (i.e., averaging the parameters of models fine-tuned over
independent runs).
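In practice, the proposed averaging is a uniform mean of model parameters, taken either over snapshots saved during one fine-tuning run (checkpoint averaging) or over the final models of independent runs (run averaging). Below is a minimal sketch assuming PyTorch-style state dicts; the function name and file paths are illustrative, not the authors' code.

```python
import torch

def average_state_dicts(paths):
    """Uniformly average the parameters of several saved model snapshots.

    The same routine covers both uses described in the abstract:
    - checkpoint averaging: `paths` are snapshots from one fine-tuning run
    - run averaging: `paths` are the final models of independent runs
    """
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            # Initialize the accumulator with float copies of the first snapshot.
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}

# Usage (illustrative paths): load the averaged weights back into the model.
# snapshots = ["ckpt_step1000.pt", "ckpt_step2000.pt", "ckpt_step3000.pt"]
# model.load_state_dict(average_state_dicts(snapshots))
```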
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Sign of the Times: Evaluating the use of Large Language Models for Idiomaticity Detection [2.2724928083094196]
This work looks at the performance of a range of LLMs on three idiomaticity datasets: SemEval 2022 Task 2a, FLUTE, and MAGPIE.
We find that whilst these models do give competitive performance, they do not match the results of fine-tuned task-specific models, even at the largest scales.
arXiv Detail & Related papers (2024-05-15T11:55:14Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity [19.15213046428148]
Cross-lingual transfer (XLT) is the ability of multilingual language models to largely preserve their task performance when evaluated in languages that were not included in fine-tuning.
We propose the utilization of sub-network similarity between two languages as a proxy for predicting the compatibility of the languages in the context of XLT.
arXiv Detail & Related papers (2023-10-26T05:39:49Z)
- One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer [61.455775535559276]
We propose an unsupervised evaluation protocol for ZS-XLT.
We run broad ZS-XLT experiments on both higher-level semantic tasks (NLI, extractive QA) and a lower-level token classification task (NER).
We find that conventional model selection based on source-language validation quickly plateaus to suboptimal ZS-XLT performance.
arXiv Detail & Related papers (2023-10-16T15:50:34Z)
- Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual Understanding With Multilingual Language Models [95.32691891392903]
In this paper, we conduct cross-lingual evaluation on various NLU tasks using prompt-tuning and compare it with fine-tuning.
The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets.
arXiv Detail & Related papers (2022-10-22T05:48:02Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
- XeroAlign: Zero-Shot Cross-lingual Transformer Alignment [9.340611077939828]
We introduce a method for task-specific alignment of cross-lingual pretrained transformers such as XLM-R.
XeroAlign uses translated task data to encourage the model to generate similar sentence embeddings for different languages.
XLM-RA's text classification accuracy exceeds that of XLM-R trained with labelled data, and XLM-RA performs on par with state-of-the-art models on a cross-lingual adversarial paraphrasing task.
arXiv Detail & Related papers (2021-05-06T07:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.