One For All & All For One: Bypassing Hyperparameter Tuning with Model
Averaging For Cross-Lingual Transfer
- URL: http://arxiv.org/abs/2310.10532v1
- Date: Mon, 16 Oct 2023 15:50:34 GMT
- Title: One For All & All For One: Bypassing Hyperparameter Tuning with Model
Averaging For Cross-Lingual Transfer
- Authors: Fabian David Schmidt, Ivan Vulić, Goran Glavaš
- Abstract summary: We propose an unsupervised evaluation protocol for ZS-XLT.
We run broad ZS-XLT experiments on both higher-level semantic tasks (NLI, extractive QA) and a lower-level token classification task (NER).
We find that conventional model selection based on source-language validation quickly plateaus at suboptimal ZS-XLT performance.
- Score: 61.455775535559276
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multilingual language models enable zero-shot cross-lingual transfer
(ZS-XLT): fine-tuned on sizable source-language task data, they perform the
task in target languages without labeled instances. The effectiveness of ZS-XLT
hinges on the linguistic proximity between languages and the amount of
pretraining data for a language. Because of this, model selection based on
source-language validation is unreliable: it picks model snapshots with
suboptimal target-language performance. As a remedy, some work optimizes ZS-XLT
by extensively tuning hyperparameters; follow-up work then routinely
struggles to replicate the original results. Other work searches over narrower
hyperparameter grids, reporting substantially lower performance. In this work,
we therefore propose an unsupervised evaluation protocol for ZS-XLT that
decouples performance maximization from hyperparameter tuning. As a robust and
more transparent alternative to extensive hyperparameter tuning, we propose to
accumulatively average snapshots from different runs into a single model. We
run broad ZS-XLT experiments on both higher-level semantic tasks (NLI,
extractive QA) and a lower-level token classification task (NER) and find that
conventional model selection based on source-language validation quickly
plateaus at suboptimal ZS-XLT performance. On the other hand, our accumulative
run-by-run averaging of models trained with different hyperparameters boosts
ZS-XLT performance and closely correlates with "oracle" ZS-XLT, i.e., model
selection based on target-language validation performance.
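A minimal sketch of the accumulative run-by-run averaging idea described above, assuming PyTorch-style state dicts with matching keys and shapes; the function and variable names are illustrative, not the authors' released code:

import copy
import torch


def accumulate_average(avg_state, new_state, num_runs_so_far):
    """Fold one more run's parameters into a running average over runs.

    avg_state: state dict averaged over `num_runs_so_far` runs (None at first).
    new_state: state dict of the newly finished fine-tuning run.
    Returns the state dict averaged over `num_runs_so_far + 1` runs.
    """
    if avg_state is None:
        return copy.deepcopy(new_state)
    updated = {}
    for name, avg_param in avg_state.items():
        new_param = new_state[name]
        if not torch.is_floating_point(avg_param):
            # Integer buffers (e.g. step counters) are copied, not averaged.
            updated[name] = new_param.clone()
            continue
        # Incremental mean: avg_{k+1} = avg_k + (x_{k+1} - avg_k) / (k + 1)
        updated[name] = avg_param + (new_param - avg_param) / (num_runs_so_far + 1)
    return updated


# Hypothetical usage: fine-tune run by run with different hyperparameters and
# fold each run's snapshot into a single averaged model for ZS-XLT evaluation.
# avg = None
# for k, hparams in enumerate(hyperparameter_grid):  # hypothetical grid
#     model = fine_tune(hparams)                     # hypothetical trainer
#     avg = accumulate_average(avg, model.state_dict(), k)
# model.load_state_dict(avg)

Accumulating run by run keeps only one averaged set of weights in memory at any time, rather than storing every run's checkpoint for a final average.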
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective than fine-tuning at enhancing the performance of low-resource languages.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity [19.15213046428148]
Cross-lingual transfer (XLT) is the ability of multilingual language models to largely preserve task performance when evaluated in languages that were not included in the fine-tuning process.
We propose the utilization of sub-network similarity between two languages as a proxy for predicting the compatibility of the languages in the context of XLT.
arXiv Detail & Related papers (2023-10-26T05:39:49Z)
- Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging [60.79382212029304]
Massively multilingual language models have displayed strong performance in zero-shot (ZS-XLT) and few-shot (FS-XLT) cross-lingual transfer setups.
We propose a simple and effective method that averages different checkpoints (i.e., model snapshots) during task fine-tuning (see the short sketch after this list).
arXiv Detail & Related papers (2023-05-26T11:24:32Z)
- Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting [123.16452714740106]
Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages.
We introduce a simple yet effective method, called cross-lingual-thought prompting (XLT)
XLT is a generic template prompt that stimulates cross-lingual and logical reasoning skills to enhance task performance across languages.
arXiv Detail & Related papers (2023-05-11T17:44:17Z)
- Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual Understanding With Multilingual Language Models [95.32691891392903]
In this paper, we conduct cross-lingual evaluation on various NLU tasks using prompt tuning and compare it with fine-tuning.
The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets.
arXiv Detail & Related papers (2022-10-22T05:48:02Z)
- XeroAlign: Zero-Shot Cross-lingual Transformer Alignment [9.340611077939828]
We introduce a method for task-specific alignment of cross-lingual pretrained transformers such as XLM-R.
XeroAlign uses translated task data to encourage the model to generate similar sentence embeddings for different languages.
XLM-RA's text classification accuracy exceeds that of XLM-R trained with labelled data, and it performs on par with state-of-the-art models on a cross-lingual adversarial paraphrasing task.
arXiv Detail & Related papers (2021-05-06T07:10:00Z)
- WARP: Word-level Adversarial ReProgramming [13.08689221166729]
In many applications it is preferable to tune much smaller sets of parameters, so that the majority of parameters can be shared across multiple tasks.
We present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation.
We show that this approach outperforms other methods with a similar number of trainable parameters on SST-2 and MNLI datasets.
arXiv Detail & Related papers (2021-01-01T00:41:03Z)
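As referenced in the "Free Lunch" entry above, the following is a minimal sketch of within-run checkpoint averaging during task fine-tuning, again assuming PyTorch-style state dicts; the loop and names are illustrative, not that paper's released code:

import copy
import torch


def average_checkpoints(snapshot_states):
    """Uniformly average a list of state dicts saved during one training run."""
    avg = copy.deepcopy(snapshot_states[0])
    for name in avg:
        if torch.is_floating_point(avg[name]):
            # Stack the same parameter across snapshots and take the mean.
            stacked = torch.stack([state[name].float() for state in snapshot_states])
            avg[name] = stacked.mean(dim=0).to(avg[name].dtype)
    return avg


# Hypothetical usage inside a standard fine-tuning loop:
# snapshots = []
# for step, batch in enumerate(train_loader, start=1):
#     train_step(model, batch)                     # hypothetical train step
#     if step % save_every == 0:
#         snapshots.append(copy.deepcopy(model.state_dict()))
# model.load_state_dict(average_checkpoints(snapshots))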