Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot
Performance via Probability Calibration
- URL: http://arxiv.org/abs/2310.05069v2
- Date: Thu, 19 Oct 2023 15:58:05 GMT
- Title: Unleashing the Multilingual Encoder Potential: Boosting Zero-Shot
Performance via Probability Calibration
- Authors: Ercong Nie, Helmut Schmid, Hinrich Schütze
- Abstract summary: Pretrained multilingual encoder models can directly perform zero-shot multilingual tasks or linguistic probing by reformulating the input examples into cloze-style prompts.
This method is limited by the model's bias toward predicting label words that occurred frequently during pretraining.
We combine the models with calibration techniques that modify the probabilities of the label words predicted by the models.
- Score: 12.424785560515094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained multilingual encoder models can directly perform zero-shot
multilingual tasks or linguistic probing by reformulating the input examples
into cloze-style prompts. This is accomplished by predicting the probabilities
of the label words at the masked token position, without requiring any updates
to the model parameters. However, the performance of this method is limited by
the model's bias toward predicting label words that occurred frequently during
pretraining; such words typically receive high probabilities. To address this
issue, we combine the models with calibration techniques that modify the
probabilities of the label words predicted by the models. We first validate the
effectiveness of a proposed simple calibration method together with other
existing techniques on monolingual encoders in both zero- and few-shot
scenarios. We subsequently employ these calibration techniques on multilingual
encoders, resulting in substantial performance improvements across a wide range
of tasks.
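To make the prompting-and-calibration pipeline concrete, the sketch below scores label words at the mask position of a cloze prompt with a multilingual encoder and then applies one common calibration technique: dividing each label word's probability by the probability the model assigns to it for a content-free input (contextual calibration). The model name, template, verbalizers, and content-free string are illustrative assumptions and are not taken from the paper, whose own simple calibration method may differ.

    # Minimal sketch: cloze-style zero-shot classification with a multilingual
    # masked encoder, plus a simple calibration step. The template, label words
    # (verbalizers) and content-free string are illustrative assumptions.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    MODEL_NAME = "xlm-roberta-base"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

    LABEL_WORDS = {"positive": "good", "negative": "bad"}  # hypothetical verbalizers
    TEMPLATE = "{text} It was {mask}."                     # hypothetical prompt

    def label_word_probs(text: str) -> dict:
        """Probability of each (single-piece) label word at the mask position."""
        prompt = TEMPLATE.format(text=text, mask=tokenizer.mask_token)
        inputs = tokenizer(prompt, return_tensors="pt")
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
        with torch.no_grad():
            probs = model(**inputs).logits[0, mask_pos].softmax(dim=-1)
        return {
            label: probs[tokenizer.convert_tokens_to_ids(tokenizer.tokenize(" " + word))[0]].item()
            for label, word in LABEL_WORDS.items()
        }

    def calibrated_label(text: str, content_free: str = "N/A") -> str:
        """Divide each label word's probability by the probability it receives
        for a content-free input, then pick the label with the highest ratio."""
        raw = label_word_probs(text)
        prior = label_word_probs(content_free)
        return max(raw, key=lambda lab: raw[lab] / prior[lab])

    print(calibrated_label("Der Film war großartig!"))  # expected: "positive"

Dividing by the content-free prior directly counteracts the frequency bias described above: a label word that the model predicts with high probability regardless of the input is scaled down accordingly.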
Related papers
- Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition [5.575078692353885]
We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy.
By generalizing it to a rank-$r$ canonical probability decomposition, we develop an improved model that predicts multiple tokens simultaneously.
arXiv Detail & Related papers (2024-10-23T11:06:36Z)
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
- IDoFew: Intermediate Training Using Dual-Clustering in Language Models for Few Labels Text Classification [24.11420537250414]
Bidirectional Encoder Representations from Transformers (BERT) have been very effective in various Natural Language Processing (NLP) and text mining tasks including text classification.
Some tasks still pose challenges for these models, including text classification with limited labels.
We have developed a novel two-stage intermediate clustering with subsequent fine-tuning that models the pseudo-labels reliably.
arXiv Detail & Related papers (2024-01-08T17:07:37Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- Unsupervised Calibration through Prior Adaptation for Text Classification using Large Language Models [37.39843935632105]
We propose an approach to adapt the prior class distribution to perform text classification tasks without the need for labelled samples.
Results show that these methods outperform the un-adapted model for different numbers of training shots in the prompt.
arXiv Detail & Related papers (2023-07-13T12:11:36Z)
- Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models [48.77653835765705]
We introduce a probabilistic resolution to prompt tuning, where the label-specific prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model.
We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts.
arXiv Detail & Related papers (2023-03-16T06:09:15Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Consistency Regularization for Cross-Lingual Fine-Tuning [61.08704789561351]
We propose to improve cross-lingual fine-tuning with consistency regularization.
Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations (a generic sketch of this idea appears after this list).
Experimental results on the XTREME benchmark show that our method significantly improves cross-lingual fine-tuning across various tasks.
arXiv Detail & Related papers (2021-06-15T15:35:44Z)
- Are Some Words Worth More than Others? [3.5598388686985354]
We propose two new intrinsic evaluation measures within the framework of a simple word prediction task.
We evaluate several commonly-used large English language models using our proposed metrics.
arXiv Detail & Related papers (2020-10-12T23:12:11Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
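As a companion to the Consistency Regularization for Cross-Lingual Fine-Tuning entry above, here is a generic sketch of example consistency regularization: a symmetric KL term penalizes how much the model's prediction changes under data augmentation. The classifier interface, the augmentation, and the weight alpha are placeholders; this illustrates the general idea, not the exact recipe of that paper.

    # Generic sketch of example consistency regularization: add a symmetric KL
    # term between predictions on an example and on an augmented version of it.
    # `model` is assumed to return HuggingFace-style outputs with `.logits`;
    # the augmentation and the weight `alpha` are placeholders.
    import torch
    import torch.nn.functional as F

    def consistency_loss(model, batch, augmented_batch, labels, alpha: float = 1.0):
        logits = model(**batch).logits                # predictions on original inputs
        logits_aug = model(**augmented_batch).logits  # predictions on augmented inputs

        # Standard supervised cross-entropy on the original examples.
        ce = F.cross_entropy(logits, labels)

        # Symmetric KL divergence penalizing sensitivity to the augmentation.
        log_p = logits.log_softmax(dim=-1)
        log_q = logits_aug.log_softmax(dim=-1)
        kl = 0.5 * (
            F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
            + F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
        )
        return ce + alpha * kl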
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.