Related papers: ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks

ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks

URL: http://arxiv.org/abs/2401.16589v2
Date: Wed, 13 Mar 2024 09:45:02 GMT
Title: ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
Authors: Bolei Ma, Ercong Nie, Shuzhou Yuan, Helmut Schmid, Michael F\"arber, Frauke Kreuter and Hinrich Sch\"utze
Abstract summary: ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token. Our experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms Vanilla fine-tuning and Prompt-Tuning in zero-shot cross-lingual transfer.
Score: 12.700783525558721
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prompt-based methods have been successfully applied to multilingual pretrained language models for zero-shot cross-lingual understanding. However, most previous studies primarily focused on sentence-level classification tasks, and only a few considered token-level labeling tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose Token-Level Prompt Decomposition (ToPro), which facilitates the prompt-based method for token-level sequence labeling tasks. The ToPro method decomposes an input sentence into single tokens and applies one prompt template to each token. Our experiments on multilingual NER and POS tagging datasets demonstrate that ToPro-based fine-tuning outperforms Vanilla fine-tuning and Prompt-Tuning in zero-shot cross-lingual transfer, especially for languages that are typologically different from the source language English. Our method also attains state-of-the-art performance when employed with the mT5 model. Besides, our exploratory study in multilingual large language models shows that ToPro performs much better than the current in-context learning method. Overall, the performance improvements show that ToPro could potentially serve as a novel and simple benchmarking method for sequence labeling tasks.

Related papers

Sampling from Your Language Model One Byte at a Time [82.71473348639489]
Tokenization can introduce distortion into the model's generations, known as the Prompt Boundary Problem (PBP)<n>We present an inference-time method to convert any autore LM with a BPE tokenizer into a character-level or byte-level LM.<n>Our method efficiently solves the PBP and is also able to unify the vocabularies of language models with different tokenizers.
arXiv Detail & Related papers (2025-06-17T02:37:04Z)
Tokenization is Sensitive to Language Variation [14.568179478275255]
Tokenizers split texts into smaller units and might behave differently for less common linguistic forms. This might affect downstream LLM performance differently on two types of tasks. We investigate how key algorithmic design choices impact downstream models' performances.
arXiv Detail & Related papers (2025-02-21T09:58:54Z)
Decomposed Prompting: Probing Multilingual Linguistic Structure Knowledge in Large Language Models [54.58989938395976]
We introduce a decomposed prompting approach for sequence labeling tasks.<n>We test our method on the Universal Dependencies part-of-speech tagging dataset for 38 languages.
arXiv Detail & Related papers (2024-02-28T15:15:39Z)
MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating few-shot text classification task into text pair relevance estimation task. We conduct experiments on three widely used text classification datasets across four few-shot settings. Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
Frustratingly Easy Label Projection for Cross-lingual Transfer [25.398772204761215]
A few efforts have utilized a simple mark-then-translate method to jointly perform translation and projection. We present an empirical study across 57 languages and three tasks (QA, NER, and Event Extraction) to evaluate the effectiveness and limitations of both methods. Our optimized version of mark-then-translate, which we call EasyProject, is easily applied to many languages and works surprisingly well, outperforming the more complex word alignment-based methods.
arXiv Detail & Related papers (2022-11-28T18:11:48Z)
Multilingual Relation Classification via Efficient and Effective Prompting [9.119073318043952]
We present the first work on prompt-based multilingual relation classification (RC) We introduce an efficient and effective method that constructs prompts from relation triples and involves only minimal translation for the class labels. We evaluate its performance in fully supervised, few-shot and zero-shot scenarios, and analyze its effectiveness across 14 languages.
arXiv Detail & Related papers (2022-10-25T08:40:23Z)
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results. We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER. We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
Towards Unified Prompt Tuning for Few-shot Text Classification [47.71344780587704]
We present the Unified Prompt Tuning (UPT) framework, leading to better few-shot text classification for BERT-style models. In UPT, a novel paradigm Prompt-Options-Verbalizer is proposed for joint prompt learning across different NLP tasks. We also design a self-supervised task named Knowledge-enhanced Selective Masked Language Modeling to improve the PLM's generalization abilities.
arXiv Detail & Related papers (2022-05-11T07:40:45Z)
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages. Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks. Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages. In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap. Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt [98.26682501616024]
We propose a novel model that uses a unified prompt for all languages, called UniPrompt. The unified prompt is computation by a multilingual PLM to produce language-independent representation. Our proposed methods can significantly outperform the strong baselines across different languages.
arXiv Detail & Related papers (2022-02-23T11:57:52Z)
NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task--Next Sentence Prediction [14.912579358678212]
Using prompts to perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately gained significant success in comparison to the pre-train and fine-tune paradigm. In this paper, we attempt to accomplish several NLP tasks in a zero-shot scenario using a BERT original pre-training task abandoned by RoBERTa and other models--Next Sentence Prediction (NSP) Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking
arXiv Detail & Related papers (2021-09-08T11:57:08Z)
FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning. During inference, the model makes predictions based on the text input in the target language and its translation in the source language. To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.