Cross-lingual Extended Named Entity Classification of Wikipedia Articles
- URL: http://arxiv.org/abs/2010.03424v2
- Date: Sat, 17 Oct 2020 09:06:42 GMT
- Title: Cross-lingual Extended Named Entity Classification of Wikipedia Articles
- Authors: The Viet Bui, Phuong Le-Hong
- Abstract summary: This paper describes our method for solving the problem and discusses the official results.
We propose a three-stage approach including multilingual model pre-training, monolingual model fine-tuning and cross-lingual voting.
Our system achieves the best scores for 25 of the 30 languages, and its accuracy gaps to the best-performing systems on the other five languages are relatively small.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The FPT.AI team participated in the SHINRA2020-ML subtask of the NTCIR-15
SHINRA task. This paper describes our method for solving the problem and
discusses the official results. Our method focuses on learning cross-lingual
representations, both on the word level and document level for page
classification. We propose a three-stage approach including multilingual model
pre-training, monolingual model fine-tuning and cross-lingual voting. Our
system achieves the best scores for 25 of the 30 languages, and its
accuracy gaps to the best-performing systems on the other five languages are
relatively small.
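The final stage of the pipeline, cross-lingual voting, can be illustrated with a minimal sketch. This assumes a simple majority vote over the labels that the per-language fine-tuned models predict for the same cross-lingually linked Wikipedia page; the paper's exact aggregation scheme (e.g. score weighting or tie handling) may differ, and the labels and language codes below are illustrative only.

```python
from collections import Counter

def cross_lingual_vote(predictions):
    """Return the majority-vote label across per-language predictions.

    `predictions` maps a language code to the label predicted by that
    language's fine-tuned model for the same (cross-lingually linked)
    Wikipedia page. Ties are broken by first-seen order, since
    Counter.most_common preserves insertion order among equal counts.
    """
    counts = Counter(predictions.values())
    return counts.most_common(1)[0][0]

# Example: three monolingual models vote on one entity page.
votes = {"en": "Person", "ja": "Person", "fr": "Organization"}
label = cross_lingual_vote(votes)
print(label)  # -> Person
```

A vote like this only helps when the monolingual models make partially independent errors, which is the motivation for fine-tuning separate models per language before aggregating.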
Related papers
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text
Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- Team QUST at SemEval-2023 Task 3: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting Online News Genre, Framing and Persuasion Techniques [0.030458514384586396]
This paper describes the participation of team QUST in the SemEval2023 task 3.
The monolingual models are first evaluated with the under-sampling of the majority classes.
The pre-trained multilingual model is fine-tuned with a combination of the class weights and the sample weights.
arXiv Detail & Related papers (2023-04-09T08:14:01Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification [3.503844033591702]
This paper describes our approach for SemEval-2023 Task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup.
arXiv Detail & Related papers (2023-03-16T15:54:23Z)
- Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models [107.83158521848372]
We present Triangular Document-level Pre-training (TRIP), which is the first in the field to accelerate the conventional monolingual and bilingual objectives into a trilingual objective with a novel method called Grafting.
TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements by up to 3.11 d-BLEU points and 8.9 ROUGE-L points.
arXiv Detail & Related papers (2022-12-15T12:14:25Z)
- Multilingual Relation Classification via Efficient and Effective Prompting [9.119073318043952]
We present the first work on prompt-based multilingual relation classification (RC).
We introduce an efficient and effective method that constructs prompts from relation triples and involves only minimal translation for the class labels.
We evaluate its performance in fully supervised, few-shot and zero-shot scenarios, and analyze its effectiveness across 14 languages.
arXiv Detail & Related papers (2022-10-25T08:40:23Z)
- Persian Natural Language Inference: A Meta-learning approach [6.832341432995628]
This paper proposes a meta-learning approach for inferring natural language in Persian.
We evaluate the proposed method using four languages and an auxiliary task.
arXiv Detail & Related papers (2022-05-18T06:51:58Z)
- Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets a new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- Probing Multilingual Language Models for Discourse [0.0]
We find that the XLM-RoBERTa family of models consistently shows the best performance.
Our results also indicate that model distillation may hurt the ability of cross-lingual transfer of sentence representations.
arXiv Detail & Related papers (2021-06-09T06:34:21Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.