UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed
Complex Named Entity Recognition via Pseudo Labels using Multilingual
Transformer
- URL: http://arxiv.org/abs/2204.13515v1
- Date: Thu, 28 Apr 2022 14:07:06 GMT
- Title: UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed
Complex Named Entity Recognition via Pseudo Labels using Multilingual
Transformer
- Authors: Abdellah El Mekki and Abdelkader El Mahdaouy and Mohammed Akallouch
and Ismail Berrada and Ahmed Khoumsi
- Abstract summary: We introduce our submitted system to the Multilingual Complex Named Entity Recognition (MultiCoNER) shared task.
We approach complex NER for multilingual and code-mixed queries by relying on the contextualized representations provided by the multilingual Transformer XLM-RoBERTa.
Our proposed system ranked 6th and 8th in the multilingual and code-mixed tracks of MultiCoNER, respectively.
- Score: 7.270980742378389
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building real-world complex Named Entity Recognition (NER) systems is a
challenging task. This is due to the complexity and ambiguity of named entities
that appear in various contexts such as short input sentences, emerging
entities, and complex entities. Besides, real-world queries are mostly
malformed, as they can be code-mixed or multilingual, among other scenarios. In
this paper, we introduce our submitted system to the Multilingual Complex Named
Entity Recognition (MultiCoNER) shared task. We approach complex NER for
multilingual and code-mixed queries by relying on the contextualized
representations provided by the multilingual Transformer XLM-RoBERTa. In
addition to the CRF-based token classification layer, we incorporate a span
classification loss to recognize named entity spans. Furthermore, we use a
self-training mechanism to generate weakly annotated data from a large
unlabeled dataset. Our proposed system ranked 6th and 8th in the
multilingual and code-mixed tracks of MultiCoNER, respectively.
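The abstract names two ideas that a small example can make concrete: recognizing entity spans on top of token-level tags, and keeping only confident model predictions as pseudo labels during self-training. The following is a minimal sketch, not the authors' implementation. It assumes BIO-style tags and a simple filter that keeps a sentence only if its least confident token reaches a threshold. The function names, the tuple layout, and the 0.9 threshold are all hypothetical, and the abstract does not state the actual selection criterion.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)
# One model prediction: (tokens, BIO tags, per-token confidence). Assumed layout.
Prediction = Tuple[List[str], List[str], List[float]]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into (start, end_exclusive, type) spans.

    An I- tag that does not continue an open span of the same type is
    ignored rather than repaired (a simplifying assumption).
    """
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def select_pseudo_labels(
    predictions: List[Prediction], threshold: float = 0.9
) -> List[Tuple[List[str], List[str]]]:
    """Keep sentences whose minimum token confidence is >= threshold.

    The threshold and the min-over-tokens rule are illustrative assumptions.
    """
    kept = []
    for tokens, tags, confidences in predictions:
        if confidences and min(confidences) >= threshold:
            kept.append((tokens, tags))
    return kept


if __name__ == "__main__":
    unlabeled_predictions = [
        (["Ismail", "Berrada", "in", "Rabat"],
         ["B-PER", "I-PER", "O", "B-LOC"],
         [0.98, 0.97, 0.99, 0.93]),
        (["play", "thriller"], ["O", "B-CW"], [0.99, 0.41]),
    ]
    for tokens, tags in select_pseudo_labels(unlabeled_predictions):
        print(tokens, bio_to_spans(tags))
```

In a self-training loop of this kind, the retained sentences would be added to the labeled training set and the model retrained. A stricter threshold trades pseudo-label coverage for lower label noise.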
Related papers
- SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding [55.48936731641802]
We present the SRFUND, a hierarchically structured multi-task form understanding benchmark.
SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets.
The dataset covers eight languages: English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese.
arXiv Detail & Related papers (2024-06-13T02:35:55Z)
- Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z)
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z)
- ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER [47.32935969127478]
We present ACLM (Attention-map aware keyword selection for Conditional Language Model fine-tuning).
ACLM alleviates the context-entity mismatch issue, a problem existing NER data augmentation techniques suffer from.
We demonstrate the effectiveness of ACLM both qualitatively and quantitatively on monolingual, cross-lingual, and multilingual complex NER.
arXiv Detail & Related papers (2023-06-01T17:33:04Z)
- DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z)
- MultiCoNER: A Large-scale Multilingual dataset for Complex Named Entity Recognition [15.805414696789796]
We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages.
This dataset is designed to represent contemporary challenges in NER, including low-context scenarios.
arXiv Detail & Related papers (2022-08-30T20:45:54Z)
- CMNEROne at SemEval-2022 Task 11: Code-Mixed Named Entity Recognition by leveraging multilingual data [7.538482310185133]
This paper describes the submission of team CMNEROne to SemEval-2022 Task 11 (MultiCoNER).
The Code-mixed NER task aimed to identify named entities on the code-mixed dataset.
We achieved a weighted average F1 score of 0.7044, i.e., 6% greater than the baseline.
arXiv Detail & Related papers (2022-06-15T06:33:13Z)
- USTC-NELSLIP at SemEval-2022 Task 11: Gazetteer-Adapted Integration Network for Multilingual Complex Named Entity Recognition [41.26523047041553]
This paper describes the system developed by the USTC-NELSLIP team for SemEval-2022 Task 11, Multilingual Complex Named Entity Recognition (MultiCoNER).
We propose a gazetteer-adapted integration network (GAIN) to improve the performance of language models for recognizing complex named entities.
arXiv Detail & Related papers (2022-03-07T09:05:37Z)
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
arXiv Detail & Related papers (2022-03-01T15:29:35Z)