mucAI at WojoodNER 2024: Arabic Named Entity Recognition with Nearest Neighbor Search
- URL: http://arxiv.org/abs/2408.03652v1
- Date: Wed, 7 Aug 2024 09:34:55 GMT
- Title: mucAI at WojoodNER 2024: Arabic Named Entity Recognition with Nearest Neighbor Search
- Authors: Ahmed Abdou, Tasneem Mohsen,
- Abstract summary: We introduce Arabic KNN-NER, our submission to the Wojood NER Shared Task 2024 (ArabicNLP 2024)
In this paper, we tackle fine-grained flat-entity recognition for Arabic text, where we identify a single main entity and possibly zero or multiple sub-entities for each word.
Our submission achieved 91% on the test set on the WojoodFine dataset, placing Arabic KNN-NER on top of the leaderboard for the shared task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Named Entity Recognition (NER) is a task in Natural Language Processing (NLP) that aims to identify and classify entities in text into predefined categories. However, when applied to Arabic data, NER encounters unique challenges stemming from the language's rich morphological inflections, absence of capitalization cues, and spelling variants, where a single word can comprise multiple morphemes. In this paper, we introduce Arabic KNN-NER, our submission to the Wojood NER Shared Task 2024 (ArabicNLP 2024). We have participated in the shared sub-task 1 Flat NER. In this shared sub-task, we tackle fine-grained flat-entity recognition for Arabic text, where we identify a single main entity and possibly zero or multiple sub-entities for each word. Arabic KNN-NER augments the probability distribution of a fine-tuned model with another label probability distribution derived from performing a KNN search over the cached training data. Our submission achieved 91% on the test set on the WojoodFine dataset, placing Arabic KNN-NER on top of the leaderboard for the shared task.
Related papers
- Computational Approaches to Arabic-English Code-Switching [0.0]
We propose and apply state-of-the-art techniques for Modern Standard Arabic and Arabic-English NER tasks.
We have created the first annotated CS Arabic-English corpus for the NER task.
All methods showed improvements in the performance of the NER taggers on CS data.
arXiv Detail & Related papers (2024-10-17T08:20:29Z) - Mavericks at NADI 2023 Shared Task: Unravelling Regional Nuances through
Dialect Identification using Transformer-based Approach [0.0]
We highlight our methodology for subtask 1 which deals with country-level dialect identification.
The task uses the Twitter dataset (TWT-2023) that encompasses 18 dialects for the multi-class classification problem.
We achieved an F1-score of 76.65 (11th rank on the leaderboard) on the test dataset.
arXiv Detail & Related papers (2023-11-30T17:37:56Z) - NERetrieve: Dataset for Next Generation Named Entity Recognition and
Retrieval [49.827932299460514]
We argue that capabilities provided by large language models are not the end of NER research, but rather an exciting beginning.
We present three variants of the NER task, together with a dataset to support them.
We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types.
arXiv Detail & Related papers (2023-10-22T12:23:00Z) - Named Entity Recognition via Machine Reading Comprehension: A Multi-Task
Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z) - DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System
for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER RNum2 shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in the MultiCoNER RNum1 either incorporate the knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z) - IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z) - CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual
Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z) - HiNER: A Large Hindi Named Entity Recognition Dataset [29.300418937509317]
This paper releases a standard-abiding Hindi NER dataset containing 109,146 sentences and 2,220,856 tokens, annotated with 11 tags.
The statistics of tag-set in our dataset show a healthy per-tag distribution, especially for prominent classes like Person, Location and Organisation.
Our dataset helps achieve a weighted F1 score of 88.78 with all the tags and 92.22 when we collapse the tag-set, as discussed in the paper.
arXiv Detail & Related papers (2022-04-28T19:14:21Z) - MobIE: A German Dataset for Named Entity Recognition, Entity Linking and
Relation Extraction in the Mobility Domain [76.21775236904185]
dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities.
A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types.
To the best of our knowledge, this is the first German-language dataset that combines annotations for NER, EL and RE.
arXiv Detail & Related papers (2021-08-16T08:21:50Z) - Arabic Dialect Identification Using BERT-Based Domain Adaptation [0.0]
Arabic is one of the most important and growing languages in the world.
With the rise of social media platforms such as Twitter, Arabic spoken dialects have become more in use.
arXiv Detail & Related papers (2020-11-13T15:52:51Z) - Multi-Dialect Arabic BERT for Country-Level Dialect Identification [1.2928709656541642]
We present the experiments conducted, and the models developed by our competing team, Mawdoo3 AI.
The dialect identification subtask provides 21,000 country-level labeled tweets covering all 21 Arab countries.
We publicly release the pre-trained language model component of our winning solution under the name of Multi-dialect-Arabic-BERT model.
arXiv Detail & Related papers (2020-07-10T21:11:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.