An Open-Source Dataset and A Multi-Task Model for Malay Named Entity
Recognition
- URL: http://arxiv.org/abs/2109.01293v1
- Date: Fri, 3 Sep 2021 03:29:25 GMT
- Title: An Open-Source Dataset and A Multi-Task Model for Malay Named Entity
Recognition
- Authors: Yingwen Fu and Nankai Lin and Zhihe Yang and Shengyi Jiang
- Abstract summary: We build a Malay NER dataset (MYNER) comprising 28,991 sentences (over 384 thousand tokens)
An auxiliary task, boundary detection, is introduced to improve NER training in both explicit and implicit ways.
- Score: 3.511753382329252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition (NER) is a fundamental task of natural language
processing (NLP). However, most state-of-the-art research is mainly oriented to
high-resource languages such as English and has not been widely applied to
low-resource languages. In Malay language, relevant NER resources are limited.
In this work, we propose a dataset construction framework, which is based on
labeled datasets of homologous languages and iterative optimization, to build a
Malay NER dataset (MYNER) comprising 28,991 sentences (over 384 thousand
tokens). Additionally, to better integrate boundary information for NER, we
propose a multi-task (MT) model with a bidirectional revision (Bi-revision)
mechanism for Malay NER task. Specifically, an auxiliary task, boundary
detection, is introduced to improve NER training in both explicit and implicit
ways. Furthermore, a gated ignoring mechanism is proposed to conduct
conditional label transfer and alleviate error propagation by the auxiliary
task. Experimental results demonstrate that our model achieves comparable
results over baselines on MYNER. The dataset and the model in this paper would
be publicly released as a benchmark dataset.
Related papers
- GEIC: Universal and Multilingual Named Entity Recognition with Large Language Models [7.714969840571947]
We introduce the task of generation-based extraction and in-context classification (GEIC)
We then propose CascadeNER, a universal and multilingual GEIC framework for few-shot and zero-shot NER.
We also introduce AnythingNER, the first NER dataset specifically designed for Large Language Models (LLMs)
arXiv Detail & Related papers (2024-09-17T09:32:12Z) - 2M-NER: Contrastive Learning for Multilingual and Multimodal NER with Language and Modal Fusion [9.038363543966263]
We construct a large-scale MMNER dataset with four languages (English, French, German and Spanish) and two modalities (text and image)
We introduce a new model called 2M-NER, which aligns the text and image representations using contrastive learning and integrates a multimodal collaboration module.
Our model achieves the highest F1 score in multilingual and multimodal NER tasks compared to some comparative and representative baselines.
arXiv Detail & Related papers (2024-04-26T02:34:31Z) - Named Entity Recognition via Machine Reading Comprehension: A Multi-Task
Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z) - Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a versatile'' model, i.e., the Unified Model Learning for NMT (UMLNMT) that works with data from different tasks.
OurNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z) - IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named
Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z) - CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual
Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z) - MultiCoNER: A Large-scale Multilingual dataset for Complex Named Entity
Recognition [15.805414696789796]
We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages.
This dataset is designed to represent contemporary challenges in NER, including low-context scenarios.
arXiv Detail & Related papers (2022-08-30T20:45:54Z) - A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity
Recognition [5.030581940990434]
Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data-hungry problem for low-resource languages.
In this paper, we describe our novel dual-contrastive framework ConCNER for cross-lingual NER under the scenario of limited source-language labeled data.
arXiv Detail & Related papers (2022-04-02T07:59:13Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z) - Reinforced Iterative Knowledge Distillation for Cross-Lingual Named
Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource language to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.