BanglaCoNER: Towards Robust Bangla Complex Named Entity Recognition
- URL: http://arxiv.org/abs/2303.09306v2
- Date: Fri, 17 Mar 2023 15:13:01 GMT
- Title: BanglaCoNER: Towards Robust Bangla Complex Named Entity Recognition
- Authors: HAZ Sameen Shahgir, Ramisa Alam, Md. Zarif Ul Alam
- Abstract summary: We present the winning solution of the Bangla Complex Named Entity Recognition Challenge.
The dataset consisted of 15300 sentences for training and 800 sentences for validation, in the .conll format.
Our findings also demonstrate the efficacy of Deep Learning models such as BanglaBERT for NER in the Bangla language.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Named Entity Recognition (NER) is a fundamental task in natural language
processing that involves identifying and classifying named entities in text.
However, little work has been done on Complex Named Entity Recognition (CNER)
in Bangla, despite Bangla being the seventh most spoken language globally. CNER
is a more challenging task than traditional NER as it involves identifying and
classifying complex and compound entities, which are not common in the Bangla
language. In this paper, we present the winning solution of the Bangla Complex
Named Entity Recognition Challenge, addressing the CNER task on the BanglaCoNER
dataset using two different approaches, namely Conditional Random Fields (CRF)
and fine-tuning transformer-based Deep Learning models such as BanglaBERT.
The dataset consisted of 15300 sentences for training and 800 sentences for
validation, in the .conll format. Exploratory Data Analysis (EDA) on the
dataset revealed that the dataset had 7 different NER tags, with a notable
presence of English words, suggesting that the dataset is synthetic and likely
a product of translation.
We experimented with a variety of feature combinations including Part of
Speech (POS) tags, word suffixes, Gazetteers, and cluster information from
embeddings, while also fine-tuning the BanglaBERT (large) model for NER. We
found that not all linguistic patterns are immediately apparent or even
intuitive to humans, which is why Deep Learning based models have proved to be
the more effective models in NLP, including for the CNER task. Our fine-tuned
BanglaBERT (large) model achieves an F1 Score of 0.79 on the validation set.
Overall, our study highlights the importance of Bangla Complex Named Entity
Recognition, particularly in the context of synthetic datasets. Our findings
also demonstrate the efficacy of Deep Learning models such as BanglaBERT for
NER in the Bangla language.
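The abstract mentions EDA over .conll data and CRF features such as word suffixes and Gazetteers. The following is a minimal sketch of what those two steps could look like, not the authors' implementation. It assumes one token per line with the tag in the last whitespace-separated column, blank lines between sentences, and optional "#" comment lines. The sample sentences, the tag names in them, the feature names, and the `gazetteer` argument are all hypothetical and chosen only for illustration. POS tags and embedding clusters are left out because they need external tools.

```python
from collections import Counter

# Hypothetical sample in a CoNLL-style layout (token first, tag last).
SAMPLE = """\
# id demo-1
রবীন্দ্রনাথ _ _ B-PER
ঠাকুর _ _ I-PER
কলকাতায় _ _ B-LOC
জন্মগ্রহণ _ _ O
করেন _ _ O

# id demo-2
Google _ _ B-CORP
একটি _ _ O
প্রতিষ্ঠান _ _ O
"""


def read_conll(text):
    """Parse CoNLL-style text into a list of sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and comment lines end the current sentence (assumption).
        if not line or line.startswith("#"):
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def tag_counts(sentences):
    """Count tagged tokens per entity type, ignoring B-/I- prefixes and O."""
    counts = Counter()
    for sentence in sentences:
        for _, tag in sentence:
            if tag != "O":
                counts[tag.split("-", 1)[-1]] += 1
    return counts


def has_latin(token):
    """True if the token contains an ASCII letter (a rough English-word check)."""
    return any(ch.isascii() and ch.isalpha() for ch in token)


def token_features(sentence, i, gazetteer=frozenset()):
    """Build a feature dict for token i; suffixes are sliced by code point."""
    token = sentence[i][0]
    feats = {
        "bias": 1.0,
        "token": token,
        "suffix2": token[-2:],
        "suffix3": token[-3:],
        "has_latin": has_latin(token),
        "is_digit": token.isdigit(),
        "in_gazetteer": token in gazetteer,
    }
    if i == 0:
        feats["BOS"] = True
    else:
        feats["prev_token"] = sentence[i - 1][0]
    if i == len(sentence) - 1:
        feats["EOS"] = True
    else:
        feats["next_token"] = sentence[i + 1][0]
    return feats


sentences = read_conll(SAMPLE)
print(tag_counts(sentences))
print(token_features(sentences[1], 0, gazetteer={"Google"}))
```

Per-token dictionaries of this shape are the usual input for CRF toolkits that accept feature dicts. The `has_latin` flag is one simple way to surface the English words that the EDA reported.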
Related papers
- ANCHOLIK-NER: A Benchmark Dataset for Bangla Regional Named Entity Recognition [0.8025340896297104]
The dataset has around 10,443 sentences, 3,481 sentences per region.
The data was collected from two publicly available datasets and through web scraping from various online newspapers and articles.
The dataset is structured into separate subsets for each region and is available both in CSV format.
arXiv Detail & Related papers (2025-02-16T16:59:10Z)
- TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi [0.0]
This paper details our work to build a multilingual NER model for the three most spoken languages in India - Hindi, Bengali & Marathi.
We train a custom transformer model and fine-tune a few pretrained models, achieving an F1 Score of 92.11 for a total of 6 entity groups.
arXiv Detail & Related papers (2025-02-06T17:37:36Z)
- "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities [59.22329574700317]
Spoken named entity recognition (NER) aims to identify named entities from speech.
New named entities appear every day; however, annotating their Spoken NER data is costly.
We propose a method for generating Spoken NER data based on a named entity dictionary (NED) to reduce costs.
arXiv Detail & Related papers (2024-12-26T07:43:18Z)
- In-Context Learning for Few-Shot Nested Named Entity Recognition [53.55310639969833]
We introduce an effective and innovative ICL framework for the setting of few-shot nested NER.
We improve the ICL prompt by devising a novel example demonstration selection mechanism, EnDe retriever.
In EnDe retriever, we employ contrastive learning to perform three types of representation learning, in terms of semantic similarity, boundary similarity, and label similarity.
arXiv Detail & Related papers (2024-02-02T06:57:53Z)
- On Significance of Subword tokenization for Low Resource and Efficient Named Entity Recognition: A case study in Marathi [1.6383036433216434]
We focus on NER for low-resource language and present our case study in the context of the Indian language Marathi.
We propose a hybrid approach for efficient NER by integrating a BERT-based subword tokenizer into vanilla CNN/LSTM models.
We show that this simple approach of replacing a traditional word-based tokenizer with a BERT-tokenizer brings the accuracy of vanilla single-layer models closer to that of deep pre-trained models like BERT.
arXiv Detail & Related papers (2023-12-03T06:53:53Z)
- Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- Nested Named Entity Recognition as Holistic Structure Parsing [92.8397338250383]
This work models the full nested NEs in a sentence as a holistic structure and proposes a holistic structure parsing algorithm to disclose all the NEs at once.
Experiments show that our model yields promising results on widely-used benchmarks which approach or even achieve state-of-the-art.
arXiv Detail & Related papers (2022-04-17T12:48:20Z)
- Unified Named Entity Recognition as Word-Word Relation Classification [25.801945832005504]
We present a novel alternative by modeling the unified NER as word-word relation classification, namely W2NER.
The architecture resolves the kernel bottleneck of unified NER by effectively modeling the neighboring relations between entity words.
Based on the W2NER scheme we develop a neural framework, in which the unified NER is modeled as a 2D grid of word pairs.
arXiv Detail & Related papers (2021-12-19T06:11:07Z)
- An Open-Source Dataset and A Multi-Task Model for Malay Named Entity Recognition [3.511753382329252]
We build a Malay NER dataset (MYNER) comprising 28,991 sentences (over 384 thousand tokens).
An auxiliary task, boundary detection, is introduced to improve NER training in both explicit and implicit ways.
arXiv Detail & Related papers (2021-09-03T03:29:25Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.