ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer
- URL: http://arxiv.org/abs/2207.00785v1
- Date: Sat, 2 Jul 2022 09:50:37 GMT
- Title: ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer
- Authors: Ebrahim Chekol Jibril and A. Cüneyd Tantuğ
- Abstract summary: We present an Amharic named entity recognition system based on bidirectional long short-term memory with a conditional random field (CRF) layer.
Our named entity recognition system achieves an F_1 score of 93%, which is the new state-of-the-art result for Amharic named entity recognition.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Named Entity Recognition is an information extraction task that serves as a
preprocessing step for other natural language processing tasks, such as machine
translation, information retrieval, and question answering. Named entity
recognition enables the identification of proper names as well as temporal and
numeric expressions in open-domain text. For Semitic languages such as
Arabic, Amharic, and Hebrew, the named entity recognition task is more
challenging due to the heavily inflected structure of these languages. In this
paper, we present an Amharic named entity recognition system based on
bidirectional long short-term memory with a conditional random field (CRF) layer. We
annotate a new Amharic named entity recognition dataset (8,070 sentences with
182,691 tokens) and apply the Synthetic Minority Over-sampling Technique (SMOTE)
to our dataset to mitigate the imbalanced classification problem. Our named entity
recognition system achieves an F_1 score of 93%, which is the new
state-of-the-art result for Amharic named entity recognition.
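For readers who want a concrete picture of the architecture, the following is a minimal PyTorch sketch of a generic BiLSTM-CRF tagger in the spirit of the system described above. It is not the authors' implementation; it assumes the third-party `pytorch-crf` package, and the vocabulary size, tag set, and hyperparameters are placeholders.

```python
# Minimal BiLSTM-CRF tagger sketch (illustrative only, not the paper's code).
# Assumes the third-party `pytorch-crf` package: pip install pytorch-crf
import torch
import torch.nn as nn
from torchcrf import CRF


class BiLSTMCRFTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, batch_first=True,
                            bidirectional=True)
        self.hidden2tag = nn.Linear(hidden_dim, num_tags)  # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)         # transition scores + Viterbi

    def _emissions(self, token_ids):
        embedded = self.embedding(token_ids)          # (batch, seq, embed_dim)
        lstm_out, _ = self.lstm(embedded)             # (batch, seq, hidden_dim)
        return self.hidden2tag(lstm_out)              # (batch, seq, num_tags)

    def loss(self, token_ids, tags, mask):
        # Negative log-likelihood of the gold tag sequence under the CRF.
        return -self.crf(self._emissions(token_ids), tags, mask=mask, reduction='mean')

    def decode(self, token_ids, mask):
        # Viterbi decoding returns the best tag sequence per sentence.
        return self.crf.decode(self._emissions(token_ids), mask=mask)


# Toy usage with placeholder sizes (batch of 2 padded sentences).
model = BiLSTMCRFTagger(vocab_size=5000, num_tags=9)
tokens = torch.randint(1, 5000, (2, 12))
tags = torch.randint(0, 9, (2, 12))
mask = torch.ones(2, 12, dtype=torch.bool)
print(model.loss(tokens, tags, mask).item())
print(model.decode(tokens, mask))
```

For the class-imbalance step, one common way to apply SMOTE is imbalanced-learn's `SMOTE().fit_resample(X, y)` on the training feature matrix; whether that matches the authors' exact resampling setup is not stated in the summary above.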
Related papers
- Multicultural Name Recognition For Previously Unseen Names [65.268245109828]
This paper attempts to improve recognition of person names, a diverse category that can grow any time someone is born or changes their name.
I look at names from 103 countries to compare how well the model performs on names from different cultures.
I find that a model with combined character and word input outperforms word-only models and may improve accuracy compared to classical NER models.
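As a rough sketch of that combined input (the layer choices, dimensions, and pooling here are my own assumptions, not the paper's architecture), a per-token representation can concatenate a word embedding with a pooled character-level encoding:

```python
# Illustrative sketch: combine word-level and character-level features per token.
# Dimensions and layer choices are assumptions, not taken from the paper.
import torch
import torch.nn as nn


class WordCharEncoder(nn.Module):
    def __init__(self, word_vocab, char_vocab, word_dim=100, char_dim=30, char_hidden=50):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        # Character BiLSTM pooled into a fixed-size vector per token.
        self.char_lstm = nn.LSTM(char_dim, char_hidden, batch_first=True,
                                 bidirectional=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq); char_ids: (batch, seq, max_word_len)
        batch, seq, max_len = char_ids.shape
        word_vecs = self.word_emb(word_ids)                       # (batch, seq, word_dim)
        chars = self.char_emb(char_ids.view(batch * seq, max_len))
        _, (h_n, _) = self.char_lstm(chars)                       # (2, batch*seq, char_hidden)
        char_vecs = torch.cat([h_n[0], h_n[1]], dim=-1).view(batch, seq, -1)
        return torch.cat([word_vecs, char_vecs], dim=-1)          # word + char features


encoder = WordCharEncoder(word_vocab=5000, char_vocab=80)
out = encoder(torch.randint(1, 5000, (2, 10)), torch.randint(1, 80, (2, 10, 15)))
print(out.shape)  # torch.Size([2, 10, 200])
```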
arXiv Detail & Related papers (2024-01-23T17:58:38Z)
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
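A minimal sketch of the Siamese setup under heavy simplification (the character-to-id mapping, encoder size, and untrained comparison are illustrative assumptions, not the paper's pipeline): two name strings pass through the same encoder and are compared with cosine similarity.

```python
# Illustrative Siamese encoder sketch for company-name matching.
# Architecture details are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseNameEncoder(nn.Module):
    def __init__(self, char_vocab=128, char_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        self.lstm = nn.LSTM(char_dim, hidden, batch_first=True)

    def encode(self, char_ids):
        _, (h_n, _) = self.lstm(self.emb(char_ids))
        return h_n[-1]                                # (batch, hidden) name embedding

    def forward(self, name_a, name_b):
        # Shared weights: both names go through the same encoder.
        return F.cosine_similarity(self.encode(name_a), self.encode(name_b))


def to_ids(name, max_len=30):
    # Toy character-to-id mapping (ASCII codes, capped at 127).
    ids = [min(ord(c), 127) for c in name.lower()[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids))).unsqueeze(0)


model = SiameseNameEncoder()
sim = model(to_ids("Acme Corp."), to_ids("ACME Corporation"))
print(sim.item())  # untrained, so the value is arbitrary; training would use pair labels
```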
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- Label Semantics for Few Shot Named Entity Recognition [68.01364012546402]
We study the problem of few shot learning for named entity recognition.
We leverage the semantic information in the names of the labels as a way of giving the model additional signal and enriched priors.
Our model learns to match the representations of named entities computed by the first encoder with label representations computed by the second encoder.
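A rough sketch of that dual-encoder matching idea, with a generic `bert-base-cased` checkpoint and placeholder label descriptions standing in for whatever the paper actually uses:

```python
# Dual-encoder sketch: match token representations to encoded label names.
# Model choice and label descriptions are placeholders, not from the paper.
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
token_encoder = AutoModel.from_pretrained(name)   # encodes the input sentence
label_encoder = AutoModel.from_pretrained(name)   # encodes label descriptions

label_names = ["person name", "location", "organization", "other"]

with torch.no_grad():
    # One vector per label: mean-pool the label-name encoding.
    lab = tokenizer(label_names, return_tensors="pt", padding=True)
    label_vecs = label_encoder(**lab).last_hidden_state.mean(dim=1)     # (num_labels, hidden)

    sent = tokenizer("Abebe lives in Addis Ababa", return_tensors="pt")
    token_vecs = token_encoder(**sent).last_hidden_state[0]             # (seq_len, hidden)

    # Each token is assigned the label whose representation it matches best.
    scores = token_vecs @ label_vecs.T                                  # (seq_len, num_labels)
    pred = scores.argmax(dim=-1)

# Untrained encoders give arbitrary assignments; training aligns the two spaces.
for tok, p in zip(tokenizer.convert_ids_to_tokens(sent["input_ids"][0].tolist()), pred):
    print(tok, label_names[p])
```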
arXiv Detail & Related papers (2022-03-16T23:21:05Z)
- WCL-BBCD: A Contrastive Learning and Knowledge Graph Approach to Named Entity Recognition [15.446770390648874]
We propose a novel named entity recognition model, WCL-BBCD (Word Contrastive Learning with BERT-BiLSTM-CRF-DBpedia).
The model first trains on sentence pairs from the text, computes the similarity between words in the sentence pairs with cosine similarity, and fine-tunes the BERT model used for the named entity recognition task based on that similarity.
Finally, the recognition results are corrected using prior knowledge such as knowledge graphs, which mitigates the low recognition rate caused by word abbreviations.
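The word-similarity step can be pictured roughly as follows, using a generic BERT checkpoint as an assumption; the contrastive training objective and the rest of the WCL-BBCD pipeline are not shown:

```python
# Rough illustration of comparing words across a sentence pair with cosine
# similarity over contextual embeddings. Model choice is an assumption.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)


def token_vectors(sentence):
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, hidden)
    return tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist()), hidden


tokens_a, vecs_a = token_vectors("Apple opened a new store in Berlin.")
tokens_b, vecs_b = token_vectors("Apple released quarterly earnings today.")

# Pairwise cosine similarity matrix between tokens of the two sentences.
sim = F.cosine_similarity(vecs_a.unsqueeze(1), vecs_b.unsqueeze(0), dim=-1)

i, j = divmod(sim.argmax().item(), sim.size(1))
print(f"Most similar pair: {tokens_a[i]!r} ~ {tokens_b[j]!r} ({sim[i, j].item():.3f})")
```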
arXiv Detail & Related papers (2022-03-14T08:29:58Z)
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
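As a loose illustration of retrieval-augmented input construction, the sketch below uses a tiny hard-coded dictionary as a stand-in for the Wikipedia-based knowledge base; the retrieval and matching logic are not the team's actual system:

```python
# Toy sketch of retrieving related context for an input sentence before NER.
# The "knowledge base" is a hard-coded stand-in for a Wikipedia-derived KB.
knowledge_base = {
    "addis ababa": "Addis Ababa is the capital and largest city of Ethiopia.",
    "nile": "The Nile is a major river flowing through northeastern Africa.",
}


def retrieve_context(sentence, kb, max_entries=2):
    """Return KB passages whose key appears in the sentence (simple string match)."""
    lowered = sentence.lower()
    hits = [text for key, text in kb.items() if key in lowered]
    return hits[:max_entries]


def build_model_input(sentence, kb):
    # Retrieved passages are appended so the tagger sees extra context.
    context = " ".join(retrieve_context(sentence, kb))
    return f"{sentence} [SEP] {context}" if context else sentence


print(build_model_input("He moved to Addis Ababa last year", knowledge_base))
```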
arXiv Detail & Related papers (2022-03-01T15:29:35Z)
- Investigation on Data Adaptation Techniques for Neural Named Entity Recognition [51.88382864759973]
A common practice is to utilize large monolingual unlabeled corpora.
Another popular technique is to create synthetic data from the original labeled data.
In this work, we investigate the impact of these two methods on the performance of three different named entity recognition tasks.
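A tiny sketch of the second technique, creating synthetic sentences by swapping entity mentions within the labeled data; the substitution scheme and the BIO handling are my own illustration, not the method evaluated in the paper:

```python
# Illustrative mention-substitution augmentation for BIO-tagged data.
# The substitution scheme is a generic example, not the paper's exact method.
import random

# A labeled sentence as (token, BIO-tag) pairs.
sentence = [("Abebe", "B-PER"), ("visited", "O"), ("Gondar", "B-LOC"), (".", "O")]

# Replacement mentions per entity type, drawn from elsewhere in the corpus.
mention_pool = {"PER": [["Almaz"], ["Kebede", "Alemu"]], "LOC": [["Bahir", "Dar"]]}


def augment(tagged, pool, seed=0):
    random.seed(seed)
    out = []
    i = 0
    while i < len(tagged):
        tok, tag = tagged[i]
        if tag.startswith("B-") and tag[2:] in pool:
            etype = tag[2:]
            i += 1                                        # consume the B- token
            while i < len(tagged) and tagged[i][1] == "I-" + etype:
                i += 1                                    # consume continuation tokens
            replacement = random.choice(pool[etype])
            out.extend((w, ("B-" if j == 0 else "I-") + etype)
                       for j, w in enumerate(replacement))
        else:
            out.append((tok, tag))
            i += 1
    return out


print(augment(sentence, mention_pool))
```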
arXiv Detail & Related papers (2021-10-12T11:06:03Z)
- Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition [9.809157050048375]
We propose a two-stage entity identifier for named entity recognition.
First, we generate span proposals by filtering and boundary regression on the seed spans to locate the entities, and then label the boundary-adjusted span proposals with the corresponding categories.
Our method effectively utilizes the boundary information of entities and partially matched spans during training.
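A stripped-down sketch of that two-stage flow (enumerate candidate spans, keep high-scoring proposals, then classify them); the scoring and classification functions are stand-ins, and the boundary-regression step is omitted:

```python
# Minimal two-stage span identification sketch with stand-in scoring functions.
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def propose_spans(tokens: List[str], score: Callable[[Span], float],
                  max_len: int = 4, threshold: float = 0.5) -> List[Span]:
    """Stage 1: enumerate spans up to max_len tokens and keep likely entities."""
    candidates = [(i, j) for i in range(len(tokens))
                  for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1))]
    return [span for span in candidates if score(span) >= threshold]


def label_spans(spans: List[Span],
                classify: Callable[[Span], str]) -> List[Tuple[Span, str]]:
    """Stage 2: assign an entity category to each surviving proposal."""
    return [(span, classify(span)) for span in spans]


tokens = ["Abebe", "lives", "in", "Addis", "Ababa"]


def toy_score(span):            # stand-in for a learned span filter
    return 0.9 if span in {(0, 1), (3, 5)} else 0.1


def toy_classify(span):         # stand-in for the stage-two classifier
    return "PER" if span == (0, 1) else "LOC"


proposals = propose_spans(tokens, toy_score)
print(label_spans(proposals, toy_classify))   # [((0, 1), 'PER'), ((3, 5), 'LOC')]
```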
arXiv Detail & Related papers (2021-05-14T12:52:34Z)
- Bootstrapping Named Entity Recognition in E-Commerce with Positive Unlabeled Learning [13.790883865748004]
We present a bootstrapped positive-unlabeled learning algorithm that integrates domain-specific linguistic features to quickly and efficiently expand the seed dictionary.
The model achieves an average F1 score of 72.02% on a novel dataset of product descriptions, an improvement of 3.63% over a baseline BiLSTM classifier.
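A very rough sketch of the dictionary-bootstrapping idea behind such a positive-unlabeled setup: match the seed dictionary against text, treat matches as positives, and promote confident new candidates into the dictionary. The "confidence" rule below is a trivial stand-in for a trained PU classifier:

```python
# Toy bootstrapping loop: expand a seed entity dictionary from unlabeled text.
# The "classifier" is a trivial stand-in; a real PU model would be trained here.


def dictionary_matches(tokens, dictionary):
    """Mark tokens found in the dictionary as positive (1); the rest are unlabeled (0)."""
    return [1 if tok.lower() in dictionary else 0 for tok in tokens]


def confident_candidates(tokens, labels):
    # Stand-in for model predictions: capitalized tokens adjacent to a known
    # positive are treated as confident new entity tokens.
    out = set()
    for i, tok in enumerate(tokens):
        near_positive = any(labels[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens))
        if labels[i] == 0 and near_positive and tok[0].isupper():
            out.add(tok.lower())
    return out


def bootstrap(corpus, seed, rounds=2):
    dictionary = set(seed)
    for _ in range(rounds):
        for tokens in corpus:
            labels = dictionary_matches(tokens, dictionary)
            dictionary |= confident_candidates(tokens, labels)   # expand the dictionary
    return dictionary


corpus = [["Samsung", "Galaxy", "phone", "case"], ["Galaxy", "Tab", "screen", "protector"]]
print(bootstrap(corpus, seed={"samsung"}))
```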
arXiv Detail & Related papers (2020-05-22T09:35:30Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate the feasibility of inferring entity types from context alone and find that, while humans are also unable to infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
- Beheshti-NER: Persian Named Entity Recognition Using BERT [0.0]
In this paper, we use the pre-trained deep bidirectional network BERT to build a model for named entity recognition in Persian.
Our results are 83.5 and 88.4 F1 CoNLL scores in phrase-level and word-level evaluation, respectively.
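That recipe, a pretrained multilingual BERT with a token-classification head, can be sketched with the Hugging Face `transformers` API as below; the checkpoint, label set, and example sentence are placeholders rather than the paper's exact setup:

```python
# Sketch of a BERT token-classification setup for NER (placeholder checkpoint/labels).
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
checkpoint = "bert-base-multilingual-cased"   # placeholder; not the paper's exact model

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=len(labels))

inputs = tokenizer("Sara works at Sharif University in Tehran", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits               # (1, seq_len, num_labels)

predictions = logits.argmax(dim=-1)[0]
for token, pred in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()),
                       predictions):
    print(token, labels[pred])                    # untrained head: labels are arbitrary
```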
arXiv Detail & Related papers (2020-03-19T15:55:21Z)
- Integrating Boundary Assembling into a DNN Framework for Named Entity Recognition in Chinese Social Media Text [3.7239227834407735]
In Chinese text, entity boundaries are also word boundaries, so named entity recognition for Chinese can benefit from word boundary detection.
In this paper, we integrate a boundary assembling method with the state-of-the-art deep neural network model, and incorporate the updated word boundary information into a conditional random field model for named entity recognition.
Our method shows a 2% absolute improvement over previous state-of-the-art results.
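One way to picture exposing word-boundary information to a CRF is the feature-template sketch below, which assumes the third-party `sklearn-crfsuite` package; it is a generic plain-CRF illustration, not the paper's deep neural network integration:

```python
# Sketch: expose word-boundary flags as CRF features (assumes sklearn-crfsuite).
# This is a generic feature-template illustration, not the paper's DNN integration.
import sklearn_crfsuite


def char_features(sentence_chars, boundary_flags):
    """One feature dict per character; boundary_flags[i] marks a word boundary at char i."""
    feats = []
    for i, ch in enumerate(sentence_chars):
        feats.append({
            "char": ch,
            "is_word_boundary": boundary_flags[i],          # from a word segmenter
            "prev_char": sentence_chars[i - 1] if i > 0 else "<s>",
        })
    return feats


# Tiny toy example: characters with segmenter-provided boundaries and BIO tags.
chars = list("ABCDE")
boundaries = [False, True, False, False, True]
tags = ["B-ORG", "I-ORG", "O", "B-LOC", "I-LOC"]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=20)
crf.fit([char_features(chars, boundaries)], [tags])
print(crf.predict([char_features(chars, boundaries)]))
```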
arXiv Detail & Related papers (2020-02-27T04:29:13Z)