Improving Neural Named Entity Recognition with Gazetteers
- URL: http://arxiv.org/abs/2003.03072v1
- Date: Fri, 6 Mar 2020 08:29:37 GMT
- Title: Improving Neural Named Entity Recognition with Gazetteers
- Authors: Chan Hee Song, Dawn Lawrie, Tim Finin, James Mayfield
- Abstract summary: This article describes how to generate gazetteers from the Wikidata knowledge graph as well as how to integrate the information into a neural NER system.
Experiments reveal that the approach yields performance gains in two distinct languages.
- Score: 6.292153194561472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of this work is to improve the performance of a neural named entity
recognition system by adding input features that indicate a word is part of a
name included in a gazetteer. This article describes how to generate gazetteers
from the Wikidata knowledge graph as well as how to integrate the information
into a neural NER system. Experiments reveal that the approach yields
performance gains in two distinct languages: a high-resource, word-based
language (English) and a high-resource, character-based language (Chinese).
Experiments were also performed on a low-resource language, Russian, using a
newly annotated Russian NER corpus from Reddit tagged with four core types and
twelve extended types, for which the article reports a baseline score. It is a
longer version of a paper presented at the 33rd FLAIRS conference (Song et al. 2020).
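As an illustration of the approach described above, the sketch below shows one plausible way to (a) pull entity names from Wikidata via its public SPARQL endpoint and (b) turn gazetteer membership into per-token input features for a neural NER tagger. The entity class (humans, Q5), the B/I/O-style flag encoding, the span-matching heuristic, and helper names such as `fetch_person_names` are assumptions for illustration; the paper's actual gazetteer construction and feature integration may differ.

```python
# Illustrative sketch (not the authors' exact pipeline): build a small
# gazetteer from Wikidata and mark tokens covered by gazetteer entries.
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"  # public endpoint

def fetch_person_names(limit=1000):
    """Fetch English labels of humans (Q5) from Wikidata as a toy gazetteer."""
    query = f"""
    SELECT ?label WHERE {{
      ?person wdt:P31 wd:Q5 ;
              rdfs:label ?label .
      FILTER(LANG(?label) = "en")
    }} LIMIT {limit}
    """
    resp = requests.get(WIKIDATA_SPARQL,
                        params={"query": query, "format": "json"},
                        headers={"User-Agent": "gazetteer-demo/0.1"})
    resp.raise_for_status()
    return {b["label"]["value"] for b in resp.json()["results"]["bindings"]}

def gazetteer_features(tokens, gazetteer, max_span=4):
    """One flag per token: tokens covered by a multi-token gazetteer match
    get B-GAZ/I-GAZ, everything else gets O."""
    flags = ["O"] * len(tokens)
    for i in range(len(tokens)):
        # Try the longest candidate span first so full names win over prefixes.
        for j in range(min(len(tokens), i + max_span), i, -1):
            if " ".join(tokens[i:j]) in gazetteer:
                flags[i] = "B-GAZ"
                for k in range(i + 1, j):
                    flags[k] = "I-GAZ"
                break
    return flags

if __name__ == "__main__":
    gaz = {"Barack Obama", "Angela Merkel"}   # stand-in for the Wikidata result
    toks = "Yesterday Barack Obama spoke in Berlin".split()
    print(list(zip(toks, gazetteer_features(toks, gaz))))
    # [('Yesterday', 'O'), ('Barack', 'B-GAZ'), ('Obama', 'I-GAZ'), ...]
```

In a neural tagger, such flags would typically be embedded and concatenated with the word (or character) representations at the input layer.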
Related papers
- Presence or Absence: Are Unknown Word Usages in Dictionaries? [6.185216877366987]
We evaluate our system in the AXOLOTL-24 shared task for Finnish, Russian and German languages.
We use a graph-based clustering approach to predict mappings between unknown word usages and dictionary entries.
Our system ranks first in Finnish and German, and second in Russian on the Subtask 2 test-phase leaderboard.
arXiv Detail & Related papers (2024-06-02T07:57:45Z)
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields [68.17213992395041]
Low-resource named entity recognition is still an open problem in NLP.
We present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource and low-resource languages jointly.
arXiv Detail & Related papers (2024-04-14T23:44:49Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Classification of Handwritten Names of Cities and Handwritten Text Recognition using Various Deep Learning Models [0.0]
We describe various approaches and achievements of recent years in the development of handwriting recognition models.
The first model uses deep convolutional neural networks (CNNs) for feature extraction and a fully connected multilayer perceptron neural network (MLP) for word classification.
The second model, called SimpleHTR, uses CNN and recurrent neural network (RNN) layers to extract information from images.
arXiv Detail & Related papers (2021-02-09T13:34:16Z)
- Soft Gazetteers for Low-Resource Named Entity Recognition [78.00856159473393]
We propose a method of "soft gazetteers" that incorporates ubiquitously available information from English knowledge bases into neural named entity recognition models (see the sketch after this list).
Our experiments on four low-resource languages show an average improvement of 4 points in F1 score.
arXiv Detail & Related papers (2020-05-04T21:58:02Z)
- Investigating Language Impact in Bilingual Approaches for Computational Language Documentation [28.838960956506018]
This paper investigates how the choice of translation language affects subsequent documentation work.
We create 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment.
Our results suggest that incorporating clues into the neural models' input representation increases their translation and alignment quality.
arXiv Detail & Related papers (2020-03-30T10:30:34Z)
- Classification of Chinese Handwritten Numbers with Labeled Projective Dictionary Pair Learning [1.8594711725515674]
We design class-specific dictionaries incorporating three factors: discriminability, sparsity and classification error.
We adopt a new feature space, i.e., histogram of oriented gradients (HOG), to generate the dictionary atoms.
Results demonstrated enhanced classification performance (~98%) compared to state-of-the-art deep learning techniques.
arXiv Detail & Related papers (2020-03-26T01:43:59Z)
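The Soft Gazetteers entry above replaces hard 0/1 gazetteer matches with continuous features derived from English knowledge bases. The fragment below is a minimal sketch of that intuition under assumed details: it scores candidate spans with character-bigram Jaccard similarity against gazetteer entries, whereas the published method builds its features from cross-lingual entity-linking candidates. Names such as `soft_gazetteer_score` are illustrative only.

```python
# Minimal sketch of "soft" gazetteer features: instead of a 0/1 match flag,
# each candidate span gets a continuous similarity score against the gazetteer.
# Character-bigram Jaccard similarity is an assumption for illustration; the
# published soft-gazetteer method derives scores from entity-linking candidates.
def char_bigrams(s):
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def soft_gazetteer_score(span, gazetteer):
    """Best Jaccard similarity between a candidate span and any gazetteer entry."""
    span_grams = char_bigrams(span)
    best = 0.0
    for entry in gazetteer:
        grams = char_bigrams(entry)
        if span_grams or grams:
            sim = len(span_grams & grams) / len(span_grams | grams)
            best = max(best, sim)
    return best

if __name__ == "__main__":
    gaz = {"New York", "Novgorod"}
    for span in ["Nju Jork", "Berlin"]:   # e.g. transliterated candidate spans
        print(span, round(soft_gazetteer_score(span, gaz), 2))
```

Such scores would typically be binned or fed directly as additional input dimensions alongside the token embeddings, giving partial credit to near-matches that a hard gazetteer lookup would miss.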