Improving Neural Named Entity Recognition with Gazetteers
- URL: http://arxiv.org/abs/2003.03072v1
- Date: Fri, 6 Mar 2020 08:29:37 GMT
- Title: Improving Neural Named Entity Recognition with Gazetteers
- Authors: Chan Hee Song, Dawn Lawrie, Tim Finin, James Mayfield
- Abstract summary: This article describes how to generate gazetteers from the Wikidata knowledge graph as well as how to integrate the information into a neural NER system.
Experiments reveal that the approach yields performance gains in two distinct languages.
- Score: 6.292153194561472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of this work is to improve the performance of a neural named entity
recognition system by adding input features that indicate a word is part of a
name included in a gazetteer. This article describes how to generate gazetteers
from the Wikidata knowledge graph as well as how to integrate the information
into a neural NER system. Experiments reveal that the approach yields
performance gains in two distinct languages: a high-resource, word-based
language (English) and a high-resource, character-based language (Chinese).
Experiments were also performed on a low-resource language, Russian, using a
newly annotated Russian NER corpus from Reddit tagged with four core types and
twelve extended types, for which the article reports a baseline score. It is a
longer version of a paper presented at the 33rd FLAIRS conference (Song et al. 2020).
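As an illustration of the approach described above, the sketch below shows one plausible way to (a) pull entity names from Wikidata via its public SPARQL endpoint and (b) turn gazetteer membership into per-token input features for a neural NER tagger. The entity class (humans, Q5), the B/I/O-style flag encoding, the span-matching heuristic, and helper names such as `fetch_person_names` are assumptions for illustration; the paper's actual gazetteer construction and feature integration may differ.

```python
# Illustrative sketch (not the authors' exact pipeline): build a small
# gazetteer from Wikidata and mark tokens covered by gazetteer entries.
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"  # public endpoint

def fetch_person_names(limit=1000):
    """Fetch English labels of humans (Q5) from Wikidata as a toy gazetteer."""
    query = f"""
    SELECT ?label WHERE {{
      ?person wdt:P31 wd:Q5 ;
              rdfs:label ?label .
      FILTER(LANG(?label) = "en")
    }} LIMIT {limit}
    """
    resp = requests.get(WIKIDATA_SPARQL,
                        params={"query": query, "format": "json"},
                        headers={"User-Agent": "gazetteer-demo/0.1"})
    resp.raise_for_status()
    return {b["label"]["value"] for b in resp.json()["results"]["bindings"]}

def gazetteer_features(tokens, gazetteer, max_span=4):
    """One flag per token: tokens covered by a multi-token gazetteer match
    get B-GAZ/I-GAZ, everything else gets O."""
    flags = ["O"] * len(tokens)
    for i in range(len(tokens)):
        # Try the longest candidate span first so full names win over prefixes.
        for j in range(min(len(tokens), i + max_span), i, -1):
            if " ".join(tokens[i:j]) in gazetteer:
                flags[i] = "B-GAZ"
                for k in range(i + 1, j):
                    flags[k] = "I-GAZ"
                break
    return flags

if __name__ == "__main__":
    gaz = {"Barack Obama", "Angela Merkel"}   # stand-in for the Wikidata result
    toks = "Yesterday Barack Obama spoke in Berlin".split()
    print(list(zip(toks, gazetteer_features(toks, gaz))))
    # [('Yesterday', 'O'), ('Barack', 'B-GAZ'), ('Obama', 'I-GAZ'), ...]
```

In a neural tagger, such flags would typically be embedded and concatenated with the word (or character) representations at the input layer.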
Related papers
- Presence or Absence: Are Unknown Word Usages in Dictionaries? [6.185216877366987]
We evaluate our system in the AXOLOTL-24 shared task for Finnish, Russian and German languages.
We use a graph-based clustering approach to predict mappings between unknown word usages and dictionary entries.
Our system ranks first in Finnish and German, and second in Russian on the Subtask 2 test-phase leaderboard.
arXiv Detail & Related papers (2024-06-02T07:57:45Z)
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields [68.17213992395041]
Low-resource named entity recognition is still an open problem in NLP.
We present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource and low-resource languages jointly.
arXiv Detail & Related papers (2024-04-14T23:44:49Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Classification of Handwritten Names of Cities and Handwritten Text Recognition using Various Deep Learning Models [0.0]
We describe various approaches and achievements of recent years in the development of handwriting recognition models.
The first model uses deep convolutional neural networks (CNNs) for feature extraction and a fully connected multilayer perceptron neural network (MLP) for word classification.
The second model, called SimpleHTR, uses CNN and recurrent neural network (RNN) layers to extract information from images.
arXiv Detail & Related papers (2021-02-09T13:34:16Z)
- Soft Gazetteers for Low-Resource Named Entity Recognition [78.00856159473393]
We propose a method of "soft gazetteers" that incorporates ubiquitously available information from English knowledge bases into neural named entity recognition models (see the sketch after this list).
Our experiments on four low-resource languages show an average improvement of 4 points in F1 score.
arXiv Detail & Related papers (2020-05-04T21:58:02Z)
- Investigating Language Impact in Bilingual Approaches for Computational Language Documentation [28.838960956506018]
This paper investigates how the choice of translation language affects subsequent documentation work.
We create 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment.
Our results suggest that incorporating clues into the neural models' input representation increases their translation and alignment quality.
arXiv Detail & Related papers (2020-03-30T10:30:34Z)
- Classification of Chinese Handwritten Numbers with Labeled Projective Dictionary Pair Learning [1.8594711725515674]
We design class-specific dictionaries incorporating three factors: discriminability, sparsity and classification error.
We adopt a new feature space, i.e., histogram of oriented gradients (HOG), to generate the dictionary atoms.
Results demonstrated enhanced classification performance (~98%) compared to state-of-the-art deep learning techniques.
arXiv Detail & Related papers (2020-03-26T01:43:59Z)
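The Soft Gazetteers entry above replaces hard 0/1 gazetteer matches with continuous features derived from English knowledge bases. The fragment below is a minimal sketch of that intuition under assumed details: it scores candidate spans with character-bigram Jaccard similarity against gazetteer entries, whereas the published method builds its features from cross-lingual entity-linking candidates. Names such as `soft_gazetteer_score` are illustrative only.

```python
# Minimal sketch of "soft" gazetteer features: instead of a 0/1 match flag,
# each candidate span gets a continuous similarity score against the gazetteer.
# Character-bigram Jaccard similarity is an assumption for illustration; the
# published soft-gazetteer method derives scores from entity-linking candidates.
def char_bigrams(s):
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def soft_gazetteer_score(span, gazetteer):
    """Best Jaccard similarity between a candidate span and any gazetteer entry."""
    span_grams = char_bigrams(span)
    best = 0.0
    for entry in gazetteer:
        grams = char_bigrams(entry)
        if span_grams or grams:
            sim = len(span_grams & grams) / len(span_grams | grams)
            best = max(best, sim)
    return best

if __name__ == "__main__":
    gaz = {"New York", "Novgorod"}
    for span in ["Nju Jork", "Berlin"]:   # e.g. transliterated candidate spans
        print(span, round(soft_gazetteer_score(span, gaz), 2))
```

Such scores would typically be binned or fed directly as additional input dimensions alongside the token embeddings, giving partial credit to near-matches that a hard gazetteer lookup would miss.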