Soft Gazetteers for Low-Resource Named Entity Recognition
- URL: http://arxiv.org/abs/2005.01866v1
- Date: Mon, 4 May 2020 21:58:02 GMT
- Title: Soft Gazetteers for Low-Resource Named Entity Recognition
- Authors: Shruti Rijhwani, Shuyan Zhou, Graham Neubig, Jaime Carbonell
- Abstract summary: We propose a method of "soft gazetteers" that incorporates ubiquitously available information from English knowledge bases into neural named entity recognition models.
Our experiments on four low-resource languages show an average improvement of 4 points in F1 score.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional named entity recognition models use gazetteers (lists of
entities) as features to improve performance. Although modern neural network
models do not require such hand-crafted features for strong performance, recent
work has demonstrated their utility for named entity recognition on English
data. However, designing such features for low-resource languages is
challenging, because exhaustive entity gazetteers do not exist in these
languages. To address this problem, we propose a method of "soft gazetteers"
that incorporates ubiquitously available information from English knowledge
bases, such as Wikipedia, into neural named entity recognition models through
cross-lingual entity linking. Our experiments on four low-resource languages
show an average improvement of 4 points in F1 score. Code and data are
available at https://github.com/neulab/soft-gazetteers.
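As a rough illustration of the soft-gazetteer idea, each candidate span can receive a continuous feature vector built from the ranked scores of a cross-lingual entity linker, instead of a binary gazetteer-match flag. The sketch below is a minimal, hypothetical aggregation (the entity-type set, the scores, and the max-pooling choice are assumptions for illustration, not the paper's exact implementation):

```python
# Hypothetical soft-gazetteer feature for a single text span: aggregate the
# entity linker's candidate scores into one continuous feature per entity type.
NUM_TYPES = 4
TYPE_INDEX = {"PER": 0, "ORG": 1, "LOC": 2, "MISC": 3}

def soft_gazetteer_features(candidates):
    """Build a NUM_TYPES-dim feature vector for one span.

    `candidates` is a list of (entity_type, score) tuples, e.g. the ranked
    candidates a cross-lingual entity linker returns for the span against an
    English knowledge base. Here we keep the top score per type (max-pooling).
    """
    feats = [0.0] * NUM_TYPES
    for etype, score in candidates:
        idx = TYPE_INDEX[etype]
        feats[idx] = max(feats[idx], score)
    return feats

# Example: a span whose top linking candidates are two locations and an org.
print(soft_gazetteer_features([("LOC", 0.8), ("LOC", 0.3), ("ORG", 0.5)]))
# -> [0.0, 0.5, 0.8, 0.0]
```

Such a vector can then be concatenated with the span's token representations inside the neural NER model, so the model learns how much to trust the linker's (noisy) evidence.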
Related papers
- Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields [68.17213992395041]
Low-resource named entity recognition is still an open problem in NLP.
We present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource and low-resource languages jointly.
arXiv Detail & Related papers (2024-04-14T23:44:49Z)
- Tik-to-Tok: Translating Language Models One Token at a Time: An Embedding Initialization Strategy for Efficient Language Adaptation [19.624330093598996]
Training monolingual language models for low and mid-resource languages is made challenging by limited and often inadequate pretraining data.
By generalizing over a word translation dictionary encompassing both the source and target languages, we map tokens from the target tokenizer to semantically similar tokens from the source language tokenizer.
We conduct experiments to convert high-resource models to mid- and low-resource languages, namely Dutch and Frisian.
arXiv Detail & Related papers (2023-10-05T11:45:29Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource language to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in addressing the scarcity of annotated data for low-resource NER.
We propose a complementary approach to building low-resource Named Entity Recognition (NER) models using "non-speaker" (NS) annotations.
We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)
- Neural Networks for Projecting Named Entities from English to Ewondo [6.058868817939519]
We propose a new distributional representation of words to project named entities from a rich language to a low-resource one.
Although the proposed method reached appreciable results, the size of the used neural network was too large.
In this paper, we show experimentally that the same results can be obtained using a smaller neural network.
arXiv Detail & Related papers (2020-03-29T22:05:30Z)
- Improving Neural Named Entity Recognition with Gazetteers [6.292153194561472]
This article describes how to generate gazetteers from the Wikidata knowledge graph as well as how to integrate the information into a neural NER system.
Experiments reveal that the approach yields performance gains in two distinct languages.
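For contrast with soft features, the classic hard-gazetteer feature that such Wikidata-derived entity lists feed into can be sketched as greedy longest-match BIO tagging. The gazetteer entries, tag scheme, and matching policy below are hypothetical illustrations, not the paper's pipeline:

```python
# Toy gazetteer, e.g. extracted from Wikidata entity labels (hypothetical entries).
GAZETTEER = {
    ("new", "york"): "LOC",
    ("carnegie", "mellon", "university"): "ORG",
}
MAX_LEN = max(len(entry) for entry in GAZETTEER)

def gazetteer_tags(tokens):
    """Assign BIO tags by greedy longest-match against the gazetteer."""
    tokens_lc = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest possible entry first, shrinking the window.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            etype = GAZETTEER.get(tuple(tokens_lc[i:i + n]))
            if etype:
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                break
        else:
            i += 1  # no match starting here; move on
    return tags

print(gazetteer_tags(["She", "lives", "in", "New", "York"]))
# -> ['O', 'O', 'O', 'B-LOC', 'I-LOC']
```

These binary match tags are typically embedded and concatenated with token representations; the brittleness of exact matching in low-resource languages is what motivates the soft-gazetteer alternative above.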
arXiv Detail & Related papers (2020-03-06T08:29:37Z)
- Cross-lingual, Character-Level Neural Morphological Tagging [57.0020906265213]
We train character-level recurrent neural taggers to predict morphological tags for high-resource and low-resource languages together.
Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
arXiv Detail & Related papers (2017-08-30T08:14:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.