UNER: Universal Named-Entity Recognition Framework
- URL: http://arxiv.org/abs/2010.12406v1
- Date: Fri, 23 Oct 2020 13:53:31 GMT
- Title: UNER: Universal Named-Entity Recognition Framework
- Authors: Diego Alves, Tin Kuculo, Gabriel Amaral, Gaurish Thakkar, and Marko Tadic
- Abstract summary: We create the first multilingual UNER corpus: the SETimes parallel corpus annotated for named-entities.
The English SETimes corpus will be annotated using existing tools and knowledge bases.
The resulting annotations will be propagated automatically to other languages within the SE-Times corpora.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the Universal Named-Entity Recognition (UNER) framework, a
4-level classification hierarchy, and the methodology that is being adopted to
create the first multilingual UNER corpus: the SETimes parallel corpus annotated
for named-entities. First, the English SETimes corpus will be annotated using
existing tools and knowledge bases. After evaluating the resulting annotations
through crowdsourcing campaigns, they will be propagated automatically to other
languages within the SE-Times corpora. Finally, as an extrinsic evaluation, the
UNER multilingual dataset will be used to train and test available NER tools.
As part of future research directions, we aim to increase the number of
languages in the UNER corpus and to investigate possible ways of integrating
UNER with available knowledge graphs to improve named-entity recognition.
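The abstract does not say how annotations are propagated from English to the other languages. One common approach in parallel corpora is projection through word alignments, and the sketch below illustrates that idea only; it is not the authors' method. The function `project_spans`, the `PER`/`LOC` labels, the example sentences, and the alignment dictionary are all hypothetical, and the alignment is assumed to come from an external word aligner.

```python
from typing import Dict, List, Tuple

# (start token index, end token index exclusive, label)
Span = Tuple[int, int, str]


def project_spans(src_spans: List[Span],
                  alignment: Dict[int, List[int]]) -> List[Span]:
    """Project entity spans from source tokens onto target tokens.

    `alignment` maps each source token index to the target token
    indices it is aligned with (assumed to be given).
    """
    projected = []
    for start, end, label in src_spans:
        # Collect every target token aligned to any source token in the span.
        tgt = sorted({t for s in range(start, end)
                      for t in alignment.get(s, [])})
        if not tgt:
            continue  # nothing aligned: drop the span rather than guess
        # Use the smallest contiguous window covering the aligned tokens.
        projected.append((tgt[0], tgt[-1] + 1, label))
    return projected


# Hypothetical sentence pair (English -> Croatian) with a made-up alignment.
src_tokens = ["Ana", "Kovac", "visited", "Zagreb"]
tgt_tokens = ["Ana", "Kovac", "posjetila", "je", "Zagreb"]
src_spans = [(0, 2, "PER"), (3, 4, "LOC")]
alignment = {0: [0], 1: [1], 2: [2, 3], 3: [4]}

projected = project_spans(src_spans, alignment)
for start, end, label in projected:
    print(label, tgt_tokens[start:end])
```

Spans with no aligned target tokens are dropped in this sketch; a real pipeline would need a policy for such cases and for non-contiguous alignments.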
Related papers
- Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark [39.01204607174688]
We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages.
UNER v1 contains 18 datasets annotated with named entities in a cross-lingual consistent schema across 12 diverse languages.
arXiv Detail & Related papers (2023-11-15T17:09:54Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- From Retrieval to Generation: Efficient and Effective Entity Set Expansion [23.535181796796678]
Entity Set Expansion (ESE) is a critical task aiming at expanding entities of the target semantic class described by seed entities.
Most existing ESE methods are retrieval-based frameworks that need to extract contextual features of entities and calculate the similarity between seed entities and candidate entities.
We propose the Generative Entity Set Expansion (GenExpan) framework, which uses a generative pre-trained auto-regressive language model to accomplish the ESE task.
arXiv Detail & Related papers (2023-04-07T08:09:50Z)
- DualNER: A Dual-Teaching framework for Zero-shot Cross-lingual Named Entity Recognition [27.245171237640502]
DualNER is a framework that makes full use of both the annotated source-language corpus and unlabeled target-language text.
We combine two complementary learning paradigms of NER, i.e., sequence labeling and span prediction, into a unified multi-task framework.
arXiv Detail & Related papers (2022-11-15T12:50:59Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- Nested Named Entity Recognition as Holistic Structure Parsing [92.8397338250383]
This work models all the nested NEs in a sentence as a holistic structure and proposes a holistic structure parsing algorithm that uncovers all of the NEs at once.
Experiments show that our model yields promising results on widely-used benchmarks which approach or even achieve state-of-the-art.
arXiv Detail & Related papers (2022-04-17T12:48:20Z)
- CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark [144.05723617401674]
General-purpose language intelligence evaluation has been a longstanding goal for natural language processing.
We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic.
We propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features.
arXiv Detail & Related papers (2021-12-27T11:08:58Z)
- DaN+: Danish Nested Named Entities and Lexical Normalization [18.755176247223616]
This paper introduces DaN+, a new multi-domain corpus and annotation guidelines for Danish nested named entities (NEs) and lexical normalization.
We empirically assess three strategies to model the two-layer Named Entity Recognition (NER) task.
Our results show that 1) the most robust strategy is multi-task learning which is rivaled by multi-label decoding, 2) BERT-based NER models are sensitive to domain shifts, and 3) in-language BERT and lexical normalization are the most beneficial on the least canonical data.
arXiv Detail & Related papers (2021-05-24T14:35:21Z)
- Multilingual Autoregressive Entity Linking [49.35994386221958]
mGENRE is a sequence-to-sequence system for the Multilingual Entity Linking problem.
For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token.
We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks.
arXiv Detail & Related papers (2021-03-23T13:25:55Z)
- DomBERT: Domain-oriented Language Model for Aspect-based Sentiment Analysis [71.40586258509394]
We propose DomBERT, an extension of BERT that learns from both the in-domain corpus and relevant domain corpora.
Experiments are conducted on an assortment of tasks in aspect-based sentiment analysis, demonstrating promising results.
arXiv Detail & Related papers (2020-04-28T21:07:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.