NEREL: A Russian Dataset with Nested Named Entities and Relations
- URL: http://arxiv.org/abs/2108.13112v1
- Date: Mon, 30 Aug 2021 10:40:20 GMT
- Title: NEREL: A Russian Dataset with Nested Named Entities and Relations
- Authors: Natalia Loukachevitch and Ekaterina Artemova and Tatiana Batura and
Pavel Braslavski and Ilia Denisov and Vladimir Ivanov and Suresh Manandhar
and Alexander Pugachev and Elena Tutubalina
- Abstract summary: We present NEREL, a Russian dataset for named entity recognition and relation extraction.
It contains 56K annotated named entities and 39K annotated relations.
- Score: 55.69103749079697
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present NEREL, a Russian dataset for named entity
recognition and relation extraction. NEREL is significantly larger than
existing Russian datasets: to date it contains 56K annotated named entities and
39K annotated relations. Its important difference from previous datasets is
annotation of nested named entities, as well as relations within nested
entities and at the discourse level. NEREL can facilitate development of novel
models that can extract relations between nested named entities, as well as
relations on both sentence and document levels. NEREL also contains the
annotation of events involving named entities and their roles in the events.
The NEREL collection is available via https://github.com/nerel-ds/NEREL.
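The abstract stresses that NEREL annotates nested named entities and relations that can hold between an entity and one nested inside it. Below is a minimal sketch of how such annotations can be represented and checked. It assumes a BRAT-style standoff format: `T` lines give an entity label with character offsets, and `R` lines give a relation between two entity ids. The file layout of the actual release is not described here, so check the repository. The toy sentence and the labels `ORGANIZATION`, `COUNTRY` and `HEADQUARTERED_IN` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entity:
    eid: str
    label: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def parse_standoff(ann: str) -> Tuple[Dict[str, Entity], List[Tuple[str, str, str]]]:
    """Parse BRAT-style entity (T) and relation (R) lines; other line types are skipped."""
    entities: Dict[str, Entity] = {}
    relations: List[Tuple[str, str, str]] = []
    for line in ann.strip().splitlines():
        fields = line.split("\t")
        if fields[0].startswith("T"):
            # Assumes one contiguous span per entity (no ';'-separated fragments).
            label, start, end = fields[1].split()
            entities[fields[0]] = Entity(fields[0], label, int(start), int(end), fields[2])
        elif fields[0].startswith("R"):
            rel_type, arg1, arg2 = fields[1].split()
            relations.append((rel_type, arg1.split(":")[1], arg2.split(":")[1]))
    return entities, relations


def nested_pairs(entities: Dict[str, Entity]) -> List[Tuple[str, str]]:
    """Return (outer, inner) id pairs where the inner span lies inside a different outer span."""
    pairs = []
    for outer in entities.values():
        for inner in entities.values():
            same_span = (outer.start, outer.end) == (inner.start, inner.end)
            if not same_span and outer.start <= inner.start and inner.end <= outer.end:
                pairs.append((outer.eid, inner.eid))
    return pairs


# Toy example: the country name is nested inside the organization name.
# Labels are illustrative, not copied from the NEREL guidelines.
TEXT = "Банк России"
ANN = (
    "T1\tORGANIZATION 0 11\tБанк России\n"
    "T2\tCOUNTRY 5 11\tРоссии\n"
    "R1\tHEADQUARTERED_IN Arg1:T1 Arg2:T2"
)
ENTITIES, RELATIONS = parse_standoff(ANN)

for ent in ENTITIES.values():
    assert TEXT[ent.start:ent.end] == ent.text  # offsets match the raw text
print(nested_pairs(ENTITIES))  # [('T1', 'T2')]
print(RELATIONS)               # [('HEADQUARTERED_IN', 'T1', 'T2')]
```

The relation here links an outer entity to one nested inside it. A flat, non-overlapping tagging scheme such as BIO cannot express this, because it assigns each token a single label.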
Related papers
- Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition [5.188242370198818]
Nested Named Entity Recognition (NNER) focuses on addressing overlapped entity recognition.
Data augmentation is an effective approach to address the insufficient annotated corpus.
We propose Composited-Nested-Label Classification (CNLC) in which constituents are combined by nested-word and nested-label, to model nested entities.
(2024-06-18T16:46:18Z)
- Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks [58.43972540643903]
We propose HyperGraph neural network for ERE (hgnn), which is built upon the PL-marker (a state-of-the-art marker-based pipeline model).
To alleviate error propagation, we use a high-recall pruner mechanism to transfer the burden of entity identification and labeling from the NER module to the joint module of our model.
Experiments on three widely used benchmarks for ERE task show significant improvements over the previous state-of-the-art PL-marker.
(2023-10-26T08:36:39Z)
- Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
(2023-09-20T03:15:05Z)
- NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities [7.713462279125201]
This paper describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English.
NEREL-BIO extends the general domain dataset NEREL by introducing domain-specific entity types.
NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL.
(2022-10-21T12:28:43Z)
- An Embarrassingly Easy but Strong Baseline for Nested Named Entity Recognition [55.080101447586635]
We propose using a Convolutional Neural Network (CNN) to model spatial relations in the score matrix.
Our model surpasses several recently proposed methods with the same pre-trained encoders.
(2022-08-09T04:33:46Z)
- AsNER -- Annotated Dataset and Baseline for Assamese Named Entity recognition [7.252817150901275]
The proposed NER dataset is likely to be a significant resource for deep neural based Assamese language processing.
We benchmark the dataset by training NER models and evaluating using state-of-the-art architectures for supervised named entity recognition.
The best baseline achieves an F1 score of 80.69% when using MuRIL as the word embedding method.
(2022-07-07T16:45:55Z)
- HiNER: A Large Hindi Named Entity Recognition Dataset [29.300418937509317]
This paper releases a standard-abiding Hindi NER dataset containing 109,146 sentences and 2,220,856 tokens, annotated with 11 tags.
The statistics of tag-set in our dataset show a healthy per-tag distribution, especially for prominent classes like Person, Location and Organisation.
Our dataset helps achieve a weighted F1 score of 88.78 with all the tags and 92.22 when we collapse the tag-set, as discussed in the paper.
(2022-04-28T19:14:21Z)
- Trigger-GNN: A Trigger-Based Graph Neural Network for Nested Named Entity Recognition [5.9049664765234295]
We propose a trigger-based graph neural network (Trigger-GNN) for nested NER.
It obtains the complementary annotation embeddings through entity trigger encoding and semantic matching.
It helps the model to learn and generalize more efficiently and cost-effectively.
(2022-04-12T04:15:39Z)
- MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain [76.21775236904185]
The dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities.
A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types.
To the best of our knowledge, this is the first German-language dataset that combines annotations for NER, EL and RE.
(2021-08-16T08:21:50Z)
- Autoregressive Entity Retrieval [55.38027440347138]
Entities are at the center of how we represent and aggregate knowledge.
The ability to retrieve such entities given a query is fundamental for knowledge-intensive tasks such as entity linking and open-domain question answering.
We propose GENRE, the first system that retrieves entities by generating their unique names, left to right, token-by-token in an autoregressive fashion.
(2020-10-02T10:13:31Z)
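The GENRE summary above describes retrieving an entity by generating its unique name token by token. The GENRE paper keeps these outputs valid by constraining decoding with a prefix tree (trie) over the candidate names. Below is a minimal sketch of only that constraint idea, with several simplifying assumptions. It uses greedy search instead of beam search and whitespace tokens instead of subword tokens. A hypothetical keyword-overlap scorer (`make_scorer`) stands in for the sequence-to-sequence model, and the `END` marker and toy name list are illustrative.

```python
from typing import Callable, Dict, List

END = "</s>"  # end-of-name marker (illustrative)
Trie = Dict[str, dict]


def build_trie(names: List[List[str]]) -> Trie:
    """Prefix tree over tokenized entity names; every complete name ends in END."""
    root: Trie = {}
    for tokens in names:
        node = root
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: Trie, prefix: List[str]) -> List[str]:
    """Tokens that keep `prefix` on a path to some valid name."""
    node = trie
    for tok in prefix:
        node = node[tok]
    return list(node)


def constrained_greedy(trie: Trie, score: Callable[[List[str], str], float]) -> List[str]:
    """Greedy decoding restricted to the trie (GENRE itself uses beam search)."""
    prefix: List[str] = []
    while True:
        best = max(allowed_next(trie, prefix), key=lambda tok: score(prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)


def make_scorer(query: str) -> Callable[[List[str], str], float]:
    """Hypothetical stand-in for a seq2seq model: rewards tokens that appear in the query."""
    words = set(query.lower().split())

    def score(prefix: List[str], tok: str) -> float:
        if tok == END:
            return 0.5
        return 1.0 if tok.lower() in words else 0.0

    return score


NAMES = [["Moscow"], ["Moscow", "River"], ["Moscow", "Kremlin"], ["Saint", "Petersburg"]]
TRIE = build_trie(NAMES)
print(constrained_greedy(TRIE, make_scorer("which river flows through Moscow")))
# ['Moscow', 'River']
```

The decoder can only choose among the tokens returned by `allowed_next`, so every output is an exact candidate name. For example, the query "the Volga" names no candidate, yet it still yields an entry from `NAMES`.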