MobIE: A German Dataset for Named Entity Recognition, Entity Linking and
Relation Extraction in the Mobility Domain
- URL: http://arxiv.org/abs/2108.06955v1
- Date: Mon, 16 Aug 2021 08:21:50 GMT
- Title: MobIE: A German Dataset for Named Entity Recognition, Entity Linking and
Relation Extraction in the Mobility Domain
- Authors: Leonhard Hennig and Phuc Tran Truong and Aleksandra Gabryszak
- Abstract summary: dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities.
A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types.
To the best of our knowledge, this is the first German-language dataset that combines annotations for NER, EL and RE.
- Score: 76.21775236904185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MobIE, a German-language dataset, which is human-annotated with 20
coarse- and fine-grained entity types and entity linking information for
geographically linkable entities. The dataset consists of 3,232 social media
texts and traffic reports with 91K tokens, and contains 20.5K annotated
entities, 13.1K of which are linked to a knowledge base. A subset of the
dataset is human-annotated with seven mobility-related, n-ary relation types,
while the remaining documents are annotated using a weakly-supervised labeling
approach implemented with the Snorkel framework. To the best of our knowledge,
this is the first German-language dataset that combines annotations for NER, EL
and RE, and thus can be used for joint and multi-task learning of these
fundamental information extraction tasks. We make MobIE public at
https://github.com/dfki-nlp/mobie.
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z) - Entity Disambiguation via Fusion Entity Decoding [68.77265315142296]
We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions.
We observe +1.5% improvements in end-to-end entity linking in the GERBIL benchmark compared with EntQA.
arXiv Detail & Related papers (2024-04-02T04:27:54Z) - SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages [44.017657230247934]
We present textitSemRel, a new semantic relatedness dataset collection annotated by native speakers across 13 languages.
These languages originate from five distinct language families and are predominantly spoken in Africa and Asia.
Each instance in the SemRel datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences.
arXiv Detail & Related papers (2024-02-13T18:04:53Z) - Exploiting Contextual Target Attributes for Target Sentiment
Classification [53.30511968323911]
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task.
We present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z) - Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation
for autonomous vehicles [63.20765930558542]
3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization.
We propose a new dataset, Navya 3D (Navya3DSeg), with a diverse label space corresponding to a large scale production grade operational domain.
It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds.
arXiv Detail & Related papers (2023-02-16T13:41:19Z) - AsNER -- Annotated Dataset and Baseline for Assamese Named Entity
recognition [7.252817150901275]
The proposed NER dataset is likely to be a significant resource for deep neural based Assamese language processing.
We benchmark the dataset by training NER models and evaluating using state-of-the-art architectures for supervised named entity recognition.
The highest F1-score among all baselines achieves an accuracy of 80.69% when using MuRIL as a word embedding method.
arXiv Detail & Related papers (2022-07-07T16:45:55Z) - AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First --
Using Relation Extraction to Identify Entities [0.0]
We present an end-to-end joint entity and relation extraction approach based on transformer-based language models.
In contrast to existing approaches, which perform entity and relation extraction in sequence, our system incorporates information from relation extraction into entity extraction.
arXiv Detail & Related papers (2022-03-10T12:19:44Z) - KazNERD: Kazakh Named Entity Recognition Dataset [5.094176584161206]
We present the development of a dataset for Kazakh named entity recognition.
The dataset was built as there is a clear need for publicly available annotated corpora in Kazakh.
The resulting dataset contains 112,702 sentences and 136,333 annotations for 25 entity classes.
arXiv Detail & Related papers (2021-11-26T10:56:19Z) - Interpretable and Low-Resource Entity Matching via Decoupling Feature
Learning from Decision Making [22.755892575582788]
Entity Matching aims at recognizing entity records that denote the same real-world object.
We propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction.
Our method is highly efficient and outperforms SOTA EM models in most cases.
arXiv Detail & Related papers (2021-06-08T08:27:31Z) - Cross-lingual Entity Alignment with Incidental Supervision [76.66793175159192]
We propose an incidentally supervised model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme.
Experiments on benchmark datasets show that JEANS leads to promising improvement on entity alignment with incidental supervision.
arXiv Detail & Related papers (2020-05-01T01:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.