RuREBus: a Case Study of Joint Named Entity Recognition and Relation
Extraction from e-Government Domain
- URL: http://arxiv.org/abs/2010.15939v1
- Date: Thu, 29 Oct 2020 20:56:15 GMT
- Title: RuREBus: a Case Study of Joint Named Entity Recognition and Relation
Extraction from e-Government Domain
- Authors: Vitaly Ivanin and Ekaterina Artemova and Tatiana Batura and Vladimir
Ivanov and Veronika Sarkisyan and Elena Tutubalina and Ivan Smurov
- Abstract summary: We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency.
The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for general-domain corpora, and 2) the documents are written in a language other than English.
- Score: 7.6462329126769815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We showcase an application of information extraction methods, such as named
entity recognition (NER) and relation extraction (RE), to a novel corpus
consisting of documents issued by a state agency. The main challenges of this
corpus are: 1) the annotation scheme differs greatly from the one used for
general-domain corpora, and 2) the documents are written in a language other
than English. Contrary to expectations, state-of-the-art transformer-based
models show modest performance on both tasks, whether they are approached
sequentially or in an end-to-end fashion. Our experiments demonstrate
that fine-tuning on a large unlabeled corpus does not automatically yield
a significant improvement, and thus we conclude that more sophisticated
strategies for leveraging unlabeled texts are needed. In this paper, we
describe the whole pipeline we developed: text annotation, baseline
development, and the design of a shared task in the hope of improving on the
baseline. We ultimately conclude that current NER and RE technologies are far
from mature and cannot yet overcome challenges like ours.
Related papers
- On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations [33.56352555780006]
We investigate the robustness of DocRE models to entity name variations in this work.
We propose a principled pipeline to generate entity-renamed documents by replacing the original entity names with names from Wikidata.
Experimental results show that both three representative DocRE models and two in-context learned large language models consistently lack sufficient robustness to entity name variations.
arXiv Detail & Related papers (2024-06-11T16:51:14Z)
- DocTr: Document Transformer for Structured Information Extraction in Documents [36.1145541816468]
We present a new formulation for structured information extraction from visually rich documents.
It aims to address the limitations of existing IOB tagging or graph-based formulations.
We represent an entity as an anchor word and a bounding box, and represent entity linking as the association between anchor words.
arXiv Detail & Related papers (2023-07-16T02:59:30Z)
- Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking [15.421012879083463]
Open domain entity state tracking aims to predict reasonable state changes of entities (i.e., [attribute] of [entity] was [before_state] and [after_state] afterwards) given the action descriptions.
It's challenging as the model needs to predict an arbitrary number of entity state changes caused by the action while most of the entities are implicitly relevant to the actions and their attributes as well as states are from open vocabularies.
We propose a novel end-to-end Knowledge Informed framework for open domain Entity State Tracking, namely KIEST, which explicitly retrieves the relevant entities and attributes from
arXiv Detail & Related papers (2023-04-26T22:45:30Z)
- Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
arXiv Detail & Related papers (2022-12-19T11:26:23Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Document-Level Relation Extraction with Sentences Importance Estimation and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z)
- CABACE: Injecting Character Sequence Information and Domain Knowledge for Enhanced Acronym and Long-Form Extraction [0.0]
We propose a novel framework CABACE: Character-Aware BERT for ACronym Extraction.
It takes into account character sequences in text and is adapted to scientific and legal domains by masked language modelling.
We show that the proposed framework is better suited than baseline models for zero-shot generalization to non-English languages.
arXiv Detail & Related papers (2021-12-25T14:03:09Z)
- Transformer-Based Approach for Joint Handwriting and Named Entity Recognition in Historical documents [1.7491858164568674]
This work presents the first approach that adopts the transformer networks for named entity recognition in handwritten documents.
We achieve the new state-of-the-art performance in the ICDAR 2017 Information Extraction competition using the Esposalles database.
arXiv Detail & Related papers (2021-12-08T09:26:21Z)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)
- Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID [62.90727103061876]
Unsupervised domain adaptation (UDA) aims at adapting the model trained on a labeled source-domain dataset to an unlabeled target-domain dataset.
We propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term.
Our proposed framework is shown to achieve state-of-the-art performance on multiple UDA tasks of person re-ID.
arXiv Detail & Related papers (2020-03-14T14:45:18Z)