A system for information extraction from scientific texts in Russian
- URL: http://arxiv.org/abs/2109.06703v1
- Date: Tue, 14 Sep 2021 14:08:37 GMT
- Title: A system for information extraction from scientific texts in Russian
- Authors: Elena Bruches, Anastasia Mezentseva, Tatiana Batura
- Abstract summary: The system performs several tasks in an end-to-end manner: term recognition, extraction of relations between terms, and term linking with entities from the knowledge base.
The advantage of the implemented methods is that the system does not require a large amount of labeled data, which saves the time and effort of data labeling.
The source code is publicly available and can be used for different research purposes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a system for information extraction from scientific
texts in the Russian language. The system performs several tasks in an
end-to-end manner: term recognition, extraction of relations between terms, and
term linking with entities from the knowledge base. These tasks are extremely
important for information retrieval, recommendation systems, and
classification. The advantage of the implemented methods is that the system
does not require a large amount of labeled data, which saves the time and effort
of data labeling and makes the system applicable in low- and mid-resource
settings. The source code is publicly available and can be used for different
research purposes.
Related papers
- DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a knowledge graph (KG).
(2023-09-04T20:52:33Z)
- CQE: A Comprehensive Quantity Extractor [2.2079886535603084]
We present a comprehensive quantity extraction framework from text data.
It efficiently detects combinations of values and units, the behavior of a quantity, and the concept a quantity is associated with.
Our framework makes use of dependency parsing and a dictionary of units, and it provides for a proper normalization and standardization of detected quantities.
(2023-05-15T17:59:41Z)
- TERMinator: A system for scientific texts processing [0.0]
This paper is devoted to the extraction of entities and semantic relations between them from scientific texts.
We present a dataset that includes annotations for two tasks and develop a system called TERMinator for the study of the influence of language models on term recognition.
(2022-09-29T15:14:42Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
(2022-07-14T07:59:45Z)
- DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition [94.1865071914727]
MultiCoNER aims at detecting semantically ambiguous named entities in short and low-context settings for multiple languages.
Our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia.
Given an input sentence, our system effectively retrieves related contexts from the knowledge base.
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
(2022-03-01T15:29:35Z)
- Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains [67.99403521976058]
We demonstrate a cross-lingual open-retrieval question answering system for the emergent domain of COVID-19.
Our system adopts a corpus of scientific articles to ensure that retrieved documents are reliable.
We show that a deep semantic retriever greatly benefits from training on our English-to-all data and significantly outperforms a BM25 baseline in the cross-lingual setting.
(2022-01-26T19:27:32Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
(2021-06-03T03:00:12Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from a resource-rich language to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
(2021-06-01T05:46:22Z)
- Entity Recognition and Relation Extraction from Scientific and Technical Texts in Russian [0.0]
This paper is devoted to the study of methods for information extraction from scientific texts on information technology.
Several modifications of methods for the Russian language are proposed.
It also includes the results of experiments comparing a keyword extraction method, a vocabulary method, and some methods based on neural networks.
(2020-11-19T13:40:03Z)
- Natural language processing for word sense disambiguation and information extraction [0.0]
The thesis presents a new approach for Word Sense Disambiguation using a thesaurus.
A document retrieval method based on fuzzy logic is described, and its application is illustrated.
The thesis concludes with the presentation of a novel strategy based on the Dempster-Shafer theory of evidential reasoning.
(2020-04-05T17:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.