MORTY: Structured Summarization for Targeted Information Extraction from
Scholarly Articles
- URL: http://arxiv.org/abs/2212.05429v1
- Date: Sun, 11 Dec 2022 06:49:29 GMT
- Title: MORTY: Structured Summarization for Targeted Information Extraction from
Scholarly Articles
- Authors: Mohamad Yaser Jaradeh, Markus Stocker, Sören Auer
- Abstract summary: We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles.
Our approach condenses the article's full text into property-value pairs, rendered as a segmented text snippet called a structured summary.
We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph and corresponding publicly available scientific articles.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information extraction from scholarly articles is a challenging task due to
the sizable document length and implicit information hidden in text, figures,
and citations. Scholarly information extraction has various applications in
exploration, archival, and curation services for digital libraries and
knowledge management systems. We present MORTY, an information extraction
technique that creates structured summaries of text from scholarly articles.
Our approach condenses the article's full text into property-value pairs as a
segmented text snippet called a structured summary. We also present a sizable
scholarly dataset combining structured summaries retrieved from a scholarly
knowledge graph and corresponding publicly available scientific articles, which
we openly publish as a resource for the research community. Our results show
that structured summarization is a suitable approach for targeted information
extraction that complements other commonly used methods such as question
answering and named entity recognition.
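To make the idea of a "segmented text snippet" of property-value pairs concrete, here is a minimal sketch of how such a snippet might be serialized and parsed back. The delimiters (`" | "` between pairs, `": "` inside a pair), the function names, and the example pairs are all hypothetical; the abstract does not specify the actual format MORTY uses.

```python
# Hypothetical delimiters: the abstract does not state MORTY's real format.
PAIR_SEP = " | "   # assumed separator between property-value pairs
KV_SEP = ": "      # assumed separator between a property and its value


def to_structured_summary(pairs):
    """Serialize (property, value) pairs into one segmented text snippet."""
    return PAIR_SEP.join(f"{prop}{KV_SEP}{value}" for prop, value in pairs)


def parse_structured_summary(snippet):
    """Split a segmented snippet back into (property, value) pairs."""
    pairs = []
    for segment in snippet.split(PAIR_SEP):
        prop, sep, value = segment.partition(KV_SEP)
        if sep:  # skip segments that lack a property/value separator
            pairs.append((prop.strip(), value.strip()))
    return pairs


# Made-up example pairs, for illustration only.
example_pairs = [
    ("research problem", "targeted information extraction"),
    ("method", "structured summarization"),
]
snippet = to_structured_summary(example_pairs)
recovered = parse_structured_summary(snippet)
```

Under these assumptions, a text-to-text model would be trained to emit a string like `snippet` from the article text, and a parser like `parse_structured_summary` would turn the output back into pairs for a knowledge graph.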
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
(2024-10-28T15:56:49Z)
- DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
(2023-09-04T20:52:33Z)
- QuOTeS: Query-Oriented Technical Summarization [0.2936007114555107]
We propose QuOTeS, an interactive system designed to retrieve sentences related to a summary of the research from a collection of potential references.
QuOTeS integrates techniques from Query-Focused Extractive Summarization and High-Recall Information Retrieval to provide Interactive Query-Focused Summarization of scientific documents.
The results show that QuOTeS provides a positive user experience and consistently provides query-focused summaries that are relevant, concise, and complete.
(2023-06-20T18:43:24Z)
- Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature [21.440724685950443]
We present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale).
We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets.
(2022-10-18T15:28:30Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
(2022-07-14T08:52:07Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
(2021-06-03T03:00:12Z)
- Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [1.9004296236396943]
We present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications.
In this work, we tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools.
We generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain.
(2020-10-28T08:31:40Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features of text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
(2020-05-27T01:47:26Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper surveys these new summarization tasks and approaches in real-world applications.
(2020-05-10T14:59:36Z)
- Natural language processing for word sense disambiguation and information extraction [0.0]
The thesis presents a new approach for Word Sense Disambiguation using a thesaurus.
A document retrieval method based on fuzzy logic is described, and its application is illustrated.
The thesis concludes with the presentation of a novel strategy based on the Dempster-Shafer theory of evidential reasoning.
(2020-04-05T17:13:43Z)