Pattern-based Acquisition of Scientific Entities from Scholarly Article Titles
- URL: http://arxiv.org/abs/2109.00199v1
- Date: Wed, 1 Sep 2021 05:59:06 GMT
- Title: Pattern-based Acquisition of Scientific Entities from Scholarly Article Titles
- Authors: Jennifer D'Souza and Soeren Auer
- Abstract summary: We describe a rule-based approach for the automatic acquisition of scientific entities from scholarly article titles.
We identify a set of lexico-syntactic patterns that are easily recognizable.
A subset of the acquisition algorithm is implemented for article titles in the Computational Linguistics (CL) scholarly domain.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe a rule-based approach for the automatic acquisition of scientific
entities from scholarly article titles. Two observations motivated the
approach: (i) an article's contribution information is concentrated in its
title; and (ii) regularities in how this information is phrased can be captured
by a system of rules, relieving human annotators of building gold standards
one instance at a time. We identify a set of lexico-syntactic
patterns that are easily recognizable, that occur frequently, and that
generally indicate the type of scientific entity of interest in the scholarly
contribution.
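The abstract does not list the patterns themselves. As a rough illustration of what a lexico-syntactic title rule can look like, the Python sketch below applies three hypothetical regular-expression rules and groups the captured spans by entity type. The names `RULES` and `extract_entities`, the three patterns, and the example title are all assumptions for illustration; they are not the rules shipped with ORKG-Title-Parser, and they cover only three of the six concept types the abstract names.

```python
import re

# Hypothetical rules for illustration only: each pairs an entity type with a
# lexico-syntactic cue. They are NOT the rules used by ORKG-Title-Parser.
RULES = [
    # A short capitalised name before a colon ("TagNet: ...") is read as a solution.
    ("solution", re.compile(r"^(?P<span>[A-Z][\w-]{1,20}):\s")),
    # The phrase after "for", up to "using"/"with" or the end, is a research problem.
    ("research problem", re.compile(r"\bfor (?P<span>[^:]+?)(?: using | with |$)")),
    # The phrase after "using"/"with" is read as a method.
    ("method", re.compile(r"\b(?:using|with) (?P<span>.+)$")),
]


def extract_entities(title: str) -> dict[str, list[str]]:
    """Apply every rule to a title and group the captured spans by entity type."""
    found: dict[str, list[str]] = {}
    for entity_type, pattern in RULES:
        for match in pattern.finditer(title):
            found.setdefault(entity_type, []).append(match.group("span").strip())
    return found


# A made-up title, used only to exercise the hypothetical rules.
title = "TagNet: Sequence Tagging for Named Entity Recognition with Conditional Random Fields"
print(extract_entities(title))
# prints {'solution': ['TagNet'], 'research problem': ['Named Entity Recognition'], 'method': ['Conditional Random Fields']}
```

Because every rule fires independently, one span could receive several types; a practical rule system would need precedence or conflict resolution, which this sketch omits.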
A subset of the acquisition algorithm is implemented for article titles in
the Computational Linguistics (CL) scholarly domain. The tool, called
ORKG-Title-Parser, identifies in its first release the following six concept
types of scientific terminology in CL paper titles, namely research
problem, solution, resource, language, tool, and method. It has been
empirically evaluated on a collection of 50,237 titles that cover nearly all
articles in the ACL Anthology. It has extracted 19,799 research problems;
18,111 solutions; 20,033 resources; 1,059 languages; 6,878 tools; and 21,687
methods at an average extraction precision of 75%. The code and related data
resources are publicly available at
https://gitlab.com/TIBHannover/orkg/orkg-title-parser.
Finally, in the article, we discuss extensions and applications to areas such
as scholarly knowledge graph (SKG) creation.
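The per-type counts reported above can be tallied to see the overall yield of the evaluation run. The total and the per-title mean below are derived arithmetic on the abstract's figures, not numbers stated by the authors; the 75% average precision is deliberately left out, since an averaged precision cannot be turned back into a count of correct extractions without knowing how it was averaged.

```python
# Per-type extraction counts as reported in the abstract (50,237 ACL Anthology titles).
EXTRACTED = {
    "research problem": 19_799,
    "solution": 18_111,
    "resource": 20_033,
    "language": 1_059,
    "tool": 6_878,
    "method": 21_687,
}
NUM_TITLES = 50_237

# Derived quantities (not reported in the paper): overall total and mean yield per title.
total = sum(EXTRACTED.values())
per_title = total / NUM_TITLES
largest = max(EXTRACTED, key=EXTRACTED.get)

print(f"{total:,} entities, {per_title:.2f} per title, most frequent type: {largest}")
# prints "87,567 entities, 1.74 per title, most frequent type: method"
```

The mean says nothing about the distribution: some titles may yield several entities and others none.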
Related papers
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
(2024-06-10T21:22:08Z)
- Object Recognition from Scientific Document based on Compartment Refinement Framework [2.699900017799093]
It has become increasingly important to extract valuable information from vast resources efficiently.
Current data extraction methods for scientific documents typically use rule-based (RB) or machine learning (ML) approaches.
We propose a new document layout analysis framework called CTBR (Compartment & Text Blocks Refinement).
(2023-12-14T15:36:49Z)
- DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
(2023-09-04T20:52:33Z)
- A Framework For Refining Text Classification and Object Recognition from Academic Articles [2.699900017799093]
Current data mining methods for academic articles employ rule-based (RB) or machine learning (ML) approaches.
We have developed a novel Text Block Refinement Framework (TBRF), a hybrid of machine learning and rule-based schemes.
(2023-05-27T07:59:49Z)
- MORTY: Structured Summarization for Targeted Information Extraction from Scholarly Articles [0.0]
We present MORTY, an information extraction technique that creates structured summaries of text from scholarly articles.
Our approach condenses the article's full-text to property-value pairs as a segmented text snippet called structured summary.
We also present a sizable scholarly dataset combining structured summaries retrieved from a scholarly knowledge graph and corresponding publicly available scientific articles.
(2022-12-11T06:49:29Z)
- arXivEdits: Understanding the Human Revision Process in Scientific Writing [17.63505461444103]
We provide a complete computational framework for studying text revision in scientific writing.
We first introduce arXivEdits, a new annotated corpus of 751 full papers from arXiv with gold sentence alignment across their multiple versions of revision.
It supports our data-driven analysis to unveil the common strategies practiced by researchers for revising their papers.
(2022-10-26T22:50:24Z)
- LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents [48.84086818702328]
Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval.
The vast majority of the benchmark datasets for this task come from the scientific domain and contain only the document title and abstract.
This presents three challenges for real-world applications: human-written summaries are unavailable for most documents, the documents are almost always long, and a high percentage of KPs are directly found beyond the limited context of title and abstract.
(2022-03-29T08:44:57Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
(2021-06-03T03:00:12Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
(2021-04-07T11:13:35Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
(2020-11-06T02:23:01Z)
- Topic-Centric Unsupervised Multi-Document Summarization of Scientific and News Articles [3.0504782036247438]
We propose a topic-centric unsupervised multi-document summarization framework to generate abstractive summaries.
The proposed algorithm generates an abstractive summary by developing salient language unit selection and text generation techniques.
Our approach matches the state-of-the-art when evaluated on automated extractive evaluation metrics and performs better for abstractive summarization on five human evaluation metrics.
(2020-11-03T04:04:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.