An Information Retrieval and Extraction Tool for Covid-19 Related Papers
- URL: http://arxiv.org/abs/2401.16430v1
- Date: Sat, 20 Jan 2024 01:34:50 GMT
- Title: An Information Retrieval and Extraction Tool for Covid-19 Related Papers
- Authors: Marcos V. L. Pivetta
- Abstract summary: The main focus of this paper is to provide researchers with a better search tool for COVID-19 related papers.
Our tool has shown the potential to assist researchers by automating a topic-based search of CORD-19 papers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: The COVID-19 pandemic has caused severe impacts on health systems
worldwide. Its critical nature and the increased interest of individuals and
organizations to develop countermeasures to the problem have led to a surge of
new studies in scientific journals. Objective: We sought to develop a tool that
incorporates, in a novel way, aspects of Information Retrieval (IR) and
Extraction (IE) applied to the COVID-19 Open Research Dataset (CORD-19). The
main focus of this paper is to provide researchers with a better search tool
for COVID-19 related papers, helping them find reference papers and highlight
relevant entities in text. Method: We applied Latent Dirichlet Allocation (LDA)
to model, based on research aspects, the topics of all English abstracts in
CORD-19. Relevant named entities of each abstract were extracted and linked to
the corresponding UMLS concept. Regular expressions and the K-Nearest Neighbors
algorithm were used to rank relevant papers. Results: Our tool has shown the
potential to assist researchers by automating a topic-based search of CORD-19
papers. Nonetheless, we identified that more fine-tuned topic modeling
parameters and increased accuracy of the research aspect classifier model could
lead to a more accurate and reliable tool. Conclusion: We emphasize the need for
new automated tools to help researchers find relevant COVID-19 documents, in
addition to automatically extracting useful information contained in them. Our
work suggests that combining different algorithms and models could lead to new
ways of browsing COVID-19 paper data.
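The ranking step described in the Method (regular expressions combined with K-Nearest Neighbors) can be illustrated with a minimal sketch. This is not the author's implementation: the toy papers, the three-dimensional topic vectors (standing in for per-abstract LDA topic distributions), the function name `rank_papers`, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical toy corpus: paper id -> (abstract text, topic distribution).
# In the paper's setting the vectors would come from an LDA model over CORD-19.
PAPERS: Dict[str, Tuple[str, List[float]]] = {
    "p1": ("Remdesivir treatment outcomes in hospitalized patients.", [0.80, 0.10, 0.10]),
    "p2": ("Vaccine efficacy against SARS-CoV-2 variants.", [0.10, 0.80, 0.10]),
    "p3": ("Antiviral treatment and vaccine combination trial.", [0.45, 0.45, 0.10]),
    "p4": ("Economic impact of lockdown policies.", [0.05, 0.05, 0.90]),
}


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_papers(
    pattern: str,
    query_topics: List[float],
    papers: Dict[str, Tuple[str, List[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Keep papers whose abstract matches the regex, then return the k
    nearest ones to the query topic vector, closest first."""
    regex = re.compile(pattern, re.IGNORECASE)
    candidates = [
        (paper_id, euclidean(query_topics, topics))
        for paper_id, (text, topics) in papers.items()
        if regex.search(text)
    ]
    candidates.sort(key=lambda item: item[1])
    return candidates[:k]


if __name__ == "__main__":
    for paper_id, distance in rank_papers(r"\btreatment\b", [0.9, 0.05, 0.05], PAPERS):
        print(paper_id, round(distance, 3))
```

Here the regular expression acts as a hard filter and the nearest-neighbor search orders the survivors by topic similarity; the abstract does not specify how the two are actually combined in the tool, so the order of these steps is also an assumption.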
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-09-03T17:54:40Z)
- CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation [51.2289822267563]
We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets.
We use large-scale public web-crawled corpora and similarity-based document retrieval to find other relevant human-written documents.
We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2023-10-05T22:16:41Z)
- Exploring the evolution of research topics during the COVID-19 pandemic [3.234641429290768]
We present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts.
Our method is based upon a careful selection of up-to-date technologies (including large language models) and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series.
arXiv Detail & Related papers (2023-08-23T20:05:42Z)
- An approach based on Open Research Knowledge Graph for Knowledge Acquisition from scientific papers [4.8951183832371]
Open Research Knowledge Graph (ORKG) is a computer-assisted tool to organize key insights extracted from research papers.
It is currently used to document the "food information engineering", "Tabular data to Knowledge Graph Matching" and "Question Answering" research problems and the "Neuro-symbolic AI" domain.
arXiv Detail & Related papers (2023-06-07T22:56:53Z)
- Good Data, Large Data, or No Data? Comparing Three Approaches in Developing Research Aspect Classifiers for Biomedical Papers [19.1408856831043]
We investigate the impact of different datasets on model performance for the crowd-annotated CODA-19 research aspect classification task.
Our results indicate that using the PubMed 200K RCT dataset does not improve performance for the CODA-19 task.
arXiv Detail & Related papers (2021-07-17T06:16:36Z)
- COVID-19 Multidimensional Kaggle Literature Organization [3.201839066679614]
We show that factorization is a powerful unsupervised learning method capable of discovering hidden patterns in a document corpus.
We show that a higher-order representation of the corpus allows for the simultaneous grouping of similar articles, relevant journals, authors with similar research interests, and topic keywords.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2020-10-08T07:54:14Z)
- Extracting a Knowledge Base of Mechanisms from COVID-19 Papers [50.17242035034729]
We pursue the construction of a knowledge base (KB) of mechanisms.
We develop a broad, unified schema that strikes a balance between relevance and breadth.
Experiments demonstrate the utility of our KB in supporting interdisciplinary scientific search over COVID-19 literature.
arXiv Detail & Related papers (2020-07-24T18:29:43Z)
- COVID-19 Knowledge Graph: Accelerating Information Retrieval and Discovery for Scientific Literature [23.279540233851993]
The coronavirus disease (COVID-19) has claimed the lives of over 350,000 people and infected more than 6 million people worldwide.
Several search engines have surfaced to provide researchers with additional tools to find and retrieve information from the rapidly growing corpora on COVID-19.
We present the COVID-19 Knowledge Graph (CKG), a heterogeneous graph for extracting and visualizing complex relationships between COVID-19 articles.
arXiv Detail & Related papers (2020-07-24T18:29:43Z)