From Keywords to Structured Summaries: Streamlining Scholarly Information Access
- URL: http://arxiv.org/abs/2402.14622v2
- Date: Wed, 23 Oct 2024 10:06:10 GMT
- Title: From Keywords to Structured Summaries: Streamlining Scholarly Information Access
- Authors: Mahsa Shamsabadi, Jennifer D'Souza,
- Abstract summary: This paper highlights the growing importance of information retrieval (IR) engines in the scientific community.
It addresses the inefficiency of traditional keyword-based search engines due to the rising volume of publications.
The proposed solution involves structured records, underpinning advanced information technology (IT) tools, including visualization dashboards.
- Score: 0.0
- License:
- Abstract: This paper highlights the growing importance of information retrieval (IR) engines in the scientific community, addressing the inefficiency of traditional keyword-based search engines due to the rising volume of publications. The proposed solution involves structured records, underpinning advanced information technology (IT) tools, including visualization dashboards, to revolutionize how researchers access and filter articles, replacing the traditional text-heavy approach. This vision is exemplified through a proof of concept centered on the "reproductive number estimate of infectious diseases" research theme, using a fine-tuned large language model (LLM) to automate the creation of structured records to populate a backend database that now goes beyond keywords. The result is a next-generation information access system as an IR method accessible at https://orkg.org/usecases/r0-estimates.
Related papers
- Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific Literature [0.0]
This study introduces an automated methodology to bridge the gap by systematically extracting GitHub repository URLs from academic papers indexed in arXiv.
Our approach encompasses querying the arXiv API for relevant papers, cleaning extracted GitHub URLs, fetching comprehensive repository information via the GitHub API, and analyzing repository maturity based on defined metrics such as stars, forks, open issues, and contributors.
arXiv Detail & Related papers (2024-03-20T17:06:51Z) - Navigating the Knowledge Sea: Planet-scale answer retrieval using LLMs [0.0]
Information retrieval is characterized by a continuous refinement of techniques and technologies.
This paper focuses on the role of Large Language Models (LLMs) in bridging the gap between traditional search methods and the emerging paradigm of answer retrieval.
arXiv Detail & Related papers (2024-02-07T23:39:40Z) - DiscoverPath: A Knowledge Refinement and Retrieval System for
Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
arXiv Detail & Related papers (2023-09-04T20:52:33Z) - An approach based on Open Research Knowledge Graph for Knowledge
Acquisition from scientific papers [4.8951183832371]
Open Research Knowledge Graph (ORKG) is a computer-assisted tool to organize key-insights extracted from research papers.
It is currently used to document "food information engineering", "Tabular data to Knowledge Graph Matching" and "Question Answering" research problems and "Neuro-symbolic AI" domain.
arXiv Detail & Related papers (2023-08-23T20:05:42Z) - Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z) - CorpusBrain: Pre-train a Generative Retrieval Model for
Knowledge-Intensive Language Tasks [62.22920673080208]
Single-step generative model can dramatically simplify the search process and be optimized in end-to-end manner.
We name the pre-trained generative retrieval model as CorpusBrain as all information about the corpus is encoded in its parameters without the need of constructing additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z) - Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z) - Autoregressive Search Engines: Generating Substrings as Document
Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - Enhancing Keyphrase Extraction from Academic Articles with their
Reference Information [12.769066804715697]
Keyphrases that summarize document information highly are helpful for users to quickly obtain and understand documents.
Title information in references also contains author-assigned keyphrases.
Experiments show reference information can increase precision, recall, and F1 of automatic keyphrase extraction.
arXiv Detail & Related papers (2021-11-28T11:14:16Z) - A Meta-embedding-based Ensemble Approach for ICD Coding Prediction [64.42386426730695]
International Classification of Diseases (ICD) are the de facto codes used globally for clinical coding.
These codes enable healthcare providers to claim reimbursement and facilitate efficient storage and retrieval of diagnostic information.
Our proposed approach enhances the performance of neural models by effectively training word vectors using routine medical data as well as external knowledge from scientific articles.
arXiv Detail & Related papers (2021-02-26T17:49:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.