Related papers: Enhancing Document Retrieval in COVID-19 Research: Leveraging Large Language Models for Hidden Relation Extraction

URL: http://arxiv.org/abs/2506.18311v1
Date: Mon, 23 Jun 2025 05:55:53 GMT
Title: Enhancing Document Retrieval in COVID-19 Research: Leveraging Large Language Models for Hidden Relation Extraction
Authors: Hoang-An Trieu, Dinh-Truong Do, Chau Nguyen, Vu Tran, Minh Le Nguyen
Abstract summary: We present a method to help the retrieval system, the Covrelex-SE system, to provide more high-quality search results. We exploited the power of the large language models (LLMs) to extract the hidden relationships inside the unlabeled publication.
Score: 1.8100383997044667
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, with the emergence of the COVID-19 pandemic, numerous publications relevant to this disease have been issued. Because of the massive volume of publications, an efficient retrieval system is necessary to provide researchers with useful information when an unexpected pandemic such as COVID-19 strikes suddenly. In this work, we present a method to help a retrieval system, the Covrelex-SE system, provide higher-quality search results. We exploit the power of large language models (LLMs) to extract hidden relationships inside unlabeled publications that cannot be found by the parsing tools the system currently uses, thereby giving the system more useful information during the retrieval process.

Related papers

PICOs-RAG: PICO-supported Query Rewriting for Retrieval-Augmented Generation in Evidence-Based Medicine [18.902401214105875]
We present the PICOs-RAG to expand the user queries into a better format. Our method can expand and normalize the queries into professional ones. Thereby the PICOs-RAG improves the performance of the large language models into a helpful and reliable medical assistant.
arXiv Detail & Related papers (2025-10-28T02:01:05Z)
Query Decomposition for RAG: Balancing Exploration-Exploitation [83.79639293409802]
RAG systems address complex user requests by decomposing them into subqueries, retrieving potentially relevant documents for each, and then aggregating them to generate an answer. We formulate query decomposition and document retrieval in an exploitation-exploration setting, where retrieving one document at a time builds a belief about the utility of a given sub-query. Our main finding is that estimating document relevance using rank information and human judgments yields a 35% gain in document-level precision, a 15% increase in alpha-nDCG, and better performance on the downstream task of long-form generation.
arXiv Detail & Related papers (2025-10-21T13:37:11Z)
Biomedical Literature Q&A System Using Retrieval-Augmented Generation (RAG) [0.0]
This work presents a Biomedical Literature Question Answering (Q&A) system based on a Retrieval-Augmented Generation architecture. The system integrates diverse sources, including PubMed articles, curated Q&A datasets, and medical encyclopedias. The system supports both general medical queries and domain-specific tasks, with a focused evaluation on breast cancer literature.
arXiv Detail & Related papers (2025-09-05T21:29:52Z)
A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI [70.06771291117965]
We introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset. Biomedica contains over 6 million scientific articles and 24 million image-text pairs. We provide scalable streaming and search APIs through a web server, facilitating seamless integration with AI systems.
arXiv Detail & Related papers (2025-03-26T05:56:46Z)
SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG). Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries. We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
DocReLM: Mastering Document Retrieval with Language Model [49.847369507694154]
We demonstrate that by utilizing large language models, a document retrieval system can achieve advanced semantic understanding capabilities. Our approach involves training the retriever and reranker using domain-specific data generated by large language models. We use a test set annotated by academic researchers in the fields of quantum physics and computer vision to evaluate our system's performance.
arXiv Detail & Related papers (2024-05-19T06:30:22Z)
An Information Retrieval and Extraction Tool for Covid-19 Related Papers [0.0]
The main focus of this paper is to provide researchers with a better search tool for COVID-19 related papers. Our tool has shown the potential to assist researchers by automating a topic-based search of CORD-19 papers.
arXiv Detail & Related papers (2024-01-20T01:34:50Z)
De-identification of clinical free text using natural language processing: A systematic review of current approaches [48.343430343213896]
Natural language processing has repeatedly demonstrated its feasibility in automating the de-identification process. Our study aims to provide systematic evidence on how the de-identification of clinical free text has evolved in the last thirteen years.
arXiv Detail & Related papers (2023-11-28T13:20:41Z)
COVID-19 Multidimensional Kaggle Literature Organization [3.201839066679614]
We show that factorization is a powerful unsupervised learning method capable of discovering hidden patterns in a document corpus. We show that a higher-order representation of the corpus allows for the simultaneous grouping of similar articles, relevant journals, authors with similar research interests, and topic keywords.
arXiv Detail & Related papers (2021-07-17T06:16:36Z)
COVID-SEE: Scientific Evidence Explorer for COVID-19 Related Research [29.209304525218013]
COVID-SEE is a system for medical literature discovery based on the concept of information exploration. It builds on several distinct text analysis and natural language processing methods to structure and organise information in publications.
arXiv Detail & Related papers (2020-08-18T12:14:36Z)
Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2 [8.223517872575712]
We take advantage of the recent advances in pre-trained NLP models, BERT and OpenAI GPT-2. Our model provides abstractive and comprehensive information based on keywords extracted from the original articles. Our work can help the medical community by providing succinct summaries of articles for which abstracts are not already available.
arXiv Detail & Related papers (2020-06-03T00:54:44Z)
Visualising COVID-19 Research [4.664989082015335]
We develop a novel automated theme-based visualisation method. It combines advanced data modelling of large corpora, information mapping and trend analysis. It provides a top-down and bottom-up browsing and search interface for quick discovery of topics and research resources.
arXiv Detail & Related papers (2020-05-13T15:45:14Z)
CAiRE-COVID: A Question Answering and Query-focused Multi-Document Summarization System for COVID-19 Scholarly Information Management [48.251211691263514]
We present CAiRE-COVID, a real-time question answering (QA) and multi-document summarization system, which won one of the 10 tasks in the Kaggle COVID-19 Open Research dataset Challenge. Our system aims to tackle the recent challenge of mining the numerous scientific articles being published on COVID-19 by answering high priority questions from the community.
arXiv Detail & Related papers (2020-05-04T15:07:27Z)
Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review [62.490310870300746]
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare. Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals. This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
arXiv Detail & Related papers (2019-12-28T02:44:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.