Ontology-Guided Query Expansion for Biomedical Document Retrieval using Large Language Models
- URL: http://arxiv.org/abs/2508.11784v1
- Date: Fri, 15 Aug 2025 19:23:26 GMT
- Title: Ontology-Guided Query Expansion for Biomedical Document Retrieval using Large Language Models
- Authors: Zabir Al Nazi, Vagelis Hristidis, Aaron Lawson McLean, Jannat Ara Meem, Md Taukir Azam Chowdhury
- Abstract summary: BMQExpander is a novel query expansion pipeline that combines medical knowledge - definitions and relationships - from the UMLS Metathesaurus with the generative capabilities of large language models (LLMs) to enhance retrieval effectiveness. We show that BMQExpander has superior retrieval performance on three popular biomedical Information Retrieval (IR) benchmarks.
- Score: 2.4897806364302633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective Question Answering (QA) on large biomedical document collections requires effective document retrieval techniques. The latter remains a challenging task due to the domain-specific vocabulary and semantic ambiguity in user queries. We propose BMQExpander, a novel ontology-aware query expansion pipeline that combines medical knowledge - definitions and relationships - from the UMLS Metathesaurus with the generative capabilities of large language models (LLMs) to enhance retrieval effectiveness. We implemented several state-of-the-art baselines, including sparse and dense retrievers, query expansion methods, and biomedical-specific solutions. We show that BMQExpander has superior retrieval performance on three popular biomedical Information Retrieval (IR) benchmarks: NFCorpus, TREC-COVID, and SciFact - with improvements of up to 22.1% in NDCG@10 over sparse baselines and up to 6.5% over the strongest baseline. Further, BMQExpander generalizes robustly under query perturbation settings, in contrast to supervised baselines, achieving up to 15.7% improvement over the strongest baseline. As a side contribution, we publish our paraphrased benchmarks. Finally, our qualitative analysis shows that BMQExpander has fewer hallucinations compared to other LLM-based query expansion baselines.
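The abstract describes expanding a query with definitions and relationships drawn from an ontology before retrieval. The following is a minimal illustrative sketch only: a small hand-made term map (hypothetical, standing in for the UMLS Metathesaurus and the LLM component) is used to append synonyms and definitions to a user query, which is the core idea behind ontology-guided expansion.

```python
# Toy ontology-guided query expansion. TOY_ONTOLOGY is a hypothetical
# stand-in for UMLS concept lookups; BMQExpander itself additionally
# uses an LLM to generate the expansion text.

TOY_ONTOLOGY = {
    "mi": {
        "preferred": "myocardial infarction",
        "synonyms": ["heart attack"],
        "definition": "necrosis of heart muscle from insufficient blood supply",
    },
    "htn": {
        "preferred": "hypertension",
        "synonyms": ["high blood pressure"],
        "definition": "persistently elevated arterial blood pressure",
    },
}

def expand_query(query: str, ontology=TOY_ONTOLOGY) -> str:
    """Append ontology synonyms and definitions for any recognized term."""
    extras = []
    for token in query.lower().split():
        entry = ontology.get(token)
        if entry:
            extras.append(entry["preferred"])
            extras.extend(entry["synonyms"])
            extras.append(entry["definition"])
    return query if not extras else query + " " + " ".join(extras)

print(expand_query("risk factors for mi"))
```

The expanded query can then be passed to any sparse retriever (e.g. BM25); grounding the added terms in a curated ontology, rather than free LLM generation alone, is what the abstract credits for reducing hallucinated expansions.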
Related papers
- SAGE: Benchmarking and Improving Retrieval for Deep Research Agents [60.53966065867568]
We introduce SAGE, a benchmark for scientific literature retrieval comprising 1,200 queries across four scientific domains, with a 200,000-paper retrieval corpus. We evaluate six deep research agents and find that all systems struggle with reasoning-intensive retrieval. BM25 significantly outperforms LLM-based retrievers by approximately 30%, as existing agents generate keyword-oriented sub-queries.
arXiv Detail & Related papers (2026-02-05T18:25:24Z)
- SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization [6.098008057625392]
Agentic approaches typically employ sparse retrieval methods like BM25 or dense embedding strategies to identify relevant units. We propose SpIDER (Spatially Informed Dense Embedding Retrieval), an enhanced dense retrieval approach that incorporates LLM-based reasoning over auxiliary context. Empirical results show that SpIDER consistently improves dense retrieval performance across several programming languages.
arXiv Detail & Related papers (2025-12-18T01:32:25Z)
- MedBioRAG: Semantic Search and Retrieval-Augmented Generation with Large Language Models for Medical and Biological QA [0.0]
MedBioRAG is a retrieval-augmented model designed to improve biomedical question-answering (QA) performance. We evaluate MedBioRAG across text retrieval, close-ended QA, and long-form QA tasks using benchmark datasets.
arXiv Detail & Related papers (2025-12-10T15:43:25Z)
- YpathRAG: A Retrieval-Augmented Generation Framework and Benchmark for Pathology [16.03995342015096]
We build a pathology vector database covering 28 subfields and 1.53 million paragraphs. We present YpathRAG, a pathology-oriented RAG framework with dual-channel hybrid retrieval. We also release two evaluation benchmarks, YpathR and YpathQA-M.
arXiv Detail & Related papers (2025-10-07T08:47:59Z)
- Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z)
- Reasoning-enhanced Query Understanding through Decomposition and Interpretation [87.56450566014625]
ReDI is a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation. We compiled a large-scale dataset of real-world complex queries from a major search engine. Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms.
arXiv Detail & Related papers (2025-09-08T10:58:42Z)
- Biomedical Literature Q&A System Using Retrieval-Augmented Generation (RAG) [0.0]
This work presents a Biomedical Literature Question Answering (Q&A) system based on a Retrieval-Augmented Generation architecture. The system integrates diverse sources, including PubMed articles, curated Q&A datasets, and medical encyclopedias. It supports both general medical queries and domain-specific tasks, with a focused evaluation on breast cancer literature.
arXiv Detail & Related papers (2025-09-05T21:29:52Z)
- MedGemma Technical Report [75.88152277443179]
We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
arXiv Detail & Related papers (2025-07-07T17:01:44Z)
- CliniQ: A Multi-faceted Benchmark for Electronic Health Record Retrieval with Semantic Match Assessment [11.815222175336695]
We introduce a novel public EHR retrieval benchmark, CliniQ, to address this gap. We build our benchmark upon 1,000 discharge summary notes along with the ICD codes and prescription labels from MIMIC-III. We conduct a comprehensive evaluation of various retrieval methods, ranging from conventional exact match to popular dense retrievers.
arXiv Detail & Related papers (2025-02-10T08:33:47Z)
- SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG).
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
- Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models [18.984165679347026]
Self-BioRAG is a reliable framework for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting on generated responses.
We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens.
arXiv Detail & Related papers (2024-01-27T02:29:42Z)
- Query2doc: Query Expansion with Large Language Models [69.9707552694766]
The proposed method first generates pseudo-documents by few-shot prompting large language models (LLMs).
query2doc boosts the performance of BM25 by 3% to 15% on ad-hoc IR datasets.
Our method also benefits state-of-the-art dense retrievers in terms of both in-domain and out-of-domain results.
arXiv Detail & Related papers (2023-03-14T07:27:30Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical-matching-based approach achieves similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
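Several entries above report retrieval quality as NDCG@10 (e.g. the up-to-22.1% improvements claimed for BMQExpander). For readers unfamiliar with the metric, here is a minimal self-contained sketch of how NDCG@k is computed from a ranked list of graded relevance judgments; the example relevance values are illustrative, not taken from any of the papers.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: each relevance is discounted by log2 of its rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    idcg = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0

# A perfectly ordered ranking scores 1.0; any misordering scores below it.
print(ndcg_at_k([3, 2, 1, 0], k=10))
print(ndcg_at_k([0, 1, 2, 3], k=10))
```

Because the discount is logarithmic in rank, NDCG@10 rewards systems that place the most relevant documents near the top, which is why it is the standard headline metric on the IR benchmarks listed here.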
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.