Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA
- URL: http://arxiv.org/abs/2507.05577v1
- Date: Tue, 08 Jul 2025 01:25:06 GMT
- Title: Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA
- Authors: Shashank Verma, Fengyi Jiang, Xiangning Xue,
- Abstract summary: This paper presents the methodologies and results from our participation in the BioASQ 2025 Task13b Challenge. We built a Retrieval-Augmented Generation (RAG) system that can answer biomedical questions by retrieving relevant PubMed documents and snippets to generate answers. Our solution achieved an MAP@10 of 0.1581, placing 10th on the leaderboard for the retrieval task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical semantic question answering rooted in information retrieval can play a crucial role in keeping up to date with the vast, rapidly evolving, and ever-growing biomedical literature. A robust system can help researchers, healthcare professionals, and even lay users access relevant knowledge grounded in evidence. The BioASQ 2025 Task13b Challenge serves as an important benchmark, offering a competitive platform for advancing this space. This paper presents the methodologies and results from our participation in this challenge, where we built a Retrieval-Augmented Generation (RAG) system that answers biomedical questions by retrieving relevant PubMed documents and snippets to generate answers. For the retrieval task, we generated dense embeddings from biomedical articles for initial retrieval, and applied an ensemble of finetuned cross-encoders and large language models (LLMs) for re-ranking to identify the top relevant documents. Our solution achieved an MAP@10 of 0.1581, placing 10th on the leaderboard for the retrieval task. For answer generation, we employed few-shot prompting of instruction-tuned LLMs. Our system achieved a macro-F1 score of 0.95 for yes/no questions (rank 12), a Mean Reciprocal Rank (MRR) of 0.64 for factoid questions (rank 1), a mean-F1 score of 0.63 for list questions (rank 5), and a ROUGE-SU4 F1 score of 0.29 for ideal answers (rank 11).
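As a rough illustration of the re-ranking ensemble described above, the sketch below fuses normalized cross-encoder scores with LLM-derived relevance scores; the checkpoint name and the `llm_relevance_score` helper are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch: fuse cross-encoder and LLM relevance scores for re-ranking.
from sentence_transformers import CrossEncoder

def minmax(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo + 1e-9) for s in scores]

def rerank(query, docs, llm_relevance_score, top_k=10):
    # Cross-encoder jointly scores each (query, document) pair.
    ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative checkpoint
    ce_scores = minmax(ce.predict([(query, d) for d in docs]))
    # Assumed helper: prompts an LLM to rate query-document relevance numerically.
    llm_scores = minmax([llm_relevance_score(query, d) for d in docs])
    # Simple ensemble: average the two normalized scores per document.
    fused = sorted(zip(docs, ce_scores, llm_scores),
                   key=lambda t: (t[1] + t[2]) / 2, reverse=True)
    return [d for d, _, _ in fused[:top_k]]
```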
Related papers
- Harnessing Collective Intelligence of LLMs for Robust Biomedical QA: A Multi-Model Approach [44.035446389573345]
We present our participation in the 13th edition of the BioASQ challenge, which involves biomedical semantic question answering. We deploy a selection of open-source large language models (LLMs) as retrieval-augmented generators to answer biomedical questions. We evaluated 13 state-of-the-art open-source LLMs, exploring all possible model combinations to contribute to the final answer.
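A minimal sketch of how multi-model answers might be combined for yes/no questions, assuming a simple majority vote; the paper's actual combination strategies may differ.

```python
from collections import Counter

def ensemble_yes_no(answers):
    """Majority vote over per-model yes/no answers (ties default to 'yes')."""
    votes = Counter(a.strip().lower() for a in answers)
    return "yes" if votes["yes"] >= votes["no"] else "no"

# e.g. answers collected from several LLMs:
print(ensemble_yes_no(["Yes", "no", "yes"]))  # -> "yes"
```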
arXiv Detail & Related papers (2025-08-02T20:20:08Z)
- FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents [53.5649975411777]
We introduce FreshStack, a holistic framework for automatically building information retrieval (IR) evaluation benchmarks. FreshStack conducts the following steps: automatic corpus collection from code and technical documentation, nugget generation from community-asked questions and answers, and nugget-level support of retrieved documents. We use FreshStack to build five datasets on fast-growing, recent, and niche topics to ensure the tasks are sufficiently challenging.
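A hedged sketch of what nugget-level support evaluation could look like: coverage of answer nuggets by retrieved documents, with `supports` standing in for an LLM-based judge (an assumption, not FreshStack's exact implementation).

```python
def nugget_recall(nuggets, retrieved_docs, supports):
    """Fraction of answer nuggets supported by at least one retrieved document.

    `supports(nugget, doc)` is an assumed judge (e.g. an LLM prompt) returning
    True when the document contains evidence for the nugget.
    """
    covered = sum(any(supports(n, d) for d in retrieved_docs) for n in nuggets)
    return covered / max(len(nuggets), 1)
```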
arXiv Detail & Related papers (2025-04-17T17:44:06Z)
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle this issue. SL-HyDE leverages large language models (LLMs) as generators to produce hypothetical documents based on a given query. We also present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
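A minimal sketch of HyDE-style retrieval as described: an LLM writes a hypothetical answer document, which is embedded in place of the raw query. `generate` and `embed` are assumed placeholders for the LLM and embedding model.

```python
import numpy as np

def hyde_retrieve(query, corpus_embs, corpus_ids, generate, embed, top_k=10):
    """HyDE-style retrieval: embed an LLM-written hypothetical document.

    `corpus_embs` is assumed to be an L2-normalized (n_docs, dim) array,
    so cosine similarity reduces to a dot product.
    """
    hypothetical = generate(f"Write a short medical passage answering: {query}")
    q = embed(hypothetical)
    q = q / np.linalg.norm(q)
    sims = corpus_embs @ q
    top = np.argsort(-sims)[:top_k]
    return [corpus_ids[i] for i in top]
```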
arXiv Detail & Related papers (2024-10-26T02:53:20Z)
- Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions [1.0742675209112622]
We propose a two-level information retrieval and question-answering system based on pre-trained large language models (LLMs). We construct prompts with in-context few-shot examples and apply post-processing techniques such as resampling and malformed-response detection.
Our best-performing system achieved a MAP score of 0.14 on document retrieval, a MAP score of 0.05 on snippet retrieval, an F1 score of 0.96 for yes/no questions, an MRR of 0.38 for factoid questions, and an F1 score of 0.50 for list questions in Task 12b.
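The resampling and malformed-response handling described above might look roughly like the following, assuming a text-completion `call_llm` function and JSON-formatted answers (both illustrative, not the paper's exact prompts).

```python
import json
import re

# Illustrative in-context examples; not the paper's actual prompts.
FEW_SHOT = '''Q: Is aspirin an anticoagulant? A: {"answer": "no"}
Q: Can metformin cause lactic acidosis? A: {"answer": "yes"}
'''

def answer_yes_no(question, call_llm, max_retries=3):
    """Few-shot prompting with resampling on malformed responses.

    `call_llm` is an assumed text-completion function.
    """
    prompt = FEW_SHOT + f"Q: {question} A:"
    for _ in range(max_retries):
        raw = call_llm(prompt)
        match = re.search(r"\{.*?\}", raw, re.DOTALL)
        if match:
            try:
                answer = json.loads(match.group())["answer"].strip().lower()
                if answer in ("yes", "no"):
                    return answer
            except (json.JSONDecodeError, KeyError, AttributeError):
                pass  # malformed response: fall through and resample
    return "yes"  # arbitrary fallback once retries are exhausted
```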
arXiv Detail & Related papers (2024-07-09T11:48:49Z)
- SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG).
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
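A very rough sketch of an MCTS loop with a self-rewarding signal, in the spirit of SeRTS; the `rewrite` and `self_reward` callbacks stand in for LLM prompts, and the specifics are our assumptions rather than the paper's algorithm.

```python
import math

class Node:
    def __init__(self, query, parent=None):
        self.query, self.parent = query, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    # Standard UCT: exploit mean reward, explore rarely-visited nodes.
    return (node.value / (node.visits + 1e-9)
            + c * math.sqrt(math.log(node.parent.visits + 1) / (node.visits + 1e-9)))

def tree_search(question, rewrite, self_reward, iterations=20):
    """MCTS-style search over query reformulations.

    `rewrite(query)` asks an LLM for a reformulation; `self_reward(query)`
    asks the LLM to score the retrieval it induces. Both are assumed helpers.
    """
    root = Node(question)
    for _ in range(iterations):
        node = root
        while node.children:                              # selection
            node = max(node.children, key=uct)
        child = Node(rewrite(node.query), parent=node)    # expansion
        node.children.append(child)
        reward = self_reward(child.query)                 # self-reward rollout
        while child:                                      # backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    return max(root.children, key=lambda n: n.visits).query
```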
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
- BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers [48.21255861863282]
BMRetriever is a series of dense retrievers for enhancing biomedical retrieval.
BMRetriever exhibits strong parameter efficiency, with the 410M variant outperforming baselines up to 11.7 times larger.
arXiv Detail & Related papers (2024-04-29T05:40:08Z)
- Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering [48.17095875619711]
We present a system called LLMs Augmented with Medical Textbooks (LLM-AMT). LLM-AMT integrates authoritative medical textbooks into the LLM framework using plug-and-play modules. We found that medical textbooks serve as a more effective retrieval corpus than Wikipedia in the medical domain.
arXiv Detail & Related papers (2023-09-05T13:39:38Z)
- Top K Relevant Passage Retrieval for Biomedical Question Answering [1.0636004442689055]
Question answering is the task of answering factoid questions using a large collection of documents. The existing Dense Passage Retrieval (DPR) model was trained on a Wikipedia dump from December 20, 2018, as the source documents for answering questions. In this work, we adapt the existing DPR framework to the biomedical domain and retrieve answers from PubMed articles, a reliable source for answering medical questions.
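DPR-style top-k retrieval is typically served with a flat inner-product index; a minimal sketch using FAISS follows (the exact index configuration is our assumption, not stated in the abstract).

```python
import faiss
import numpy as np

def build_index(passage_embs):
    """Index DPR-style passage embeddings for inner-product (dot) search."""
    index = faiss.IndexFlatIP(passage_embs.shape[1])
    index.add(passage_embs.astype(np.float32))
    return index

def top_k_passages(index, question_emb, passages, k=100):
    """Return the k highest-scoring passages for a DPR question embedding."""
    scores, ids = index.search(question_emb.astype(np.float32).reshape(1, -1), k)
    return [(passages[i], float(s)) for i, s in zip(ids[0], scores[0])]
```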
arXiv Detail & Related papers (2023-08-08T04:06:11Z)
- Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision [53.692793122749414]
We introduce a medical question understanding and answering system with knowledge grounding and semantic self-supervision.
Our system is a pipeline that first summarizes a long, user-written medical question using a supervised summarization loss. It then matches the summarized question against an FAQ from a trusted medical knowledge base and retrieves a fixed number of relevant sentences from the corresponding answer document.
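A compact sketch of that summarize-match-extract pipeline, with `summarize` and `embed` as assumed model placeholders and FAQ embeddings assumed pre-normalized.

```python
import numpy as np

def answer_pipeline(user_question, faqs, faq_embs, summarize, embed, n_sents=5):
    """Summarize the long question, match the closest FAQ, extract sentences.

    `faqs` is assumed to be a list of dicts with an "answer_sentences" field;
    `faq_embs` an L2-normalized (n_faqs, dim) array of FAQ embeddings.
    """
    short_q = summarize(user_question)
    q = embed(short_q)
    q = q / np.linalg.norm(q)
    best = int(np.argmax(faq_embs @ q))            # nearest FAQ by cosine sim
    sentences = faqs[best]["answer_sentences"]
    sent_embs = np.stack([embed(s) for s in sentences])
    sent_embs = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
    ranked = np.argsort(-(sent_embs @ q))[:n_sents]
    return [sentences[i] for i in sorted(ranked)]  # keep document order
```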
arXiv Detail & Related papers (2022-09-30T08:20:32Z)
- Query-focused Extractive Summarisation for Biomedical and COVID-19 Complex Question Answering [0.0]
This paper presents Macquarie University's participation in the two most recent BioASQ Synergy Tasks.
We apply query-focused extractive summarisation techniques to generate complex answers to biomedical questions.
For the Synergy task, we selected the candidate sentences following two phases: document retrieval and snippet retrieval.
We observed an improvement of results when the system was trained on the second half of the BioASQ10b training data.
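As a rough stand-in for such a query-focused extractive summariser, the sketch below ranks candidate sentences by TF-IDF cosine similarity to the question; the actual system is trained on BioASQ data rather than this unsupervised baseline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_answer(question, candidate_sentences, n_sents=3):
    """Pick the sentences most similar to the question, in original order."""
    vec = TfidfVectorizer().fit(candidate_sentences + [question])
    sent_vecs = vec.transform(candidate_sentences)
    q_vec = vec.transform([question])
    sims = cosine_similarity(sent_vecs, q_vec).ravel()
    top = sorted(sims.argsort()[::-1][:n_sents])   # restore document order
    return [candidate_sentences[i] for i in top]
```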
arXiv Detail & Related papers (2022-09-05T07:56:44Z)
- Query-Focused Extractive Summarisation for Finding Ideal Answers to Biomedical and COVID-19 Questions [7.6997148655751895]
Macquarie University participated in the BioASQ Synergy Task and BioASQ9b Phase B.
We used a query-focused summarisation system that was trained with the BioASQ8b training data set.
Despite the poor quality of the documents and snippets retrieved by our system, we observed reasonably good quality in the returned answers.
arXiv Detail & Related papers (2021-08-27T09:19:42Z)
- Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature [67.4680600632232]
Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck.
We propose a general approach for vertical search based on domain-specific pretraining.
Our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search.
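Continued self-supervised (masked language model) pretraining on in-domain text, which this line of work relies on, can be sketched as follows; the checkpoint, toy corpus, and hyperparameters are illustrative assumptions, not the deployed system's configuration.

```python
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Illustrative base checkpoint and toy in-domain corpus.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

corpus = Dataset.from_dict({"text": [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Aspirin irreversibly inhibits cyclooxygenase enzymes.",
]})
tokenized = corpus.map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="biomed-mlm", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()  # continued self-supervised pretraining on in-domain text
```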
arXiv Detail & Related papers (2021-06-25T01:02:55Z)