Harnessing Collective Intelligence of LLMs for Robust Biomedical QA: A Multi-Model Approach
- URL: http://arxiv.org/abs/2508.01480v1
- Date: Sat, 02 Aug 2025 20:20:08 GMT
- Title: Harnessing Collective Intelligence of LLMs for Robust Biomedical QA: A Multi-Model Approach
- Authors: Dimitra Panou, Alexandros C. Dimopoulos, Manolis Koubarakis, Martin Reczko
- Abstract summary: We present our participation in the 13th edition of the BioASQ challenge, which involves biomedical semantic question-answering. We deploy a selection of open-source large language models (LLMs) as retrieval-augmented generators to answer biomedical questions. We evaluated 13 state-of-the-art open-source LLMs, exploring all possible model combinations to contribute to the final answer.
- Score: 44.035446389573345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical text mining and question-answering are essential yet highly demanding tasks, particularly in the face of the exponential growth of biomedical literature. In this work, we present our participation in the 13th edition of the BioASQ challenge, which involves biomedical semantic question-answering for Task 13b and biomedical question-answering for developing topics for the Synergy task. We deploy a selection of open-source large language models (LLMs) as retrieval-augmented generators to answer biomedical questions. Various models are used to process the questions. A majority voting system combines their output to determine the final answer for Yes/No questions, while for list and factoid type questions, the union of their answers is used. We evaluated 13 state-of-the-art open-source LLMs, exploring all possible model combinations to contribute to the final answer, resulting in tailored LLM pipelines for each question type. Our findings provide valuable insight into which combinations of LLMs consistently produce superior results for specific question types. In the four rounds of the 2025 BioASQ challenge, our system achieved notable results: in the Synergy task, we secured 1st place for ideal answers and 2nd place for exact answers in round 2, as well as two shared 1st places for exact answers in rounds 3 and 4.
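A minimal sketch of the aggregation logic the abstract describes (majority voting for Yes/No, union for list/factoid); the function names and data shapes are illustrative, not the authors' actual code:

```python
from collections import Counter

def aggregate_yesno(answers: list[str]) -> str:
    """Majority vote over per-model yes/no outputs."""
    votes = Counter(a.strip().lower() for a in answers)
    return votes.most_common(1)[0][0]

def aggregate_union(answers: list[list[str]]) -> list[str]:
    """Union of per-model answers for list/factoid questions, first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for model_answers in answers:
        for item in model_answers:
            key = item.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(item.strip())
    return merged

# Hypothetical outputs from three models:
print(aggregate_yesno(["Yes", "yes", "No"]))            # -> "yes"
print(aggregate_union([["BRCA1"], ["BRCA1", "TP53"]]))  # -> ["BRCA1", "TP53"]
```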
Related papers
- Beyond Retrieval: Ensembling Cross-Encoders and GPT Rerankers with LLMs for Biomedical QA [0.0]
This paper presents the methodologies and results from our participation in the BioASQ 2025 Task13b Challenge. We built a Retrieval-Augmented Generation (RAG) system that can answer biomedical questions by retrieving relevant PubMed documents and snippets to generate answers. Our solution achieved an MAP@10 of 0.1581, placing 10th on the leaderboard for the retrieval task.
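A rough illustration of the cross-encoder reranking step named in the title, using the sentence-transformers library; the checkpoint is a public MS MARCO model standing in for whatever rerankers the authors actually used:

```python
from sentence_transformers import CrossEncoder

# Public MS MARCO checkpoint as a stand-in; the paper's actual rerankers are not named here.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_snippets(question: str, snippets: list[str], top_k: int = 10) -> list[str]:
    """Score each (question, snippet) pair and keep the top_k most relevant snippets."""
    scores = reranker.predict([(question, s) for s in snippets])
    ranked = sorted(zip(snippets, scores), key=lambda pair: pair[1], reverse=True)
    return [s for s, _ in ranked[:top_k]]
```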
arXiv Detail & Related papers (2025-07-08T01:25:06Z)
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [92.5712549836791]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs). We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions [1.0742675209112622]
We propose a two-level information retrieval and question-answering system based on pre-trained large language models (LLMs).
We construct prompts with in-context few-shot examples and utilize post-processing techniques like resampling and malformed response detection.
Our best-performing system achieved 0.14 MAP score on document retrieval, 0.05 MAP score on snippet retrieval, 0.96 F1 score for yes/no questions, 0.38 MRR score for factoid questions and 0.50 F1 score for list questions in Task 12b.
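A sketch of the resampling and malformed-response detection mentioned above; `ask` is a hypothetical callable wrapping the model, and the JSON schema check is illustrative rather than the paper's code:

```python
import json

def query_with_resampling(ask, prompt: str, max_tries: int = 3):
    """Retry an LLM call until the response parses into the expected shape.

    `ask` is a hypothetical callable (prompt -> raw text); the schema
    check below is illustrative, not the paper's post-processing code.
    """
    for _ in range(max_tries):
        raw = ask(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed response: resample
        if isinstance(parsed, dict) and "answer" in parsed:
            return parsed
    return None  # all samples malformed: caller falls back or abstains
```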
arXiv Detail & Related papers (2024-07-09T11:48:49Z)
- Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
- Quriosity: Analyzing Human Questioning Behavior and Causal Inquiry through Curiosity-Driven Queries [91.70689724416698]
We present Quriosity, a collection of 13.5K naturally occurring questions from three diverse sources. Our analysis reveals a significant presence of causal questions (up to 42%) in the dataset.
arXiv Detail & Related papers (2024-05-30T17:55:28Z)
- Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering [55.295699268654545]
We propose a novel Chain-of-Discussion framework to leverage the synergy among open-source Large Language Models. Our experiments show that discussions among multiple LLMs play a vital role in enhancing the quality of answers.
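A highly simplified sketch of a multi-LLM discussion loop in this spirit; `models` is a list of hypothetical prompt-to-text callables, and the framework's actual discussion protocol is more elaborate than this:

```python
def chain_of_discussion(models, question: str, rounds: int = 2) -> list[str]:
    """Each model drafts an answer, then revises after reading the others' drafts.

    `models` is a list of hypothetical callables (prompt -> text); this loop is
    only a sketch of the discussion idea, not the paper's prompting protocol.
    """
    drafts = [m(question) for m in models]
    for _ in range(rounds):
        drafts = [
            m(
                f"Question: {question}\n"
                "Answers from other models:\n"
                + "\n".join(d for j, d in enumerate(drafts) if j != i)
                + "\nRevise your own answer in light of the discussion."
            )
            for i, m in enumerate(models)
        ]
    return drafts
```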
arXiv Detail & Related papers (2024-02-26T05:31:34Z)
- Improving Question Generation with Multi-level Content Planning [70.37285816596527]
This paper addresses the problem of generating questions from a given context and an answer, specifically focusing on questions that require multi-hop reasoning across an extended context.
We propose MultiFactor, a novel QG framework based on multi-level content planning. Specifically, MultiFactor includes two components: FA-model, which simultaneously selects key phrases and generates full answers, and Q-model, which takes the generated full answer as an additional input to generate questions.
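A toy sketch of the two-stage pipeline summarized above; `fa_model` and `q_model` are hypothetical seq2seq callables and the prompt formats are invented for illustration:

```python
def generate_question(context: str, answer: str, fa_model, q_model) -> str:
    """Two-stage generation in the spirit of MultiFactor: expand the short
    answer into a full answer sentence, then condition question generation
    on it. `fa_model` and `q_model` are hypothetical seq2seq callables, and
    the real FA-model also performs key-phrase selection."""
    full_answer = fa_model(f"context: {context} answer: {answer}")
    return q_model(f"context: {context} full answer: {full_answer}")
```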
arXiv Detail & Related papers (2023-10-20T13:57:01Z)
- Contributions to the Improvement of Question Answering Systems in the Biomedical Domain [0.951828574518325]
This thesis work falls within the framework of question answering (QA) in the biomedical domain.
We propose four contributions to improve the performance of QA in the biomedical domain.
We develop a fully automated semantic biomedical QA system called SemBioNLQA.
arXiv Detail & Related papers (2023-07-25T16:31:20Z)
- Getting MoRE out of Mixture of Language Model Reasoning Experts [71.61176122960464]
We propose a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse specialized language models.
We specialize the backbone language model with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning.
Our human study confirms that presenting expert predictions and the answer selection process helps annotators more accurately calibrate when to trust the system's output.
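A sketch of the expert-routing idea behind MoRE; the prompts, `llm` and `scorer` callables, and the abstention threshold are all illustrative placeholders, not the paper's:

```python
# Illustrative specialization prompts; the paper's actual prompts differ.
EXPERT_PROMPTS = {
    "factual": "Answer with a supporting fact: {q}",
    "multihop": "Break the question into sub-questions, answer each, then combine: {q}",
    "mathematical": "Reason step by step, showing the arithmetic: {q}",
    "commonsense": "Answer using everyday commonsense knowledge: {q}",
}

def more_answer(llm, scorer, question: str, threshold: float = 0.5):
    """Query one specialized expert per reasoning category, then keep the
    candidate the selector scores highest; abstain below the threshold."""
    candidates = {name: llm(p.format(q=question)) for name, p in EXPERT_PROMPTS.items()}
    name, answer = max(candidates.items(), key=lambda kv: scorer(question, kv[1]))
    return (name, answer) if scorer(question, answer) >= threshold else None
```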
arXiv Detail & Related papers (2023-05-24T02:00:51Z)
- Query-focused Extractive Summarisation for Biomedical and COVID-19 Complex Question Answering [0.0]
This paper presents Macquarie University's participation in the two most recent BioASQ Synergy Tasks.
We apply query-focused extractive summarisation techniques to generate complex answers to biomedical questions.
For the Synergy task, we selected the candidate sentences in two phases: document retrieval and snippet retrieval.
We observed an improvement of results when the system was trained on the second half of the BioASQ10b training data.
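A minimal query-focused sentence-selection sketch, using a public sentence-transformers encoder as a stand-in; the paper's trained summarisation setup is more involved than cosine-similarity ranking:

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder, not the paper's model

def select_sentences(question: str, sentences: list[str], k: int = 5) -> list[str]:
    """Rank candidate sentences by cosine similarity to the question, keep the top k."""
    q_emb = encoder.encode(question, convert_to_tensor=True)
    s_emb = encoder.encode(sentences, convert_to_tensor=True)
    sims = util.cos_sim(q_emb, s_emb)[0]
    top = sims.topk(min(k, len(sentences))).indices.tolist()
    return [sentences[i] for i in sorted(top)]  # restore document order
```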
arXiv Detail & Related papers (2022-09-05T07:56:44Z)
- Sequence Tagging for Biomedical Extractive Question Answering [12.464143741310137]
We investigate the difference of the question distribution across the general and biomedical domains.
We discover biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (single answer).
Our approach can learn to decide the number of answers for a question from training data.
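A sketch of why sequence tagging naturally yields a variable number of answers: decoding BIO tags into spans returns exactly as many answers as the tagger marks. The tag scheme and example below are illustrative:

```python
def decode_bio(tokens: list[str], tags: list[str]) -> list[str]:
    """Turn BIO tags into answer spans: the tagger, not a fixed
    hyperparameter, determines how many answers a question gets."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:  # "O", or an "I" with no open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

# decode_bio(["EGFR", "and", "KRAS"], ["B", "O", "B"]) -> ["EGFR", "KRAS"]
```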
arXiv Detail & Related papers (2021-04-15T15:42:34Z)
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453]
The HeadQA dataset contains multiple-choice questions from the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe) that strives to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.