Related papers: Collaboration among Multiple Large Language Models for Medical Question Answering

URL: http://arxiv.org/abs/2505.16648v1
Date: Thu, 22 May 2025 13:18:45 GMT
Title: Collaboration among Multiple Large Language Models for Medical Question Answering
Authors: Kexin Shang, Chia-Hsuan Chang, Christopher C. Yang
Abstract summary: We propose a multi-LLM collaboration framework tailored to a medical multiple-choice question dataset. The framework is shown to boost the reasoning ability of all participating LLMs and to alleviate their divergence across questions.
Score: 0.393259574660092
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Empowered by a vast internal knowledge reservoir, the new generation of large language models (LLMs) demonstrates untapped potential for tackling medical tasks. However, insufficient effort has been made toward producing a synergistic effect from multiple LLMs' expertise and backgrounds. In this study, we propose a multi-LLM collaboration framework tailored to a medical multiple-choice question dataset. Post-hoc analysis on 3 pre-trained LLM participants shows that our framework boosts the reasoning ability of all three LLMs and alleviates their divergence across questions. We also measure an LLM's confidence when it is confronted with opposing opinions from other LLMs, and observe a concurrence between the LLM's confidence and its prediction accuracy.
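The abstract does not say how the collaboration or the confidence measure is computed. As a rough illustration only, the sketch below assumes a simple protocol: each LLM first answers a multiple-choice question independently, then sees the other models' answers and may revise; the team answer is a majority vote over revised answers, and a model's "confidence" is approximated by how often it keeps its initial answer when a peer disagrees. The names `Record`, `majority_vote`, `stick_rate`, and `accuracy`, and the toy data, are hypothetical; this is not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    """One model's answers to one question (hypothetical schema)."""
    model: str
    question_id: int
    initial: str  # answer given independently, e.g. "A"
    revised: str  # answer after seeing the other models' answers
    gold: str     # correct option


def majority_vote(answers):
    """Most common option; ties go to the option encountered first."""
    return Counter(answers).most_common(1)[0][0]


def stick_rate(records, model):
    """Hypothetical confidence proxy: among questions where at least one peer
    initially disagreed with `model`, the fraction where `model` kept its
    initial answer after the exchange."""
    by_question = {}
    for r in records:
        by_question.setdefault(r.question_id, []).append(r)
    challenged = kept = 0
    for rs in by_question.values():
        mine = next((r for r in rs if r.model == model), None)
        if mine is None:
            continue
        if any(r.initial != mine.initial for r in rs if r.model != model):
            challenged += 1
            kept += mine.revised == mine.initial
    return kept / challenged if challenged else float("nan")


def accuracy(records, model):
    """Accuracy of `model`'s revised answers."""
    rs = [r for r in records if r.model == model]
    return sum(r.revised == r.gold for r in rs) / len(rs)


# Toy data: 3 models, 2 questions (invented for illustration).
records = [
    Record("m1", 1, "B", "B", "B"), Record("m2", 1, "C", "B", "B"),
    Record("m3", 1, "B", "B", "B"), Record("m1", 2, "A", "A", "A"),
    Record("m2", 2, "A", "A", "A"), Record("m3", 2, "D", "D", "A"),
]

for q in (1, 2):
    final = majority_vote([r.revised for r in records if r.question_id == q])
    print(f"question {q}: team answer {final}")
for m in ("m1", "m2", "m3"):
    print(m, "stick rate", stick_rate(records, m), "accuracy", accuracy(records, m))
```

In the toy data m3 holds on to a wrong answer on question 2, so a stick rate on its own is only a crude proxy; relating it to accuracy needs many more questions than this example has.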

Related papers

Teaming LLMs to Detect and Mitigate Hallucinations [0.0]
We show that extending single-model consistency methods to teams of models can result in substantial further improvements in hallucination detection and mitigation capabilities. We evaluate this "consortium consistency" approach across many model teams from a pool of 15 model teams.
arXiv Detail & Related papers (2025-10-22T12:03:43Z)
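One common single-model consistency signal is the entropy of the answer distribution over repeated samples; a natural multi-model extension pools samples from several models before computing it. The sketch below is a minimal illustration under stated assumptions, not the paper's exact procedure: answers are assumed to be already normalized to short strings (real systems typically cluster semantically equivalent answers), `consortium_check` and its `threshold` cutoff are hypothetical, and returning the pooled majority answer stands in for "mitigation".

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical answer distribution."""
    n = len(answers)
    return sum(-(c / n) * math.log(c / n) for c in Counter(answers).values())


def consortium_check(samples_by_model, threshold=0.5):
    """Pool sampled answers from every model in the team and return
    (majority answer, entropy, flagged). `threshold` is a hypothetical
    cutoff: high entropy means the team disagrees, which is treated as
    a sign of likely hallucination."""
    pooled = [a for answers in samples_by_model.values() for a in answers]
    majority = Counter(pooled).most_common(1)[0][0]
    h = answer_entropy(pooled)
    return majority, h, h > threshold


agree = {"m1": ["Paris"] * 3, "m2": ["Paris"] * 3}
disagree = {"m1": ["Paris", "Lyon", "Paris"], "m2": ["Lyon", "Nice", "Lyon"]}
print(consortium_check(agree))
print(consortium_check(disagree))
```

Pooling across models rather than sampling one model repeatedly is what lets disagreement surface even when each individual model is self-consistently wrong.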
LLM4VV: Evaluating Cutting-Edge LLMs for Generation and Evaluation of Directive-Based Parallel Programming Model Compiler Tests [7.6818904666624395]
This paper proposes a dual-LLM system and experiments with the use of LLMs for the generation of compiler tests. The results indicate that LLMs have promising potential to generate quality compiler tests and verify them automatically.
arXiv Detail & Related papers (2025-07-29T02:34:28Z)
MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning [26.736078756799635]
We introduce a new post-training paradigm, MAPoRL (Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning). In MAPoRL, multiple LLMs first generate their own responses independently and then engage in a multi-turn discussion to collaboratively improve the final answer. A MAPoRL verifier evaluates both the answer and the discussion by assigning a score that verifies the correctness of the answer. The score serves as the co-training reward and is then maximized through multi-agent RL.
arXiv Detail & Related papers (2025-02-25T18:33:48Z)
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks [77.59035801244278]
This paper investigates the role of the Large Language Model (LLM) backbone in the evaluation of Multimodal Large Language Models (MLLMs). Our study encompasses four diverse MLLM benchmarks and eight state-of-the-art MLLMs. Key findings reveal that some benchmarks allow high performance even without visual inputs, and that up to 50% of errors can be attributed to insufficient world knowledge in the LLM backbone.
arXiv Detail & Related papers (2024-10-16T07:49:13Z)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. However, LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning. We introduce Q*, a framework for guiding the LLM decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions [8.55917897789612]
We focus on the cooperative tasks of multiple agents with a common goal and communication among them. We also consider human-in/on-the-loop scenarios enabled by the language component in the framework.
arXiv Detail & Related papers (2024-05-17T22:10:23Z)
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model. We employ a proxy model that has far fewer parameters and take its answers as heuristic answers. Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
arXiv Detail & Related papers (2024-02-19T11:11:08Z)
Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers. We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z)
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration [39.603649838876294]
We study approaches to identify LLM knowledge gaps and abstain from answering questions when knowledge gaps are present. Motivated by their failures in self-reflection and over-reliance on held-out sets, we propose two novel approaches.
arXiv Detail & Related papers (2024-02-01T06:11:49Z)
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well. Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries. We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate [41.949869545423375]
Large Language Models (LLMs) have shown impressive capabilities in various applications, but they still face various inconsistency issues. To examine whether LLMs can collaborate effectively to achieve a consensus for a shared goal, we focus on commonsense reasoning. Our work contributes to understanding the inter-consistency among LLMs and lays the foundation for developing future collaboration methods.
arXiv Detail & Related papers (2023-05-19T11:15:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.