Related papers: Multi-LLM Collaboration for Medication Recommendation

URL: http://arxiv.org/abs/2512.05066v1
Date: Thu, 04 Dec 2025 18:25:15 GMT
Title: Multi-LLM Collaboration for Medication Recommendation
Authors: Huascar Sanchez, Briland Hitaj, Jules Bergmann, Linda Briesemeister,
Abstract summary: Individual large language models (LLMs) are susceptible to hallucinations and inconsistency. Naive ensembles of models often fail to deliver stable and credible recommendations. We apply this framework to improve the reliability in medication recommendation from brief clinical vignettes.
Score: 0.4697611383288171
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As healthcare increasingly turns to AI for scalable and trustworthy clinical decision support, ensuring reliability in model reasoning remains a critical challenge. Individual large language models (LLMs) are susceptible to hallucinations and inconsistency, whereas naive ensembles of models often fail to deliver stable and credible recommendations. Building on our previous work on LLM Chemistry, which quantifies the collaborative compatibility among LLMs, we apply this framework to improve the reliability in medication recommendation from brief clinical vignettes. Our approach leverages multi-LLM collaboration guided by Chemistry-inspired interaction modeling, enabling ensembles that are effective (exploiting complementary strengths), stable (producing consistent quality), and calibrated (minimizing interference and error amplification). We evaluate our Chemistry-based Multi-LLM collaboration strategy on real-world clinical scenarios to investigate whether such interaction-aware ensembles can generate credible, patient-specific medication recommendations. Preliminary results are encouraging, suggesting that LLM Chemistry-guided collaboration may offer a promising path toward reliable and trustworthy AI assistants in clinical practice.

Related papers

ClinConsensus: A Consensus-Based Benchmark for Evaluating Chinese Medical LLMs across Difficulty Levels [39.33170904610862]
Large language models (LLMs) are increasingly applied to health management, showing promise across disease prevention, clinical decision-making, and long-term care. We introduce ClinConsensus, a Chinese medical benchmark curated, validated and quality-controlled by clinical experts. ClinConsensus comprises 2500 open-ended cases spanning the full continuum of care--from prevention and intervention to long-term follow-up--covering 36 medical specialties, 12 common clinical task types, and progressively increasing levels of complexity.
arXiv Detail & Related papers (2026-03-02T17:17:18Z)
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA). We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context. We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z)
A Hybrid Computational Intelligence Framework with Metaheuristic Optimization for Drug-Drug Interaction Prediction [0.8602553195689512]
Drug-drug interactions (DDIs) are a leading cause of preventable adverse events, often complicating treatment and increasing healthcare costs. We propose an interpretable and efficient framework that blends modern machine learning with domain knowledge to improve DDI prediction. Our approach combines two complementary embeddings - Mol2Vec, which captures fragment-level structural patterns, and SMILES-BERT, which learns contextual chemical features.
arXiv Detail & Related papers (2025-10-08T09:55:18Z)
A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making [49.048767633316764]
KAMAC is a knowledge-driven Adaptive Multi-Agent Collaboration framework. It enables agents to dynamically form and expand expert teams based on the evolving diagnostic context. Experiments on two real-world medical benchmarks demonstrate that KAMAC significantly outperforms both single-agent and advanced multi-agent methods.
arXiv Detail & Related papers (2025-09-18T14:33:36Z)
Baichuan-M2: Scaling Medical Capability with Large Verifier System [40.86227022086866]
We introduce a novel dynamic verification framework that moves beyond static answer verifiers. We develop Baichuan-M2, a medical augmented reasoning model trained through a multi-stage reinforcement learning strategy. Evaluated on HealthBench, Baichuan-M2 outperforms all other open-source models and most advanced closed-source counterparts.
arXiv Detail & Related papers (2025-09-02T11:23:35Z)
Lessons Learned from Evaluation of LLM based Multi-agents in Safer Therapy Recommendation [9.84660526673816]
This study investigated the feasibility and value of using a Large Language Model (LLM)-based multi-agent system for safer therapy recommendations. We designed a single agent and a MAS framework simulating multidisciplinary team (MDT) decision-making. We compared MAS performance with single-agent approaches and real-world benchmarks.
arXiv Detail & Related papers (2025-07-15T02:01:38Z)
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making [80.94208848596215]
We present a new concept called Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement. Inspired by the "catfish effect" in organizational psychology, the Catfish Agent is designed to challenge emerging consensus to stimulate deeper reasoning.
arXiv Detail & Related papers (2025-05-27T17:59:50Z)
MedSyn: Enhancing Diagnostics with Human-AI Collaboration [19.23358929400838]
Large Language Models (LLMs) have shown promise as tools for supporting clinical decision-making. We propose a hybrid human-AI framework, MedSyn, where physicians and LLMs engage in multi-step, interactive dialogues to refine diagnoses and treatment decisions.
arXiv Detail & Related papers (2025-05-07T09:37:18Z)
Med-CoDE: Medical Critique based Disagreement Evaluation Framework [72.42301910238861]
The reliability and accuracy of large language models (LLMs) in medical contexts remain critical concerns. Current evaluation methods often lack robustness and fail to provide a comprehensive assessment of LLM performance. We propose Med-CoDE, a specifically designed evaluation framework for medical LLMs to address these challenges.
arXiv Detail & Related papers (2025-04-21T16:51:11Z)
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making [45.74980058831342]
We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents). The assigned solo or group collaboration structure is tailored to the medical task at hand, emulating real-world medical decision-making processes. MDAgents achieved the best performance in seven out of ten benchmarks on tasks requiring an understanding of medical knowledge.
arXiv Detail & Related papers (2024-04-22T06:30:05Z)
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between Doctor as player and NPCs. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM). Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.