Related papers: Hybrid-Code: A Privacy-Preserving, Redundant Multi-Agent Framework for Reliable Local Clinical Coding

Hybrid-Code: A Privacy-Preserving, Redundant Multi-Agent Framework for Reliable Local Clinical Coding

URL: http://arxiv.org/abs/2512.23743v1
Date: Fri, 26 Dec 2025 02:27:36 GMT
Title: Hybrid-Code: A Privacy-Preserving, Redundant Multi-Agent Framework for Reliable Local Clinical Coding
Authors: Yunguo Yu,
Abstract summary: Clinical coding automation using cloud-based Large Language Models (LLMs) poses privacy risks and latency bottlenecks.<n>We introduce Hybrid-Code, a hybrid neuro-symbolic multi-agent framework for local clinical coding that ensures production reliability through redundancy and verification.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clinical coding automation using cloud-based Large Language Models (LLMs) poses privacy risks and latency bottlenecks, rendering them unsuitable for on-premise healthcare deployment. We introduce Hybrid-Code, a hybrid neuro-symbolic multi-agent framework for local clinical coding that ensures production reliability through redundancy and verification. Our system comprises two agents: a Coder that attempts language model-based semantic reasoning using BioMistral-7B but falls back to deterministic keyword matching when model output is unreliable, ensuring pipeline completion; and an Auditor that verifies codes against a 257-code knowledge base and clinical evidence. Evaluating on 1,000 MIMIC-III discharge summaries, we demonstrate no hallucinated codes among accepted outputs within the knowledge base, 24.47% verification rate, and 34.11% coverage (95% CI: 31.2%--37.0%) with 86%+ language model utilization. The Auditor filtered invalid format codes and provided evidence-based quality control (75.53% rejection rate) while ensuring no patient data leaves the hospital firewall. The hybrid architecture -- combining language model semantic understanding (when successful), deterministic fallback (when the model fails), and symbolic verification (always active) -- ensures both reliability and privacy preservation, addressing critical barriers to AI adoption in healthcare. Our key finding is that reliability through redundancy is more valuable than pure model performance in production healthcare systems, where system failures are unacceptable.

Related papers

Zero-Knowledge Federated Learning with Lattice-Based Hybrid Encryption for Quantum-Resilient Medical AI [0.0]
Federated Learning (FL) enables collaborative training of medical AI models across hospitals without centralizing patient data.<n> gradient inversion attacks can reconstruct patient information, Byzantine clients can poison the global model, and the emphHarvest Now, Decrypt Later (HNDL) threat renders today's encrypted traffic vulnerable to future quantum adversaries.<n>We introduce emphZeroKnowledge Federated Learning, Post-Quantum, a three-tiered cryptographic protocol that hybridizes (i) ML-KEM for quantum-resistant key encapsulation, (ii) lattice-based Zero-Knowledge Proofs for verifiable emph
arXiv Detail & Related papers (2026-03-03T12:43:44Z)
A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing [0.4349324020366305]
Large language models (LLMs) show promise for healthcare question answering, but clinical use is limited by weak verification, insufficient evidence grounding, and unreliable confidence signalling.<n>We propose a multi-agent medical QA framework that combines complementary LLMs with evidence retrieval, uncertainty estimation, and bias checks to improve answer reliability.
arXiv Detail & Related papers (2026-02-15T14:17:27Z)
Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation [97.36081721024728]
We propose the first benchmark for assessing confidence in multi-turn interaction during realistic medical consultations.<n>Our benchmark unifies three types of medical data for open-ended diagnostic generation.<n>We present MedConf, an evidence-grounded linguistic self-assessment framework.
arXiv Detail & Related papers (2026-01-22T04:51:39Z)
VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation.<n>It implements a semantics-sensitive, multi-view detection pipeline, each aligned to a specific analysis perspective.<n>On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable--fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z)
Automated Quality Assessment for LLM-Based Complex Qualitative Coding: A Confidence-Diversity Framework [0.23872611575805827]
We develop a dual-signal quality assessment framework that combines model confidence with inter-model consensus (external entropy)<n>We evaluate it across legal reasoning, political analysis, and medical classification transcripts.<n>The framework offers a principled, domain-agnostic quality assurance mechanism that scales qualitative coding without extensive double-coding.
arXiv Detail & Related papers (2025-08-28T06:25:07Z)
Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models [87.66870367661342]
Large language models (LLMs) are used in AI applications in healthcare.<n>Red-teaming framework that continuously stress-test LLMs can reveal significant weaknesses in four safety-critical domains.<n>A suite of adversarial agents is applied to autonomously mutate test cases, identify/evolve unsafe-triggering strategies, and evaluate responses.<n>Our framework delivers an evolvable, scalable, and reliable safeguard for the next generation of medical AI.
arXiv Detail & Related papers (2025-07-30T08:44:22Z)
Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning.<n>We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z)
Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
TrustLoRA: Low-Rank Adaptation for Failure Detection under Out-of-distribution Data [62.22804234013273]
We propose a simple failure detection framework to unify and facilitate classification with rejection under both covariate and semantic shifts.<n>Our key insight is that by separating and consolidating failure-specific reliability knowledge with low-rank adapters, we can enhance the failure detection ability effectively and flexibly.
arXiv Detail & Related papers (2025-04-20T09:20:55Z)
A Preliminary Study on the Robustness of Code Generation by Large Language Models [40.01096420024215]
We present the first empirical study of LLM-generated code robustness using the CoderEval benchmark.<n>We find that 35.2% of their outputs are less robust than human-written code, with over 90% of deficiencies caused by missing conditional checks.<n>To address these issues, we propose RobGen, a model-agnostic framework that improves robustness without retraining.
arXiv Detail & Related papers (2025-03-26T03:44:03Z)
Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding [44.01429184037945]
We introduce ALIGN, a novel compositional LLM-based system for automated, zero-shot medical coding.<n>We evaluate ALIGN on harmonizing medication terms into Anatomical Therapeutic Chemical (ATC) and medical history terms into Medical Dictionary for Regulatory Activities (MedDRA) codes.
arXiv Detail & Related papers (2024-11-20T09:59:12Z)
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering [51.26412822853409]
We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models. Our method introduces learnable prompts into a Transformer architecture to efficiently train it on diverse medical datasets without massive computational costs.
arXiv Detail & Related papers (2024-10-23T00:31:17Z)
TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding [5.273190477622007]
International Classification of Disease (ICD) coding procedure has been shown to be effective and crucial to the billing system in medical sector. Currently, ICD codes are assigned to a clinical note manually which is likely to cause many errors. In this project, we apply a transformer-based architecture to capture the interdependence among the tokens of a document and then use a code-wise attention mechanism to learn code-specific representations of the entire document.
arXiv Detail & Related papers (2021-03-28T05:34:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.