XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare
- URL: http://arxiv.org/abs/2405.06270v4
- Date: Fri, 25 Jul 2025 08:24:58 GMT
- Title: XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare
- Authors: Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia, Eugenio di Sciascio
- Abstract summary: We introduce a knowledge-guided in-context learning framework to enable large language models to process structured clinical data. Our approach integrates domain-specific feature groupings, carefully balanced few-shot examples, and task-specific prompting strategies.
- Score: 16.79952669254101
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical decision support systems require models that are not only highly accurate but also equitable and sensitive to the implications of missed diagnoses. In this study, we introduce a knowledge-guided in-context learning (ICL) framework designed to enable large language models (LLMs) to effectively process structured clinical data. Our approach integrates domain-specific feature groupings, carefully balanced few-shot examples, and task-specific prompting strategies. We systematically evaluate this method across seventy distinct ICL designs, spanning multiple prompt variations and two communication styles (natural-language narrative and numeric conversational), and compare its performance to robust classical machine learning (ML) benchmarks on heart disease and diabetes prediction tasks. Our findings indicate that while traditional ML models maintain superior performance in balanced precision-recall scenarios, LLMs employing narrative prompts with integrated domain knowledge achieve higher recall and significantly reduce gender bias, effectively narrowing fairness disparities by an order of magnitude. Despite the current limitation of increased inference latency, LLMs provide notable advantages, including the capacity for zero-shot deployment and enhanced equity. This research offers the first comprehensive analysis of ICL design considerations for applying LLMs to tabular clinical tasks and highlights distillation and multimodal extensions as promising directions for future research.
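To make the two communication styles concrete, below is a minimal sketch, under assumed feature names and domain groupings (hypothetical, not taken from the paper's actual prompts), of how a single tabular patient record could be serialized into a natural-language narrative prompt versus a numeric conversational prompt. Balanced few-shot examples rendered in the same style would be prepended before the query record.

```python
# Illustrative sketch only: serializing one tabular patient record into the
# two communication styles described in the abstract. Feature names and the
# domain-specific groupings below are hypothetical placeholders.

patient = {"age": 63, "sex": "female", "chol": 254, "trestbps": 145, "thalach": 150}

# Hypothetical domain-specific feature groupings (knowledge-guided ICL idea).
groups = {
    "demographics": ["age", "sex"],
    "vitals": ["trestbps", "thalach"],
    "labs": ["chol"],
}

def narrative_prompt(record: dict, groupings: dict) -> str:
    """Render the record as a short clinical narrative, grouped by domain."""
    parts = []
    for group, feats in groupings.items():
        desc = ", ".join(f"{f} is {record[f]}" for f in feats if f in record)
        parts.append(f"{group}: {desc}")
    return ("Patient summary. " + ". ".join(parts) +
            ". Based on this profile, does the patient have heart disease? Answer yes or no.")

def numeric_prompt(record: dict) -> str:
    """Render the record as a compact feature=value list."""
    pairs = "; ".join(f"{k}={v}" for k, v in record.items())
    return f"Features: {pairs}. Predict heart disease (yes/no)."

print(narrative_prompt(patient, groups))
print(numeric_prompt(patient))
```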
Related papers
- Are Large Language Models Dynamic Treatment Planners? An In Silico Study from a Prior Knowledge Injection Angle [3.0391297540732545]
We evaluate large language models (LLMs) as dynamic insulin dosing agents in an in silico Type 1 diabetes simulator. Our results indicate that carefully designed zero-shot prompts enable smaller LLMs to achieve comparable or superior clinical performance. LLMs exhibit notable limitations, such as overly aggressive insulin dosing when prompted with chain-of-thought.
arXiv Detail & Related papers (2025-08-06T13:46:02Z) - Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey [69.45421620616486]
This work presents the first structured taxonomy and analysis of discrete tokenization methods designed for large language models (LLMs). We categorize 8 representative VQ variants that span classical and modern paradigms and analyze their algorithmic principles, training dynamics, and integration challenges with LLM pipelines. We identify key challenges including codebook collapse, unstable gradient estimation, and modality-specific encoding constraints.
arXiv Detail & Related papers (2025-07-21T10:52:14Z) - Leveraging Embedding Techniques in Multimodal Machine Learning for Mental Illness Assessment [0.8458496687170665]
The increasing global prevalence of mental disorders, such as depression and PTSD, requires objective and scalable diagnostic tools. This paper investigates the potential of multimodal machine learning to address these challenges, leveraging the complementary information available in text, audio, and video data. We explore data-level, feature-level, and decision-level fusion techniques, including a novel integration of Large Language Model predictions.
arXiv Detail & Related papers (2025-04-02T14:19:06Z) - LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition? [30.843971208278006]
Multimodal large language models (MLLMs) have demonstrated exceptional capabilities in visual understanding and reasoning.
We propose LLaVA-RadZ, a framework for zero-shot medical disease recognition.
We introduce a Domain Knowledge Anchoring Module (DKAM) to exploit the intrinsic medical knowledge of large models.
arXiv Detail & Related papers (2025-03-10T16:05:40Z) - Zero-shot Large Language Models for Long Clinical Text Summarization with Temporal Reasoning [23.34116653190641]
Large language models (LLMs) have shown potential for transforming data processing in healthcare.
This study evaluates the efficacy of zero-shot LLMs in summarizing long clinical texts that require temporal reasoning.
arXiv Detail & Related papers (2025-01-30T19:58:45Z) - Enhancing Patient-Centric Communication: Leveraging LLMs to Simulate Patient Perspectives [19.462374723301792]
Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing scenarios.
By mimicking human behavior, LLMs can anticipate responses based on concrete demographic or professional profiles.
We evaluate the effectiveness of LLMs in simulating individuals with diverse backgrounds and analyze the consistency of these simulated behaviors.
arXiv Detail & Related papers (2025-01-12T22:49:32Z) - Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
This study proposes using large language models (LLMs) to elicit expert prior distributions for predictive models.
We compare LLM-elicited and uninformative priors, evaluate whether LLMs truthfully generate parameter distributions, and propose a model selection strategy for in-context learning and prior elicitation.
Our findings show that LLM-elicited prior parameter distributions significantly reduce predictive error compared to uninformative priors in low-data settings.
arXiv Detail & Related papers (2024-11-26T10:13:39Z) - Multimodal Clinical Reasoning through Knowledge-augmented Rationale Generation [12.242305026271675]
We introduce ClinRaGen, a small language model (SLM) optimized for multimodal rationale generation in disease diagnosis.
ClinRaGen incorporates a unique knowledge-augmented attention mechanism to merge domain knowledge with time series EHR data.
Our evaluations show that ClinRaGen markedly improves the SLM's capability to interpret multimodal EHR data and generate accurate clinical rationales.
arXiv Detail & Related papers (2024-11-12T07:34:56Z) - Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z) - IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models [14.709233593021281]
The integration of external knowledge from Large Language Models (LLMs) presents a promising avenue for improving healthcare predictions.
We propose IntelliCare, a novel framework that leverages LLMs to provide high-quality patient-level external knowledge.
IntelliCare identifies patient cohorts and employs task-relevant statistical information to augment LLM understanding and generation.
arXiv Detail & Related papers (2024-08-23T13:56:00Z) - When Raw Data Prevails: Are Large Language Model Embeddings Effective in Numerical Data Representation for Medical Machine Learning Applications? [8.89829757177796]
We examine the effectiveness of vector representations from last hidden states of Large Language Models for medical diagnostics and prognostics.
We focus on instruction-tuned LLMs in a zero-shot setting to represent abnormal physiological data and evaluate their utilities as feature extractors.
Although the findings suggest that raw data features still prevail in medical ML tasks, zero-shot LLM embeddings demonstrate competitive results; an illustrative sketch of this embedding-as-features recipe appears after this list.
arXiv Detail & Related papers (2024-08-15T03:56:40Z) - CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - Developing Healthcare Language Model Embedding Spaces [0.20971479389679337]
Pre-trained Large Language Models (LLMs) often struggle on out-of-domain datasets such as healthcare-focused text.
Three methods are assessed: traditional masked language modeling, Deep Contrastive Learning for Unsupervised Textual Representations (DeCLUTR), and a novel pre-training objective utilizing metadata categories from healthcare settings.
Contrastively trained models outperform other approaches on the classification tasks, delivering strong performance from limited labeled data and with fewer model parameter updates required.
arXiv Detail & Related papers (2024-03-28T19:31:32Z) - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated by large language models (LLMs).
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - Aligning Large Language Models for Clinical Tasks [0.0]
Large Language Models (LLMs) have demonstrated remarkable adaptability, showcasing their capacity to excel in tasks for which they were not explicitly trained.
We propose an alignment strategy for medical question-answering, known as 'expand-guess-refine'.
A preliminary analysis of this method demonstrated outstanding performance, achieving a score of 70.63% on a subset of questions sourced from the USMLE dataset.
arXiv Detail & Related papers (2023-09-06T10:20:06Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z) - Improving Small Language Models on PubMedQA via Generative Data Augmentation [4.96649519549027]
Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing.
Small Language Models (SLMs) are known for their efficiency, but they often struggle with limited capacity and training data.
We introduce a novel method aimed at improving SLMs in the medical domain using LLM-based generative data augmentation.
arXiv Detail & Related papers (2023-05-12T23:49:23Z) - Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing [62.9062883851246]
Machine learning holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities.
One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data.
Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems.
arXiv Detail & Related papers (2022-07-21T09:35:38Z)
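As noted in the "When Raw Data Prevails" entry above, one recurring recipe is to serialize a record, take an LLM's last hidden states as a feature vector, and fit a classical classifier on top. The sketch below illustrates that general recipe only; the encoder name, mean pooling, and toy data are assumptions for illustration, not details from that paper.

```python
# Illustrative sketch of the general embedding-as-features recipe: encode a
# serialized record with a pretrained transformer, mean-pool the last hidden
# states, and fit a classical classifier. Encoder choice and data are toy
# placeholders, not details from the cited paper.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"  # small stand-in; the paper studies instruction-tuned LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def embed(texts):
    """Mean-pool the final hidden states of each serialized record."""
    vectors = []
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
            vectors.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.stack(vectors)

# Toy serialized records and labels, purely for illustration.
records = ["age=63; chol=254; trestbps=145", "age=41; chol=180; trestbps=120"] * 20
labels = [1, 0] * 20

X_train, X_test, y_train, y_test = train_test_split(
    embed(records), labels, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```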
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.