Related papers: Grounding Multimodal Large Language Models with Quantitative Skin Attributes: A Retrieval Study

URL: http://arxiv.org/abs/2508.20188v1
Date: Wed, 27 Aug 2025 18:05:05 GMT
Title: Grounding Multimodal Large Language Models with Quantitative Skin Attributes: A Retrieval Study
Authors: Max Torop, Masih Eskandar, Nicholas Kurtansky, Jinyang Liu, Jochen Weber, Octavia Camps, Veronica Rotemberg, Jennifer Dy, Kivanc Kose
Abstract summary: We explore the combination of two promising approaches: Multimodal Large Language Models (MLLMs) and quantitative attribute usage. MLLMs offer a potential avenue for increased interpretability, providing reasoning for diagnosis in natural language through an interactive format. We provide evidence that MLLM embedding spaces can be grounded in such attributes, through fine-tuning to predict their values from images.
Score: 2.1206523992812545
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Artificial Intelligence models have demonstrated significant success in diagnosing skin diseases, including cancer, showing the potential to assist clinicians in their analysis. However, the interpretability of model predictions must be significantly improved before they can be used in practice. To this end, we explore the combination of two promising approaches: Multimodal Large Language Models (MLLMs) and quantitative attribute usage. MLLMs offer a potential avenue for increased interpretability, providing reasoning for diagnosis in natural language through an interactive format. Separately, a number of quantitative attributes that are related to lesion appearance (e.g., lesion area) have recently been found predictive of malignancy with high accuracy. Predictions grounded as a function of such concepts have the potential for improved interpretability. We provide evidence that MLLM embedding spaces can be grounded in such attributes, through fine-tuning to predict their values from images. Concretely, we evaluate this grounding in the embedding space through an attribute-specific content-based image retrieval case study using the SLICE-3D dataset.
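The attribute-specific content-based image retrieval mentioned in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity over embeddings and uses a toy 2-D embedding that encodes a single normalized attribute as an angle; the function names (`retrieve_top_k`, `mean_attribute_gap`, `toy_embed`) and the data are hypothetical and are not taken from the paper, which uses fine-tuned MLLM embeddings and the SLICE-3D dataset.

```python
import numpy as np

def retrieve_top_k(query_emb, gallery_embs, k):
    """Return indices of the k gallery embeddings closest to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)[:k]

def mean_attribute_gap(query_attr, gallery_attrs, retrieved_idx):
    """Mean absolute difference between the query attribute and the retrieved items' attributes."""
    return float(np.mean(np.abs(gallery_attrs[retrieved_idx] - query_attr)))

def toy_embed(attr):
    """Hypothetical stand-in for a grounded embedding: maps an attribute in [0, 1] to an angle."""
    angle = np.asarray(attr) * np.pi / 2
    return np.stack([np.cos(angle), np.sin(angle)], axis=-1)

# Toy gallery: 11 items whose (normalized) attribute values are evenly spaced.
gallery_attrs = np.linspace(0.0, 1.0, 11)
gallery_embs = toy_embed(gallery_attrs)

# Query item with attribute value 0.5.
query_attr = 0.5
query_emb = toy_embed(query_attr)

top_idx = retrieve_top_k(query_emb, gallery_embs, k=3)
gap = mean_attribute_gap(query_attr, gallery_attrs, top_idx)
print(sorted(top_idx.tolist()), round(gap, 3))
```

If an embedding space is grounded in an attribute, items retrieved for a query should have attribute values close to the query's, so a small mean gap is the behavior one would hope to see; the metric here is an illustrative choice, not necessarily the one reported in the paper.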

Related papers

Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency [52.50039435394964]
We systematically evaluate foundation models for regression-based tasks. We extract patch-level features from whole slide images (WSI) using five state-of-the-art foundation models. Models are trained to predict continuous HRD scores based on these extracted features across breast, endometrial, and lung cancer cohorts.
arXiv Detail & Related papers (2026-01-29T14:06:50Z)
ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts. ForenX employs the powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues. We introduce ForgReason, a dataset dedicated to descriptions of forgery evidences in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z)
Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders [7.23389716633927]
Interpretability is critical in high-stakes domains such as medical imaging. We introduce Sparse Autoencoder (SAE)-based interpretability to breast imaging.
arXiv Detail & Related papers (2025-07-21T03:59:21Z)
LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment. We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
Explainable Diagnosis Prediction through Neuro-Symbolic Integration [11.842565087408449]
We use neuro-symbolic methods, specifically Logical Neural Networks (LNNs), to develop explainable models for diagnosis prediction. Our models, particularly $M_{\text{multi-pathway}}$ and $M_{\text{comprehensive}}$, demonstrate superior performance over traditional models. These findings highlight the potential of neuro-symbolic approaches in bridging the gap between accuracy and explainability in healthcare AI applications.
arXiv Detail & Related papers (2024-10-01T22:47:24Z)
Generative causal testing to bridge data-driven models and scientific theories in language neuroscience [82.995061475971]
We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain. We show that GCT can dissect fine-grained differences between brain areas with similar functional selectivity.
arXiv Detail & Related papers (2024-10-01T15:57:48Z)
Beyond the Hype: A dispassionate look at vision-language models in medical scenario [3.4299097748670255]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across diverse tasks. Their performance and reliability in specialized domains such as medicine remain insufficiently assessed. We introduce RadVUQA, a novel benchmark to comprehensively evaluate existing LVLMs.
arXiv Detail & Related papers (2024-08-16T12:32:44Z)
CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting [0.0]
We evaluate the publicly available, state-of-the-art foundational vision-language models for chest X-ray interpretation. We find that vision-language models often hallucinate with confident language, which slows down clinical interpretation. We develop an agent-based vision-language approach for report generation using CheXagent's linear probes and BioViL-T's phrase grounding tools.
arXiv Detail & Related papers (2024-07-11T18:39:19Z)
SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy [45.2233252981348]
Large Language Models (LLMs) have been shown to encode clinical knowledge. We present SemioLLM, an evaluation framework that benchmarks 6 state-of-the-art models. We show that most LLMs are able to accurately and confidently generate probabilistic predictions of seizure onset zones in the brain.
arXiv Detail & Related papers (2024-07-03T11:02:12Z)
Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports. We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
CancerGPT: Few-shot Drug Pair Synergy Prediction using Large Pre-trained Language Models [3.682742580232362]
Large pre-trained language models (LLMs) have been shown to have significant potential in few-shot learning across various fields. Our research is the first to tackle drug pair synergy prediction in rare tissues with limited data.
arXiv Detail & Related papers (2023-04-18T02:49:53Z)
Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department [0.03088120935391119]
We are interested in outcome prediction and patient triage in hospital emergency departments based on text information in chief complaints and vital signs recorded at triage. We adapt Perceiver, a modality-agnostic transformer-based model that has shown promising results in several applications. In the experimental analysis, we show that multi-modality improves the prediction performance compared with models trained solely on text or vital signs.
arXiv Detail & Related papers (2023-04-03T06:32:00Z)
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability [82.29775890542967]
Estimating personalized effects of treatments is a complex, yet pervasive problem. Recent developments in the machine learning literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools. We use post-hoc feature importance methods to identify features that influence the model's predictions.
arXiv Detail & Related papers (2022-06-16T17:59:05Z)
Integrating Expert ODEs into Neural ODEs: Pharmacology and Disease Progression [71.7560927415706]
The latent hybridisation model (LHM) integrates a system of expert-designed ODEs with machine-learned Neural ODEs to fully describe the dynamics of the system. We evaluate LHM on synthetic data as well as real-world intensive care data of COVID-19 patients.
arXiv Detail & Related papers (2021-06-05T11:42:45Z)
