Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models
- URL: http://arxiv.org/abs/2403.18996v1
- Date: Wed, 27 Mar 2024 20:30:01 GMT
- Title: Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models
- Authors: Anees Ur Rehman Hashmi, Dwarikanath Mahapatra, Mohammad Yaqub
- Abstract summary: We analyze the performance of various explainable AI methods on a vision-language model, MedCLIP, to demystify its inner workings.
Our work offers a new perspective on the explainability of a recent, well-known VLM in the medical domain.
- Score: 12.871317188671787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explaining Deep Learning models is becoming increasingly important in the face of daily emerging multimodal models, particularly in safety-critical domains like medical imaging. However, the lack of detailed investigations into the performance of explainability methods on these models is widening the gap between their development and safe deployment. In this work, we analyze the performance of various explainable AI methods on a vision-language model, MedCLIP, to demystify its inner workings. We also provide a simple methodology to overcome the shortcomings of these methods. Our work offers a new perspective on the explainability of a recent, well-known VLM in the medical domain, and our assessment method is generalizable to other current and possible future VLMs.
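As a concrete illustration of one family of explainability methods the abstract alludes to, the sketch below applies occlusion sensitivity to a toy CLIP-style image-text similarity score: each image patch is masked in turn, and the drop in similarity measures that patch's contribution. The "encoders" here are random linear maps chosen purely for demonstration; nothing below reproduces MedCLIP's actual architecture or the paper's experiments.

```python
import numpy as np

# Toy stand-in for a MedCLIP-style similarity score: cosine similarity
# between an image embedding and a text embedding. In the real model these
# would come from trained vision and text encoders; here we use fixed
# random projections purely for illustration.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))       # hypothetical "vision encoder" (linear)
text_emb = rng.normal(size=16)      # hypothetical "text embedding"

def similarity(image):
    img_emb = W @ image.ravel()
    return float(img_emb @ text_emb /
                 (np.linalg.norm(img_emb) * np.linalg.norm(text_emb)))

def occlusion_map(image, patch=2, baseline=0.0):
    """Occlusion sensitivity: similarity drop when each patch is masked."""
    base = similarity(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            # Positive values mark patches whose removal hurts the match.
            heat[i // patch, j // patch] = base - similarity(occluded)
    return heat

image = rng.normal(size=(8, 8))
heat = occlusion_map(image)
print(heat.shape)  # one sensitivity value per 2x2 patch
```

The same loop works with any black-box scoring function, which is what makes perturbation-based attribution a common baseline when gradient access to a VLM is inconvenient.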
Related papers
- ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts. ForenX employs powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues. We introduce ForgReason, a dataset dedicated to descriptions of forgery evidence in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z) - Taming Vision-Language Models for Medical Image Analysis: A Comprehensive Review [10.184536293994789]
Modern Vision-Language Models (VLMs) exhibit unprecedented capabilities in cross-modal semantic understanding. They have emerged as a promising solution for a wide range of medical image analysis tasks. However, adapting general-purpose VLMs to the medical domain poses numerous challenges.
arXiv Detail & Related papers (2025-06-23T08:11:24Z) - A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models [74.48084001058672]
The rise of foundation models has transformed machine learning research.
multimodal foundation models (MMFMs) pose unique interpretability challenges beyond unimodal frameworks.
This survey explores two key aspects: (1) the adaptation of LLM interpretability methods to multimodal models and (2) understanding the mechanistic differences between unimodal language models and cross-modal systems.
arXiv Detail & Related papers (2025-02-22T20:55:26Z) - Vision Foundation Models in Medical Image Analysis: Advances and Challenges [7.224426395050136]
Vision Foundation Models (VFMs) have sparked significant advances in the field of medical image analysis.
This paper reviews the state-of-the-art research on the adaptation of VFMs to medical image segmentation.
We discuss the latest developments in adapter-based improvements, knowledge distillation techniques, and multi-scale contextual feature modeling.
arXiv Detail & Related papers (2025-02-20T14:13:46Z) - Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment [53.90425382758605]
We show how fine-tuning alters the internal structure of a model to specialize in new multimodal tasks.
Our work sheds light on how multimodal representations evolve through fine-tuning and offers a new perspective for interpreting model adaptation in multimodal tasks.
arXiv Detail & Related papers (2025-01-06T13:37:13Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder
and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
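The global image-text contrastive objective described above is typically an InfoNCE-style loss, in which matched image-text pairs should score higher than all other pairings in the batch. The sketch below is a minimal, self-contained version with toy embeddings; it is not MLIP's implementation and omits the divergence encoder and knowledge-guided terms.

```python
import numpy as np

# Minimal InfoNCE-style image-text contrastive loss (illustrative sketch).
# Matched pairs sit on the diagonal of the similarity matrix; the loss
# pushes each image to be most similar to its own text.
rng = np.random.default_rng(2)
img = rng.normal(size=(4, 8))              # toy batch of image embeddings
txt = img + 0.1 * rng.normal(size=(4, 8))  # matched text embeddings (noisy copies)

def info_nce(a, b, temperature=0.1):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # all pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # matched pairs on the diagonal

loss = info_nce(img, txt)
print(round(loss, 4))  # low when matched pairs dominate their rows
```

Frameworks like MLIP add further terms (local token-patch alignment, category-level contrast) on top of a global objective of this shape.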
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - Enhancing Representation in Medical Vision-Language Foundation Models via Multi-Scale Information Extraction Techniques [41.078761802053535]
We propose a method that effectively exploits multi-scale information to enhance the performance of medical foundation models.
We evaluate the effectiveness of the proposed method on six open-source datasets across different clinical tasks.
arXiv Detail & Related papers (2024-01-03T07:22:54Z) - Masked Modeling for Self-supervised Representation Learning on Vision and Beyond [69.64364187449773]
Masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training.
We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more.
We conclude by discussing the limitations of current techniques and point out several potential avenues for advancing masked modeling research.
arXiv Detail & Related papers (2023-12-31T12:03:21Z) - Domain Generalization for Medical Image Analysis: A Survey [13.34575578242635]
This paper comprehensively reviews domain generalization studies specifically tailored for MedIA.
We categorize domain generalization methods into data-level, feature-level, model-level, and analysis-level methods.
We show how these methods can be applied at various stages of the DL-equipped MedIA workflow, from data acquisition to model prediction and analysis.
arXiv Detail & Related papers (2023-10-05T09:31:58Z) - Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
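A concept-bottleneck pipeline of this kind can be sketched in a few lines: image features are scored against concept embeddings, and the classifier sees only those interpretable concept scores. The concept names and embeddings below are invented placeholders, not the paper's GPT-4-derived concepts or its actual vision-language model.

```python
import numpy as np

# Illustrative concept-bottleneck step (hypothetical, not the paper's code):
# image features are mapped to interpretable concept scores via cosine
# similarity with concept embeddings, then a linear head predicts the label
# from those scores alone, making the decision attributable to concepts.
rng = np.random.default_rng(1)
concepts = ["opacity", "cardiomegaly", "effusion"]  # assumed example concepts
concept_emb = rng.normal(size=(3, 16))              # toy concept text embeddings
img_feat = rng.normal(size=16)                      # toy image feature vector

def concept_scores(feat):
    sims = concept_emb @ feat
    sims /= np.linalg.norm(concept_emb, axis=1) * np.linalg.norm(feat)
    return sims                                     # one cosine score per concept

scores = concept_scores(img_feat)
head_w = rng.normal(size=3)                         # toy linear classification head
logit = float(scores @ head_w)                      # prediction uses only concepts
print(dict(zip(concepts, np.round(scores, 3))), round(logit, 3))
```

Because the head never sees raw image features, each prediction decomposes into per-concept contributions (`scores * head_w`), which is the interpretability benefit the paper targets.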
arXiv Detail & Related papers (2023-10-04T21:57:09Z) - MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs).
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z) - Looking deeper into interpretable deep learning in neuroimaging: a comprehensive survey [20.373311465258393]
This paper comprehensively reviews interpretable deep learning models in the neuroimaging domain.
We discuss how multiple recent neuroimaging studies leveraged model interpretability to capture anatomical and functional brain alterations most relevant to model predictions.
arXiv Detail & Related papers (2023-07-14T04:50:04Z) - Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [8.547751745702156]
We show that well-designed medical prompts are the key to eliciting knowledge from pre-trained vision-language models (VLMs).
We develop three approaches for automatic generation of medical prompts, which can inject expert-level medical knowledge and image-specific information into the prompts for fine-grained grounding.
arXiv Detail & Related papers (2022-09-30T15:06:13Z) - DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z) - TorchEsegeta: Framework for Interpretability and Explainability of Image-based Deep Learning Models [0.0]
Clinicians are often sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice.
This paper presents approaches that help to interpret and explain the results of deep learning algorithms by depicting the anatomical areas which influence the decision of the algorithm most.
The paper presents a unified framework, TorchEsegeta, for applying various interpretability and explainability techniques to deep learning models.
arXiv Detail & Related papers (2021-10-16T01:00:15Z) - Domain Shift in Computer Vision models for MRI data analysis: An Overview [64.69150970967524]
Machine learning and computer vision methods are showing good performance in medical imagery analysis.
Yet only a few applications are now in clinical use.
Poor transferability of the models to data from different sources or acquisition domains is one of the reasons for that.
arXiv Detail & Related papers (2020-10-14T16:34:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.