Related papers: Visual Prompt Engineering for Medical Vision Language Models in Radiology

Related papers

Interpreting Biomedical VLMs on High-Imbalance Out-of-Distributions: An Insight into BiomedCLIP on Radiology [0.0]
We analyse the limitations of BiomedCLIP when applied to a highly imbalanced, out-of-distribution medical dataset.<n>We show that the model under zero-shot settings over-predicts all labels, leading to poor precision and inter-class separability.<n>We highlight the need for careful adaptations of the models to foster reliability and applicability in a real-world setting.
arXiv Detail & Related papers (2025-06-17T02:59:42Z)
PathSegDiff: Pathology Segmentation using Diffusion model representations [63.20694440934692]
We propose PathSegDiff, a novel approach for histopathology image segmentation that leverages Latent Diffusion Models (LDMs) as pre-trained featured extractors. Our method utilizes a pathology-specific LDM, guided by a self-supervised encoder, to extract rich semantic information from H&E stained histopathology images. Our experiments demonstrate significant improvements over traditional methods on the BCSS and GlaS datasets.
arXiv Detail & Related papers (2025-04-09T14:58:21Z)
Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images [7.048241543461529]
We propose a novel framework called Multi-Resolution Prompt-guided Hybrid Embedding (MR-PHE) to address these challenges in zero-shot histopathology image classification. We introduce a hybrid embedding strategy that integrates global image embeddings with weighted patch embeddings. A similarity-based patch weighting mechanism assigns attention-like weights to patches based on their relevance to class embeddings.
arXiv Detail & Related papers (2025-03-13T12:18:37Z)
Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification [15.98427699337596]
We perform a comprehensive fairness analysis of CLIP-like models applied to X-ray image classification. We assess their performance and fairness across diverse patient demographics and disease categories using zero-shot inference and various fine-tuning techniques.
arXiv Detail & Related papers (2025-01-31T12:23:50Z)
MedFocusCLIP : Improving few shot classification in medical datasets using pixel wise attention [1.2277343096128712]
We propose to leverage advanced segmentation capabilities of Segment Anything Model 2 (SAM2) as a visual prompting cue to help visual encoder in the CLIP (Contrastive Language-Image Pretraining) This helps the model to focus on highly discriminative regions, without getting distracted from visually similar background features. We evaluate our method on diverse medical datasets including X-rays, CT scans, and MRI images, and report an accuracy of (71%, 81%, 86%, 58%) from the proposed approach.
arXiv Detail & Related papers (2025-01-07T14:49:12Z)
Adversarial Vessel-Unveiling Semi-Supervised Segmentation for Retinopathy of Prematurity Diagnosis [9.683492465191241]
We propose a semi supervised segmentation framework designed to advance ROP studies without the need for extensive manual vessel annotation. Unlike previous methods that rely solely on limited labeled data, our approach integrates uncertainty weighted vessel unveiling module and domain adversarial learning. We validate our approach on public datasets and an in-house ROP dataset, demonstrating its superior performance across multiple evaluation metrics.
arXiv Detail & Related papers (2024-11-14T02:40:34Z)
Scribble-Based Interactive Segmentation of Medical Hyperspectral Images [4.675955891956077]
This work introduces a scribble-based interactive segmentation framework for medical hyperspectral images. The proposed method utilizes deep learning for feature extraction and a geodesic distance map generated from user-provided scribbles.
arXiv Detail & Related papers (2024-08-05T12:33:07Z)
Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision. This study evaluated the performance of the Gemini, GPT-4, and 4 popular large models for an exhaustive evaluation across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning. Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge. Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
Explanations of Classifiers Enhance Medical Image Segmentation via End-to-end Pre-training [37.11542605885003]
Medical image segmentation aims to identify and locate abnormal structures in medical images, such as chest radiographs, using deep neural networks. Our work collects explanations from well-trained classifiers to generate pseudo labels of segmentation tasks. We then use Integrated Gradients (IG) method to distill and boost the explanations obtained from the classifiers, generating massive diagnosis-oriented localization labels (DoLL) These DoLL-annotated images are used for pre-training the model before fine-tuning it for downstream segmentation tasks, including COVID-19 infectious areas, lungs, heart, and clavicles.
arXiv Detail & Related papers (2024-01-16T16:18:42Z)
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images. Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists. We present a novel multi-modal deep neural network framework for generating chest X-rays reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z)
GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy [13.055297330424397]
Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to minute microangioma lesions. We propose a human-in-the-loop, label-free early DR diagnosis framework called GlanceSeg, based on segment anything model (SAM) GlanceSeg enables real-time segmentation of microangioma lesions as ophthalmologists review fundus images.
arXiv Detail & Related papers (2023-11-14T10:59:45Z)
Polar-Net: A Clinical-Friendly Model for Alzheimer's Disease Detection in OCTA Images [53.235117594102675]
Optical Coherence Tomography Angiography is a promising tool for detecting Alzheimer's disease (AD) by imaging the retinal microvasculature. We propose a novel deep-learning framework called Polar-Net to provide interpretable results and leverage clinical prior knowledge. We show that Polar-Net outperforms existing state-of-the-art methods and provides more valuable pathological evidence for the association between retinal vascular changes and AD.
arXiv Detail & Related papers (2023-11-10T11:49:49Z)
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model. It can analyze and answer open-ended questions about chest radiographs. We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space. We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains. Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [50.689585476660554]
We propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling. Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models.
arXiv Detail & Related papers (2022-12-14T06:04:18Z)
Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images. ViTs do not rely on convolutions but on patch-based self-attention and in contrast to CNNs, no prior knowledge of local connectivity is present. Our results show that while the performance between ViTs and CNNs is on par with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using a conditional generative adversarial learning. We generate a corresponding radiology image in a target domain while preserving the identity of the patient. We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest. clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered to be intransparent and difficult to comprehend. We propose a novel decision explanation scheme based on CycleGAN activation which generates high-quality visualizations of classifier decisions even in smaller data sets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z)
Weakly supervised multiple instance learning histopathological tumor segmentation [51.085268272912415]
We propose a weakly supervised framework for whole slide imaging segmentation. We exploit a multiple instance learning scheme for training models. The proposed framework has been evaluated on multi-locations and multi-centric public data from The Cancer Genome Atlas and the PatchCamelyon dataset.
arXiv Detail & Related papers (2020-04-10T13:12:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.