Related papers: Hierarchical Vision-Language Learning for Medical Out-of-Distribution Detection

URL: http://arxiv.org/abs/2508.17667v1
Date: Mon, 25 Aug 2025 04:55:27 GMT
Title: Hierarchical Vision-Language Learning for Medical Out-of-Distribution Detection
Authors: Runhe Lai, Xinhua Lu, Kanghao Chen, Qichao Chen, Wei-Shi Zheng, Ruixuan Wang
Abstract summary: We propose a novel OOD detection framework based on vision-language models (VLMs). A cross-scale visual fusion strategy is proposed to couple visual embeddings from multiple scales. A cross-scale hard pseudo-OOD sample generation strategy is proposed to benefit OOD detection maximally.
Score: 42.73509543934366
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In trustworthy medical diagnosis systems, integrating out-of-distribution (OOD) detection aims to identify unknown diseases in samples, thereby mitigating the risk of misdiagnosis. In this study, we propose a novel OOD detection framework based on vision-language models (VLMs), which integrates hierarchical visual information to cope with challenging unknown diseases that resemble known diseases. Specifically, a cross-scale visual fusion strategy is proposed to couple visual embeddings from multiple scales. This enriches the detailed representation of medical images and thus improves the discrimination of unknown diseases. Moreover, a cross-scale hard pseudo-OOD sample generation strategy is proposed to benefit OOD detection maximally. Experimental evaluations on three public medical datasets support that the proposed framework achieves superior OOD detection performance compared to existing methods. The source code is available at https://openi.pcl.ac.cn/OpenMedIA/HVL.

Related papers

RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks. RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z)
NERO: Explainable Out-of-Distribution Detection with Neuron-level Relevance [13.36825494924134]
We propose a novel OOD scoring mechanism, called NERO, that leverages neuron-level relevance at the feature layer. Specifically, we cluster neuron-level relevance for each in-distribution (ID) class to form representative centroids. We refine performance by incorporating scaled relevance in the bias term and combining feature norms.
arXiv Detail & Related papers (2025-06-18T12:22:17Z)
Unsupervised Out-of-Distribution Detection in Medical Imaging Using Multi-Exit Class Activation Maps and Feature Masking [15.899277292315995]
Out-of-distribution (OOD) detection is essential for ensuring the reliability of deep learning models in medical imaging applications. This work is motivated by the observation that class activation maps (CAMs) for in-distribution (ID) data typically emphasize regions that are highly relevant to the model's predictions. We introduce a novel unsupervised OOD detection framework, Multi-Exit Class Activation Map (MECAM), which leverages multi-exit CAMs and feature masking.
arXiv Detail & Related papers (2025-05-13T14:18:58Z)
Delving into Out-of-Distribution Detection with Medical Vision-Language Models [14.286027727962104]
We conduct the first systematic investigation into the OOD detection potential of medical vision-language models. To accurately reflect real-world challenges, we introduce a cross-modality evaluation benchmarking pipeline for full-spectrum OOD detection. We propose a novel hierarchical prompt-based method that significantly enhances OOD detection performance.
arXiv Detail & Related papers (2025-03-02T21:09:51Z)
Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information. A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction. The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures.
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection [67.68030805755679]
Large language models (LLMs) encode a wealth of world knowledge and can be prompted to generate descriptive features for each class. In this paper, we propose to apply world knowledge to enhance OOD detection performance through selective generation from LLMs.
arXiv Detail & Related papers (2023-10-12T04:14:28Z)
DIAGNOSE: Avoiding Out-of-distribution Data using Submodular Information Measures [13.492292022589918]
We propose Diagnose, a novel active learning framework that can jointly model similarity and dissimilarity. Our experiments verify the superiority of Diagnose over the state-of-the-art AL methods across multiple domains of medical imaging.
arXiv Detail & Related papers (2022-10-04T11:07:48Z)
Confidence-based Out-of-Distribution Detection: A Comparative Study and Analysis [17.398553230843717]
We assess the capability of various state-of-the-art approaches for confidence-based OOD detection. First, we leverage a computer vision benchmark to reproduce and compare multiple OOD detection methods. We then evaluate their capabilities on the challenging task of disease classification using chest X-rays.
arXiv Detail & Related papers (2021-07-06T12:10:09Z)
Malignancy Prediction and Lesion Identification from Clinical Dermatological Images [65.1629311281062]
We consider machine-learning-based malignancy prediction and lesion identification from clinical dermatological images. Our approach first identifies all lesions present in the image regardless of sub-type or likelihood of malignancy, then estimates their likelihood of malignancy, and, through aggregation, also generates an image-level likelihood of malignancy.
arXiv Detail & Related papers (2021-04-02T20:52:05Z)
Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays. We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
An Interpretable Multiple-Instance Approach for the Detection of referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images. By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy. We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.