Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
- URL: http://arxiv.org/abs/2410.15744v1
- Date: Mon, 21 Oct 2024 08:01:58 GMT
- Title: Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
- Authors: Yankai Jiang, Wenhui Lei, Xiaofan Zhang, Shaoting Zhang,
- Abstract summary: Malenia is a novel multi-scale lesion-level mask-attribute alignment framework.
It is specifically designed for 3D zero-shot lesion segmentation.
- Score: 11.726600999078755
- License:
- Abstract: Recent advancements in medical vision-language pre-training models have driven significant progress in zero-shot disease recognition. However, transferring image-level knowledge to pixel-level tasks, such as lesion segmentation in 3D CT scans, remains a critical challenge. Due to the complexity and variability of pathological visual characteristics, existing methods struggle to align fine-grained lesion features not encountered during training with disease-related textual representations. In this paper, we present Malenia, a novel multi-scale lesion-level mask-attribute alignment framework, specifically designed for 3D zero-shot lesion segmentation. Malenia improves the compatibility between mask representations and their associated elemental attributes, explicitly linking the visual features of unseen lesions with the extensible knowledge learned from previously seen ones. Furthermore, we design a Cross-Modal Knowledge Injection module to enhance both visual and textual features with mutually beneficial information, effectively guiding the generation of segmentation results. Comprehensive experiments across three datasets and 12 lesion categories validate the superior performance of Malenia. Codes will be publicly available.
Related papers
- Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM Empowerment [41.398287899966995]
Current AI-assisted skin image diagnosis has achieved dermatologist-level performance in classifying skin cancer.
We propose a novel Cross-Attentive Fusion framework for interpretable skin lesion diagnosis.
arXiv Detail & Related papers (2024-09-14T20:11:25Z) - Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework [43.453943987647015]
Medical vision language pre-training has emerged as a frontier of research, enabling zero-shot pathological recognition.
Due to the complex semantics of biomedical texts, current methods struggle to align medical images with key pathological findings in unstructured reports.
This is achieved by consulting a large language model and medical experts.
Ours improves the accuracy of recent methods by up to 8.56% and 17.26% for seen and unseen categories, respectively.
arXiv Detail & Related papers (2024-03-12T13:18:22Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder
and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - AMLP:Adaptive Masking Lesion Patches for Self-supervised Medical Image
Segmentation [67.97926983664676]
Self-supervised masked image modeling has shown promising results on natural images.
However, directly applying such methods to medical images remains challenging.
We propose a novel self-supervised medical image segmentation framework, Adaptive Masking Lesion Patches (AMLP)
arXiv Detail & Related papers (2023-09-08T13:18:10Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical
Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - Feature Representation Learning for Robust Retinal Disease Detection
from Optical Coherence Tomography Images [0.0]
Ophthalmic images may contain identical-looking pathologies that can cause failure in automated techniques to distinguish different retinal degenerative diseases.
In this work, we propose a robust disease detection architecture with three learning heads.
Our experimental results on two publicly available OCT datasets illustrate that the proposed model outperforms existing state-of-the-art models in terms of accuracy, interpretability, and robustness for out-of-distribution retinal disease detection.
arXiv Detail & Related papers (2022-06-24T07:59:36Z) - Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG)
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z) - Incremental Cross-Domain Adaptation for Robust Retinopathy Screening via
Bayesian Deep Learning [7.535751594024775]
Retinopathy represents a group of retinal diseases that, if not treated timely, can cause severe visual impairments or even blindness.
This paper presents a novel incremental cross-domain adaptation instrument that allows any deep classification model to progressively learn abnormal retinal pathologies.
The proposed framework, evaluated on six public datasets, outperforms the state-of-the-art competitors by achieving an overall accuracy and F1 score of 0.9826 and 0.9846, respectively.
arXiv Detail & Related papers (2021-10-18T13:45:21Z) - Multi-label Thoracic Disease Image Classification with Cross-Attention
Networks [65.37531731899837]
We propose a novel scheme of Cross-Attention Networks (CAN) for automated thoracic disease classification from chest x-ray images.
We also design a new loss function that beyond cross-entropy loss to help cross-attention process and is able to overcome the imbalance between classes and easy-dominated samples within each class.
arXiv Detail & Related papers (2020-07-21T14:37:00Z) - Retinopathy of Prematurity Stage Diagnosis Using Object Segmentation and
Convolutional Neural Networks [68.96150598294072]
Retinopathy of Prematurity (ROP) is an eye disorder primarily affecting premature infants with lower weights.
It causes proliferation of vessels in the retina and could result in vision loss and, eventually, retinal detachment, leading to blindness.
In recent years, there has been a significant effort to automate the diagnosis using deep learning.
This paper builds upon the success of previous models and develops a novel architecture, which combines object segmentation and convolutional neural networks (CNN)
Our proposed system first trains an object segmentation model to identify the demarcation line at a pixel level and adds the resulting mask as an additional "color" channel in
arXiv Detail & Related papers (2020-04-03T14:07:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.