Self-adaptive vision-language model for 3D segmentation of pulmonary artery and vein
- URL: http://arxiv.org/abs/2501.03722v1
- Date: Tue, 07 Jan 2025 12:03:02 GMT
- Title: Self-adaptive vision-language model for 3D segmentation of pulmonary artery and vein
- Authors: Xiaotong Guo, Deqian Yang, Dan Wang, Haochen Zhao, Yuan Li, Zhilin Sui, Tao Zhou, Lijun Zhang, Yanda Meng
- Abstract summary: This paper proposes a novel framework called Language-guided self-adaptive Cross-Attention Fusion Framework.
Our method adopts pre-trained CLIP as a strong feature extractor for generating the segmentation of 3D CT scans.
We extensively validate our method on a local dataset, which is the largest pulmonary artery-vein CT dataset to date.
- Score: 18.696258519327095
- Abstract: Accurate segmentation of pulmonary structures is crucial in clinical diagnosis, disease study, and treatment planning. Significant progress has been made in deep learning-based segmentation techniques, but most require large amounts of labeled data for training. Consequently, developing precise segmentation methods that demand fewer labeled datasets is paramount in medical image analysis. The emergence of pre-trained vision-language foundation models, such as CLIP, recently opened the door for universal computer vision tasks. Exploiting the generalization ability of these pre-trained foundation models on downstream tasks, such as segmentation, leads to unexpectedly strong performance with a relatively small amount of labeled data. However, exploration of these models for pulmonary artery-vein segmentation remains limited. This paper proposes a novel framework called the Language-guided Self-adaptive Cross-Attention Fusion Framework. Our method adopts pre-trained CLIP as a strong feature extractor for generating segmentations of 3D CT scans, while adaptively aggregating cross-modal text and image representations. We propose a specially designed adapter module to fine-tune pre-trained CLIP with a self-adaptive learning strategy that effectively fuses the two modalities of embeddings. We extensively validate our method on a local dataset, which is the largest pulmonary artery-vein CT dataset to date, consisting of 718 labeled scans in total. The experiments show that our method outperforms other state-of-the-art methods by a large margin. Our data and code will be made publicly available upon acceptance.
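The core fusion step described in the abstract, attending from image features to text embeddings and blending the result back with a gate, can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the function names and the fixed scalar gate `alpha` (which the paper learns self-adaptively) are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(img_feats, txt_feats, alpha=0.5):
    """Fuse image tokens (N, d) with text tokens (M, d) by cross-attention.

    Image tokens act as queries; text tokens supply keys and values.
    `alpha` stands in for the self-adaptive gate the paper learns.
    """
    d = img_feats.shape[-1]
    # (N, M) attention weights from image queries to text keys.
    attn = softmax(img_feats @ txt_feats.T / np.sqrt(d), axis=-1)
    # (N, d) text-conditioned image representation.
    attended = attn @ txt_feats
    # Gated residual blend of the two modalities.
    return alpha * img_feats + (1 - alpha) * attended
```

With `alpha=1.0` the gate passes the image features through unchanged, which makes the residual behavior easy to verify.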
Related papers
- Medical Semantic Segmentation with Diffusion Pretrain [1.9415817267757087]
Recent advances in deep learning have shown that learning robust feature representations is critical for the success of many computer vision tasks.
We propose a novel pretraining strategy using diffusion models with anatomical guidance, tailored to the intricacies of 3D medical image data.
We employ an additional model that predicts 3D universal body-part coordinates, providing guidance during the diffusion process.
arXiv Detail & Related papers (2025-01-31T16:25:49Z) - MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation [25.74088298769155]
We propose a universal training framework called MedContext for 3D medical segmentation.
Our approach effectively learns self supervised contextual cues jointly with the supervised voxel segmentation task.
The effectiveness of MedContext is validated across multiple 3D medical datasets and four state-of-the-art model architectures.
arXiv Detail & Related papers (2024-02-27T17:58:05Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z) - Extraction of volumetric indices from echocardiography: which deep learning solution for clinical use? [6.144041824426555]
We show that the proposed 3D nnU-Net outperforms alternative 2D and recurrent segmentation methods.
Overall, the experimental results suggest that with sufficient training data, 3D nnU-Net could become the first automated tool to meet the standards of an everyday clinical device.
arXiv Detail & Related papers (2023-05-03T09:38:52Z) - Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - CT-LungNet: A Deep Learning Framework for Precise Lung Tissue Segmentation in 3D Thoracic CT Scans [1.1014741301167645]
This paper presents a fully automatic method that identifies the lungs in 3D pulmonary CT images using deep networks and transfer learning.
Our method was quantitatively assessed using one public dataset, LUNA16, for training and testing and two public datasets, namely, VESSEL12 and CRPF, only for testing.
arXiv Detail & Related papers (2022-12-28T17:37:08Z) - PCA: Semi-supervised Segmentation with Patch Confidence Adversarial Training [52.895952593202054]
We propose a new semi-supervised adversarial method called Patch Confidence Adversarial Training (PCA) for medical image segmentation.
PCA learns the pixel structure and context information in each patch to obtain sufficient gradient feedback, which aids the discriminator in converging to an optimal state.
Our method outperforms the state-of-the-art semi-supervised methods, which demonstrates its effectiveness for medical image segmentation.
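The patch-level confidence idea above can be illustrated with a short numpy sketch: split a pixel-wise softmax output into patches and score each patch by its mean maximum probability. The function name, patch size, and this particular confidence statistic are assumptions for illustration, not the PCA paper's exact formulation.

```python
import numpy as np

def patch_confidence_map(probs, patch=4):
    """probs: (H, W, C) softmax output of a segmentation network.

    Returns an (H//patch, W//patch) map of per-patch confidence,
    computed here as the mean of the pixel-wise max class probability.
    """
    conf = probs.max(axis=-1)  # (H, W) pixel-wise max probability
    H, W = conf.shape
    # Trim edges so both dimensions divide evenly into patches.
    conf = conf[:H - H % patch, :W - W % patch]
    h, w = conf.shape[0] // patch, conf.shape[1] // patch
    # Block the map into (h, patch, w, patch) tiles and average each tile.
    return conf.reshape(h, patch, w, patch).mean(axis=(1, 3))
```

A discriminator trained on such patch scores receives denser feedback than one trained on a single image-level confidence value.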
arXiv Detail & Related papers (2022-07-24T07:45:47Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on the source dataset and unavailable on the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Improving Classification Model Performance on Chest X-Rays through Lung Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z)
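The cascaded segment-then-classify design described in the last entry can be sketched end to end. Both stages here are hypothetical stand-ins (a threshold instead of the XLSor segmenter, a mean-intensity score instead of the MoCo-pretrained classifier); only the pipeline shape, localize, crop, classify, reflects the described approach.

```python
import numpy as np

def localize_lungs(cxr):
    """Stand-in for the XLSor segmentation module: returns a binary mask.
    A simple intensity threshold replaces the real attention network."""
    return (cxr > cxr.mean()).astype(np.uint8)

def crop_to_mask(cxr, mask):
    """Crop the image to the bounding box of the nonzero mask region."""
    ys, xs = np.nonzero(mask)
    return cxr[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def classify(lung_crop):
    """Stand-in for the MoCo-pretrained classifier: returns a scalar score."""
    return float(lung_crop.mean())  # placeholder abnormality score

def cascaded_pipeline(cxr):
    """Localize the lung region first, then classify only the cropped region."""
    mask = localize_lungs(cxr)
    return classify(crop_to_mask(cxr, mask))
```

The point of the cascade is that the classifier never sees irrelevant background, which is where much of the reported performance gain comes from.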
This list is automatically generated from the titles and abstracts of the papers on this site.