LViT: Language meets Vision Transformer in Medical Image Segmentation
- URL: http://arxiv.org/abs/2206.14718v4
- Date: Tue, 27 Jun 2023 01:43:10 GMT
- Title: LViT: Language meets Vision Transformer in Medical Image Segmentation
- Authors: Zihan Li, Yunxiang Li, Qingde Li, Puyang Wang, Dazhou Guo, Le Lu,
Dakai Jin, You Zhang, Qingqi Hong
- Abstract summary: We propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer).
In our LViT model, medical text annotation is incorporated to compensate for the quality deficiency in image data.
Our proposed LViT has superior segmentation performance in both fully-supervised and semi-supervised settings.
- Score: 12.755116093159035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has been widely used in medical image segmentation and
related tasks. However, the performance of existing medical image segmentation models
has been limited by the challenge of obtaining sufficient high-quality labeled
data due to the prohibitive data annotation cost. To alleviate this limitation,
we propose a new text-augmented medical image segmentation model LViT (Language
meets Vision Transformer). In our LViT model, medical text annotation is
incorporated to compensate for the quality deficiency in image data. In
addition, the text information can guide the generation of higher-quality pseudo
labels in semi-supervised learning. We also propose an Exponential Pseudo-label
Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM)
preserve local image features in the semi-supervised LViT setting. In our model, the LV
(Language-Vision) loss is designed to supervise the training of unlabeled
images using text information directly. For evaluation, we construct three
multimodal medical segmentation datasets (image + text) containing X-rays and
CT images. Experimental results show that our proposed LViT has superior
segmentation performance in both fully-supervised and semi-supervised settings.
The code and datasets are available at https://github.com/HUANGLIZI/LViT.
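As a rough illustration of the Exponential Pseudo-label Iteration idea described in the abstract, the following sketch updates pseudo labels as an exponential moving average of successive model predictions; the momentum value, tensor shapes, and function names are assumptions for illustration, not the paper's released code.

```python
import torch

def epi_update(prev_pseudo: torch.Tensor,
               current_pred: torch.Tensor,
               beta: float = 0.9) -> torch.Tensor:
    """Exponentially blend the previous pseudo-label map with the current
    prediction, so pseudo labels evolve smoothly across iterations instead
    of jumping to each new (possibly noisy) prediction."""
    return beta * prev_pseudo + (1.0 - beta) * current_pred

# Usage: probability maps from the segmentation head for an unlabeled image.
prev = torch.rand(1, 1, 64, 64)     # pseudo labels from iteration t-1
pred = torch.rand(1, 1, 64, 64)     # model prediction at iteration t
pseudo = epi_update(prev, pred)     # smoothed pseudo labels for iteration t
hard_mask = (pseudo > 0.5).float()  # binarize if a hard mask is needed
```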
Related papers
- LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation [7.912408164613206]
Medical Image Referring Segmentation (MIRS) requires segmenting lesions in images based on given language expressions.
We propose an approach named Language-guided Scale-aware MedSegmentor (LSMS).
Our LSMS consistently outperforms prior methods on all datasets at lower computational cost.
arXiv Detail & Related papers (2024-08-30T15:22:13Z)
- CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training [6.292642131180376]
In this paper, we tackle the lack of image-text data in chest X-ray by expanding image-label pairs into image-text pairs via general prompts.
We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports.
Our model outperforms the state-of-the-art models trained under the same conditions.
arXiv Detail & Related papers (2023-10-20T05:44:55Z)
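The CXR-CLIP entry names two contrastive losses (ICL and TCL) without defining them. As a generic point of reference only, a symmetric InfoNCE-style image-text contrastive loss of the kind CLIP-family models build on can be sketched as follows; the names and temperature value are illustrative.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb: torch.Tensor,
                    txt_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # Matched pairs sit on the diagonal; contrast in both directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```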
- ALIP: Adaptive Language-Image Pre-training with Synthetic Caption [78.93535202851278]
Contrastive Language-Image Pre-training (CLIP) has significantly boosted the performance of various vision-language tasks.
The presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
We propose Adaptive Language-Image Pre-training (ALIP), a bi-path model that integrates supervision from both raw text and synthetic captions.
arXiv Detail & Related papers (2023-08-16T15:19:52Z)
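One simple reading of ALIP's bi-path supervision is a weighted combination of two contrastive objectives, one against raw web text and one against synthetic captions. The fixed weight below stands in for ALIP's adaptive weighting, which the summary does not spell out; everything here is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def path_loss(img_emb, txt_emb, temperature=0.07):
    """One supervision path: symmetric InfoNCE over paired embeddings."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0), device=img.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def bi_path_loss(img_emb, raw_txt_emb, syn_txt_emb, w_raw=0.5):
    """Blend supervision from raw text and synthetic captions; the fixed
    weight is a placeholder for ALIP's adaptive, per-sample weighting."""
    return (w_raw * path_loss(img_emb, raw_txt_emb) +
            (1.0 - w_raw) * path_loss(img_emb, syn_txt_emb))
```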
- Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination [88.74459704391214]
In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup.
We represent the input images and texts with the visual and language scene graphs (SG), where such fine-grained vision-language features ensure a holistic understanding of the semantics.
Several SG-pivoting based learning objectives are introduced for unsupervised translation training.
Our method outperforms the best-performing baseline by significant BLEU margins on this task and setup.
arXiv Detail & Related papers (2023-05-20T18:17:20Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
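The entry above uses text-to-image retrieval as its benchmark. As a generic illustration of such an evaluation (not the paper's code), recall@K over a shared embedding space can be computed like this, assuming row i of each matrix holds a paired report and X-ray:

```python
import torch

def recall_at_k(txt_emb: torch.Tensor, img_emb: torch.Tensor, k: int = 5) -> float:
    """Fraction of text queries whose paired image ranks in the top k.

    Assumes both embedding matrices are L2-normalized, so the dot
    product is cosine similarity."""
    sims = txt_emb @ img_emb.t()           # (N, N) query-gallery similarities
    topk = sims.topk(k, dim=1).indices     # top-k image indices per text query
    targets = torch.arange(txt_emb.size(0)).unsqueeze(1)
    return (topk == targets).any(dim=1).float().mean().item()
```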
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z)
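The PCRLv2 summary describes adding a pixel-restoration task so that pre-trained features keep pixel-level information. A minimal version of such an objective, with the corruption scheme and model interface assumed rather than taken from the paper, might look like:

```python
import torch
import torch.nn.functional as F

def restoration_loss(model, clean: torch.Tensor, noise_std: float = 0.1) -> torch.Tensor:
    """Pixel-restoration objective: encode a corrupted view with an
    encoder-decoder `model` and penalize per-pixel error against the
    clean input, forcing features to retain low-level detail."""
    corrupted = clean + noise_std * torch.randn_like(clean)
    restored = model(corrupted)  # assumed to return an image-shaped tensor
    return F.mse_loss(restored, clean)
```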
- MIPR: Automatic Annotation of Medical Images with Pixel Rearrangement [7.39560318487728]
We propose a novel approach, called medical image pixel rearrangement (MIPR for short), to address the lack of annotated data from another angle.
MIPR combines image editing and pseudo-label techniques to obtain labeled data.
Experiments on ISIC18 show that data annotated by our method yields segmentation results equal to or even better than doctors' annotations.
arXiv Detail & Related papers (2022-04-22T05:54:14Z)
- Positional Contrastive Learning for Volumetric Medical Image Segmentation [13.086140606803408]
We propose a novel positional contrastive learning framework to generate contrastive data pairs.
The proposed PCL method substantially improves segmentation performance compared to existing methods in both the semi-supervised and transfer-learning settings.
arXiv Detail & Related papers (2021-06-16T22:15:28Z)
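The PCL entry says contrastive pairs are generated from position information in volumetric scans. One simple reading, with a hypothetical threshold, is to treat slices at similar normalized depths as positives:

```python
import torch

def positional_pair_labels(positions: torch.Tensor, threshold: float = 0.1) -> torch.Tensor:
    """Label slice pairs as positive when their normalized depths
    (0 = top of the volume, 1 = bottom) fall within `threshold`."""
    diff = (positions.unsqueeze(0) - positions.unsqueeze(1)).abs()  # (N, N)
    labels = (diff < threshold).float()
    labels.fill_diagonal_(0)  # a slice is not its own contrastive pair
    return labels

# Usage: normalized depths of four slices drawn from CT volumes.
pos = torch.tensor([0.10, 0.12, 0.55, 0.90])
print(positional_pair_labels(pos))  # slices 0 and 1 form a positive pair
```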
- ATSO: Asynchronous Teacher-Student Optimization for Semi-Supervised Medical Image Segmentation [99.90263375737362]
We propose ATSO, an asynchronous version of teacher-student optimization.
ATSO partitions the unlabeled data into two subsets and alternately uses one subset to fine-tune the model while updating the labels on the other subset.
We evaluate ATSO on two popular medical image segmentation datasets and show its superior performance in various semi-supervised settings.
arXiv Detail & Related papers (2020-06-24T04:05:12Z)
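The ATSO summary states its outer loop concretely enough to sketch. The helpers below are assumed stand-ins for ordinary training and pseudo-labeling steps, not functions from the ATSO code:

```python
def atso_round(model, subset_a, subset_b, finetune, relabel):
    """One ATSO round: fine-tune on one unlabeled subset's current pseudo
    labels while refreshing labels on the other, then swap the roles."""
    finetune(model, subset_a)  # train on subset A's current pseudo labels
    relabel(model, subset_b)   # refresh subset B's labels with the updated model
    finetune(model, subset_b)  # then train on subset B ...
    relabel(model, subset_a)   # ... and refresh subset A, asynchronously
    return model
```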
- LC-GAN: Image-to-image Translation Based on Generative Adversarial Network for Endoscopic Images [22.253074722129053]
We propose a live-cadaver GAN (LC-GAN), an image-to-image translation model based on generative adversarial networks (GANs).
For live image segmentation, we first translate the live images to fake-cadaveric images with LC-GAN and then perform segmentation on the fake-cadaveric images with models trained on the real cadaveric dataset.
Our model achieves better image-to-image translation and leads to improved segmentation performance in the proposed cross-domain segmentation task.
arXiv Detail & Related papers (2020-03-10T19:59:25Z)
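The LC-GAN entry describes a two-stage test-time pipeline: translate live endoscopic frames into the cadaveric domain, then segment them with a cadaver-trained model. A schematic version, with both callables assumed rather than taken from the authors' code:

```python
import torch

@torch.no_grad()
def segment_live_frame(live_img: torch.Tensor, lc_gan, cadaver_seg_model) -> torch.Tensor:
    """Cross-domain segmentation as described above: translate first,
    then segment in the domain the segmentation model was trained on."""
    fake_cadaveric = lc_gan(live_img)           # live -> fake-cadaveric translation
    logits = cadaver_seg_model(fake_cadaveric)  # segment in the cadaveric domain
    return logits.argmax(dim=1)                 # per-pixel class mask
```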