Contrastive Learning of Medical Visual Representations from Paired
Images and Text
- URL: http://arxiv.org/abs/2010.00747v2
- Date: Mon, 19 Sep 2022 20:20:23 GMT
- Title: Contrastive Learning of Medical Visual Representations from Paired
Images and Text
- Authors: Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D. Manning,
Curtis P. Langlotz
- Abstract summary: We propose ConVIRT, an unsupervised strategy to learn medical visual representations by exploiting naturally occurring descriptive paired text.
Our new method of pretraining medical image encoders with the paired text data via a bidirectional contrastive objective between the two modalities is domain-agnostic, and requires no additional expert input.
- Score: 38.91117443316013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning visual representations of medical images (e.g., X-rays) is core to
medical image understanding but its progress has been held back by the scarcity
of human annotations. Existing work commonly relies on fine-tuning weights
transferred from ImageNet pretraining, which is suboptimal due to drastically
different image characteristics, or rule-based label extraction from the
textual report data paired with medical images, which is inaccurate and hard to
generalize. Meanwhile, several recent studies show exciting results from
unsupervised contrastive learning from natural images, but we find these
methods help little on medical images because of their high inter-class
similarity. We propose ConVIRT, an alternative unsupervised strategy to learn
medical visual representations by exploiting naturally occurring paired
descriptive text. Our new method of pretraining medical image encoders with the
paired text data via a bidirectional contrastive objective between the two
modalities is domain-agnostic, and requires no additional expert input. We test
ConVIRT by transferring our pretrained weights to 4 medical image
classification tasks and 2 zero-shot retrieval tasks, and show that it leads to
image representations that considerably outperform strong baselines in most
settings. Notably, in all 4 classification tasks, our method requires only 10%
as much labeled training data as an ImageNet initialized counterpart to achieve
better or comparable performance, demonstrating superior data efficiency.
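The bidirectional contrastive objective described in the abstract can be written down in a few lines. Below is a minimal PyTorch sketch of such a loss, assuming image and text encoders that already produce fixed-size embeddings; the temperature `tau` and the modality weight `lam` are illustrative hyperparameters, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def convirt_loss(image_emb, text_emb, tau=0.1, lam=0.75):
    """Bidirectional contrastive (InfoNCE) objective over a batch of
    paired image/text embeddings. tau and lam are illustrative
    hyperparameter values, not prescriptive ones."""
    # Unit-normalize so dot products are cosine similarities.
    v = F.normalize(image_emb, dim=-1)   # (N, d) image vectors
    u = F.normalize(text_emb, dim=-1)    # (N, d) text vectors

    logits = v @ u.t() / tau             # (N, N) pairwise similarities
    targets = torch.arange(v.size(0), device=v.device)  # i-th text matches i-th image

    loss_v2u = F.cross_entropy(logits, targets)      # image-to-text direction
    loss_u2v = F.cross_entropy(logits.t(), targets)  # text-to-image direction
    return lam * loss_v2u + (1.0 - lam) * loss_u2v
```

Pretraining minimizes this loss over minibatches of image-report pairs; the evaluations in the abstract then transfer the pretrained weights to the classification and zero-shot retrieval tasks.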
Related papers
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924] (2024-09-13)
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach links various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
- Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis [61.089776864520594] (2023-12-11)
We propose eye-tracking as an alternative to text reports for medical images.
By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning.
We introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks.
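To make the plug-and-play idea concrete, here is one hedged sketch of how gaze could define positive pairs for a contrastive framework; the cosine similarity over flattened gaze heatmaps and the 0.5 threshold are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def gaze_positive_mask(gaze_heatmaps, threshold=0.5):
    """Treat images whose radiologist gaze heatmaps look alike as
    contrastive positives. The similarity measure and threshold are
    illustrative assumptions, not the paper's exact choices."""
    g = F.normalize(gaze_heatmaps.flatten(1), dim=-1)  # (N, H*W), unit norm
    sim = g @ g.t()                                    # pairwise cosine similarity
    mask = (sim > threshold).float()                   # 1.0 marks a positive pair
    mask.fill_diagonal_(1.0)                           # an image always matches itself
    return mask
```

Such a mask can replace the identity targets of a standard InfoNCE loss, which is what makes the module framework-agnostic.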
- Unified Medical Image-Text-Label Contrastive Learning With Continuous Prompt [3.218449686637963] (2023-07-12)
We propose a unified Image-Text-Label contrastive learning framework based on continuous prompts.
We demonstrate through extensive experiments that the Unified Medical Contrastive Learning framework exhibits excellent performance on several downstream tasks.
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714] (2023-06-20)
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747] (2023-01-11)
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
- Metadata-enhanced contrastive learning from retinal optical coherence tomography images [7.932410831191909] (2022-08-04)
We extend conventional contrastive frameworks with a novel metadata-enhanced strategy.
Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships.
Our approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks.
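One way to read this strategy in code: metadata matches yield a multi-positive mask, which plugs into a SupCon-style loss. The sketch below assumes patient ID is the matching field; both that choice and the loss form are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def metadata_positive_mask(patient_ids):
    """Scans sharing a patient ID are treated as approximate positives.
    Matching on patient ID alone is an illustrative assumption."""
    pid = torch.as_tensor(patient_ids)
    return (pid.unsqueeze(0) == pid.unsqueeze(1)).float()

def multi_positive_loss(embeddings, pos_mask, tau=0.1):
    """Generic multi-positive contrastive loss such a mask plugs into."""
    n = embeddings.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=embeddings.device)
    z = F.normalize(embeddings, dim=-1)
    logits = (z @ z.t() / tau).masked_fill(eye, float("-inf"))  # drop self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos = pos_mask.masked_fill(eye, 0.0)        # self is not its own positive
    log_prob = log_prob.masked_fill(eye, 0.0)   # avoid 0 * -inf = NaN on the diagonal
    per_anchor = (pos * log_prob).sum(1) / pos.sum(1).clamp(min=1.0)
    return -per_anchor.mean()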
- Joint Learning of Localized Representations from Medical Images and Reports [0.0] (2021-12-06)
We propose Localized representation learning from Vision and Text (LoVT) to target localized medical imaging tasks.
Our method combines instance-level image-report contrastive learning with local contrastive learning on image region and report sentence representations.
LoVT performs best on 11 out of the 18 studied tasks, making it the method of choice for localized tasks.
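A hedged sketch of what such a two-level objective can look like: an instance-level image-report term plus a local region-sentence term. The symmetric InfoNCE helper, the assumption that region-sentence pairs are already aligned (LoVT computes this alignment itself), and the weight `alpha` are all illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, tau=0.1):
    """Symmetric InfoNCE over two sets of paired embeddings."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau
    t = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, t) + F.cross_entropy(logits.t(), t))

def lovt_style_loss(img_glob, rep_glob, region_emb, sent_emb, alpha=0.5):
    """Instance-level image-report term plus a local region-sentence
    term. Region-sentence alignment is assumed precomputed here;
    alpha is an illustrative weight, not the paper's value."""
    return info_nce(img_glob, rep_glob) + alpha * info_nce(region_emb, sent_emb)
```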
- Positional Contrastive Learning for Volumetric Medical Image Segmentation [13.086140606803408] (2021-06-16)
We propose a novel positional contrastive learning framework to generate contrastive data pairs.
The proposed PCL method can substantially improve the segmentation performance compared to existing methods in both semi-supervised setting and transfer learning setting.
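The pair-generation idea can be sketched in a few lines: slices taken from similar normalized depths within their volumes are treated as positives, even across patients. The 0.1 distance threshold below is an assumption for illustration.

```python
import torch

def positional_pair_mask(slice_positions, threshold=0.1):
    """Mark 2D slices as contrastive positives when their normalized
    depth positions (0 = top of volume, 1 = bottom) are close.
    The threshold is illustrative, not the paper's exact value."""
    pos = torch.as_tensor(slice_positions, dtype=torch.float)
    dist = (pos.unsqueeze(0) - pos.unsqueeze(1)).abs()  # pairwise |Δposition|
    return (dist < threshold).float()                   # 1.0 marks a positive pair
```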
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237] (2021-04-12)
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
- Discriminative Cross-Modal Data Augmentation for Medical Imaging Applications [24.06277026586584] (2020-10-07)
Deep learning methods have shown great success in medical image analysis, but they require a large number of medical images for training.
Due to data privacy concerns and the scarcity of medical annotators, it is often difficult to obtain enough labeled medical images for model training.
We propose a discriminative unpaired image-to-image translation model which translates images in source modality into images in target modality.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.