Joint Learning of Localized Representations from Medical Images and Reports
- URL: http://arxiv.org/abs/2112.02889v1
- Date: Mon, 6 Dec 2021 09:27:24 GMT
- Title: Joint Learning of Localized Representations from Medical Images and Reports
- Authors: Philip Müller (1), Georgios Kaissis (1 and 2), Congyu Zou (1), Daniel Rückert (1 and 2) ((1) Technical University of Munich, (2) Imperial College London)
- Abstract summary: We propose Localized representation learning from Vision and Text (LoVT) to target localized medical imaging tasks.
Our method combines instance-level image-report contrastive learning with local contrastive learning on image region and report sentence representations.
LoVT performs best on 11 of the 18 studied tasks, making it the preferred choice for localized tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has proven effective for pre-training image models on
unlabeled data with promising results for tasks such as medical image
classification. Using paired text and images (such as radiological reports and
images) during pre-training improved the results even further. Still, most
existing methods target image classification as downstream tasks and may not be
optimal for localized tasks like semantic segmentation or object detection. We
therefore propose Localized representation learning from Vision and Text
(LoVT), to the best of our knowledge the first text-supervised pre-training
method that targets localized medical imaging tasks. Our method combines
instance-level image-report contrastive learning with local contrastive
learning on image region and report sentence representations. We evaluate LoVT
and commonly used pre-training methods on a novel evaluation framework
consisting of 18 localized tasks on chest X-rays from five public datasets.
While there is no single best method, LoVT performs best on 11 of the 18
studied tasks, making it the preferred choice for localized tasks.
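To make the pre-training objective above concrete, here is a minimal PyTorch sketch of how an instance-level image-report contrastive loss can be combined with a local contrastive term over region and sentence representations. All names (`info_nce`, `local_contrastive`, `joint_loss`, `lam`) and details such as the attention pooling and loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE between two batches of aligned embeddings (N, D)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature  # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Matching image-report pairs sit on the diagonal; all others are negatives.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def local_contrastive(regions, sentences, temperature=0.1):
    """Contrast each image region (R, D) against an attention-pooled view of
    the report sentences (S, D), with the other regions as negatives.
    The pooling here is a simplification, not the paper's exact local loss."""
    r, s = F.normalize(regions, dim=-1), F.normalize(sentences, dim=-1)
    attn = torch.softmax(r @ s.t() / temperature, dim=-1)  # (R, S)
    pooled = F.normalize(attn @ s, dim=-1)                 # (R, D)
    logits = r @ pooled.t() / temperature                  # (R, R)
    targets = torch.arange(r.size(0), device=r.device)
    return F.cross_entropy(logits, targets)

def joint_loss(img_glob, txt_glob, region_emb, sentence_emb, lam=0.5):
    """img_glob/txt_glob: (N, D); region_emb: (N, R, D); sentence_emb: (N, S, D)."""
    local = torch.stack([local_contrastive(region_emb[i], sentence_emb[i])
                         for i in range(img_glob.size(0))]).mean()
    return info_nce(img_glob, txt_glob) + lam * local
```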
Related papers
- Nucleus-aware Self-supervised Pretraining Using Unpaired Image-to-image Translation for Histopathology Images [3.8391355786589805]
We propose a novel nucleus-aware self-supervised pretraining framework for histopathology images.
The framework aims to capture the nuclear morphology and distribution information through unpaired image-to-image translation.
The experiments on 7 datasets show that the proposed pretraining method outperforms supervised ones on Kather classification, multiple instance learning, and 5 dense-prediction tasks.
arXiv Detail & Related papers (2023-09-14T02:31:18Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
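The recipe summarized above for Disruptive Autoencoders, reconstructing a clean image from local masking plus low-level perturbations, can be sketched generically as follows; the patch size, mask ratio, and noise level are assumed values, not the paper's settings.

```python
import torch

def disrupt(x, mask_ratio=0.3, patch=16, noise_std=0.1):
    """Apply a low-level perturbation (Gaussian noise) and local patch
    masking to an image batch x of shape (N, C, H, W). Illustrative only."""
    n, _, h, w = x.shape
    gh, gw = h // patch, w // patch  # assumes H and W are multiples of patch
    keep = (torch.rand(n, 1, gh, gw, device=x.device) > mask_ratio).float()
    mask = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    noisy = x + noise_std * torch.randn_like(x)
    return noisy * mask

def pretrain_step(model, x, optimizer):
    """One reconstruction step: the model sees the disrupted image and is
    trained to recover the original."""
    loss = torch.nn.functional.mse_loss(model(disrupt(x)), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```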
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
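Text-to-image retrieval, used as the benchmark above, is typically scored by ranking all candidate images against each report embedding and measuring recall@k; a minimal sketch under that assumption (the paper may report different metrics):

```python
import torch
import torch.nn.functional as F

def recall_at_k(text_emb, image_emb, k=10):
    """text_emb, image_emb: (N, D) paired report and image embeddings in the
    shared space; pair i is the correct match for query i."""
    sims = F.normalize(text_emb, dim=-1) @ F.normalize(image_emb, dim=-1).t()
    topk = sims.topk(k, dim=1).indices                                 # (N, k)
    targets = torch.arange(text_emb.size(0), device=text_emb.device)
    return (topk == targets.unsqueeze(1)).any(dim=1).float().mean().item()
```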
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
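One way to read the "CNN-Transformer hybrid multi-image encoder" above is a CNN backbone that turns the current and (optional) prior image into token grids which a transformer then fuses. The sketch below illustrates that reading only; it is not BioViL-T's actual architecture, and all names are assumptions.

```python
import torch
import torch.nn as nn

class HybridMultiImageEncoder(nn.Module):
    """Illustrative sketch: CNN features from the current and prior images
    are flattened into tokens and fused by a transformer encoder."""
    def __init__(self, cnn, dim=512, heads=8, layers=2):
        super().__init__()
        self.cnn = cnn  # any backbone mapping (N, C, H, W) -> (N, dim, h, w)
        self.time_embed = nn.Parameter(torch.zeros(2, dim))  # current vs. prior
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, layers)

    def tokens(self, x):
        f = self.cnn(x)                      # (N, dim, h, w)
        return f.flatten(2).transpose(1, 2)  # (N, h*w, dim)

    def forward(self, current, prior=None):
        t = self.tokens(current) + self.time_embed[0]
        if prior is not None:  # the prior image is optional, as in the summary
            t = torch.cat([t, self.tokens(prior) + self.time_embed[1]], dim=1)
        return self.fusion(t)  # (N, tokens, dim) fused representation
```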
- The Role of Local Alignment and Uniformity in Image-Text Contrastive Learning on Medical Images [7.49320945341034]
We study how local contrastive losses are related to global (per-sample) contrastive losses and which effects they have on localized medical downstream tasks.
Based on a theoretical comparison, we propose to remove some components of local losses and replace others by a novel distribution prior.
We empirically study this approach on chest X-ray tasks and find it to be very effective, outperforming methods without local losses on 12 of 18 tasks.
arXiv Detail & Related papers (2022-11-14T10:32:51Z)
- Self-Supervised-RCNN for Medical Image Segmentation with Limited Data Annotation [0.16490701092527607]
We propose an alternative deep learning training strategy based on self-supervised pretraining on unlabeled MRI scans.
Our pretraining approach first randomly applies different distortions to random areas of unlabeled images and then predicts the type of distortion and the loss of information.
The effectiveness of the proposed method for segmentation tasks in different pre-training and fine-tuning scenarios is evaluated.
arXiv Detail & Related papers (2022-07-17T13:28:52Z)
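The pretext task described above, distorting random areas and predicting the type of distortion, can be set up roughly as below; the distortion set and region size are assumptions for illustration.

```python
import random
import torch

# Assumed distortion set; the paper's exact distortions may differ.
DISTORTIONS = {
    0: lambda p: torch.zeros_like(p),            # erase the region
    1: lambda p: p + 0.2 * torch.randn_like(p),  # add noise
    2: lambda p: p.flip([-1]),                   # mirror horizontally
}

def make_pretext_sample(x, size=32):
    """Distort one random square region of an image x (C, H, W) and return
    the distorted image together with the distortion label to predict
    (e.g. with a classification head and cross-entropy)."""
    _, h, w = x.shape  # assumes h > size and w > size
    y, z = random.randrange(h - size), random.randrange(w - size)
    label = random.randrange(len(DISTORTIONS))
    x = x.clone()
    x[:, y:y + size, z:z + size] = DISTORTIONS[label](x[:, y:y + size, z:z + size])
    return x, label
```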
- Robust Medical Image Classification from Noisy Labeled Data with Global and Local Representation Guided Co-training [73.60883490436956]
We propose a novel collaborative training paradigm with global and local representation learning for robust medical image classification.
We employ the self-ensemble model with a noisy label filter to efficiently select the clean and noisy samples.
We also design a novel global and local representation learning scheme to implicitly regularize the networks to utilize noisy samples.
arXiv Detail & Related papers (2022-05-10T07:50:08Z)
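A common way to realize the "noisy label filter" mentioned above is small-loss selection under the self-ensemble's predictions: samples the ensemble fits with low loss are treated as clean. The sketch below uses that standard technique as a stand-in; the paper's filter may differ.

```python
import torch
import torch.nn.functional as F

def split_clean_noisy(ema_logits, labels, clean_fraction=0.7):
    """Treat the lowest-loss fraction of a batch (under the self-ensemble /
    EMA model's predictions) as clean and the rest as noisy."""
    losses = F.cross_entropy(ema_logits, labels, reduction='none')
    k = int(clean_fraction * labels.size(0))
    clean_idx = losses.topk(k, largest=False).indices
    noisy = torch.ones_like(labels, dtype=torch.bool)
    noisy[clean_idx] = False
    return clean_idx, noisy.nonzero(as_tuple=True)[0]
```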
- Positional Contrastive Learning for Volumetric Medical Image Segmentation [13.086140606803408]
We propose a novel positional contrastive learning framework to generate contrastive data pairs.
The proposed PCL method can substantially improve segmentation performance over existing methods in both the semi-supervised and the transfer learning settings.
arXiv Detail & Related papers (2021-06-16T22:15:28Z)
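Positional contrastive pairs for volumetric data, as described above, can be built from normalized slice positions: slices close in position count as positives, distant ones as negatives. A minimal sketch of such a loss (the threshold and the NT-Xent-style form are assumptions):

```python
import torch
import torch.nn.functional as F

def positional_contrastive_loss(emb, positions, threshold=0.1, temperature=0.1):
    """emb: (N, D) slice embeddings; positions: (N,) normalized slice
    positions in [0, 1]. Slices within `threshold` of each other are
    treated as positive pairs; the batch is assumed to contain positives."""
    z = F.normalize(emb, dim=-1)
    sim = z @ z.t() / temperature
    pos = (positions.unsqueeze(0) - positions.unsqueeze(1)).abs() < threshold
    pos.fill_diagonal_(False)            # a slice is not its own positive
    exp = torch.exp(sim)
    denom = exp.sum(dim=1) - exp.diag()  # exclude self-similarity
    log_prob = sim - denom.log().unsqueeze(1)
    return -log_prob[pos].mean()
```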
- Contrastive Learning of Medical Visual Representations from Paired Images and Text [38.91117443316013]
We propose ConVIRT, an unsupervised strategy to learn medical visual representations by exploiting naturally occurring descriptive paired text.
Our new method of pretraining medical image encoders with paired text data via a bidirectional contrastive objective between the two modalities is domain-agnostic and requires no additional expert input.
arXiv Detail & Related papers (2020-10-02T02:10:18Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification aims to train models for new classes from only a limited number of labeled examples.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks [207.52609682812147]
We propose a new learning method, Oscar (Object-Semantics Aligned Pre-training).
It uses object tags detected in images as anchor points to significantly ease the learning of alignments.
We pre-train an Oscar model on the public corpus of 6.5 million text-image pairs, and fine-tune it on downstream tasks.
arXiv Detail & Related papers (2020-04-13T19:18:10Z)
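Oscar's anchor-point idea, as summarized above, amounts to feeding word tokens, detected object tags, and region features to the transformer as one sequence, with the tags shared between the text and image sides. A schematic of that input construction (the embedding and projection modules are assumptions):

```python
import torch

def build_oscar_style_input(word_ids, tag_ids, region_feats, embed, project):
    """Concatenate (word tokens; object-tag tokens; region features) into a
    single input sequence. word_ids: (N, Lw) and tag_ids: (N, Lt) token ids;
    region_feats: (N, R, Dv) detector features. Illustrative only."""
    words = embed(word_ids)          # (N, Lw, D) text embeddings
    tags = embed(tag_ids)            # (N, Lt, D) tags reuse the text embedding table
    regions = project(region_feats)  # (N, R, D) region features projected to D
    return torch.cat([words, tags, regions], dim=1)
```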
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.