Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data
- URL: http://arxiv.org/abs/2303.14080v3
- Date: Thu, 30 Mar 2023 12:40:35 GMT
- Title: Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data
- Authors: Paul Hager, Martin J. Menten, Daniel Rueckert
- Abstract summary: We propose the first self-supervised contrastive learning framework that uses both imaging and tabular data to train unimodal encoders.
Our solution combines SimCLR and SCARF, two leading contrastive learning strategies.
We show the generalizability of our approach to natural images using the DVM car advertisement dataset.
- Score: 7.49320945341034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical datasets, and especially biobanks, often contain extensive tabular
data with rich clinical information in addition to images. In practice,
clinicians typically have less data, in terms of both diversity and scale, but
still wish to deploy deep learning solutions. Combined with growing medical
dataset sizes and expensive annotation costs, this has increased the need for
unsupervised methods that can pretrain multimodally and predict unimodally.
To address these needs, we propose the first self-supervised contrastive
learning framework that takes advantage of images and tabular data to train
unimodal encoders. Our solution combines SimCLR and SCARF, two leading
contrastive learning strategies, and is simple and effective. In our
experiments, we demonstrate the strength of our framework by predicting risks
of myocardial infarction and coronary artery disease (CAD) using cardiac MR
images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore,
we show the generalizability of our approach to natural images using the DVM
car advertisement dataset.
We take advantage of the high interpretability of tabular data and, through
attribution and ablation experiments, find that morphometric tabular features,
which describe size and shape, have outsized importance during the contrastive
learning process and improve the quality of the learned embeddings. Finally, we
introduce a novel form of supervised contrastive learning, label as a feature
(LaaF), by appending the ground truth label as a tabular feature during
multimodal pretraining, outperforming all supervised contrastive baselines.
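Below is a minimal sketch, assuming PyTorch, of how the framework described in the abstract could be wired together: a SimCLR-style image branch, a SCARF-style corrupted tabular branch, and a symmetric InfoNCE loss that treats the image and tabular views of the same subject as a positive pair. The class names, layer sizes, corruption rate, and temperature are illustrative assumptions, not values taken from the paper; the final helper illustrates the label-as-a-feature (LaaF) idea of appending the ground-truth label as one extra tabular column during pretraining.

```python
# Minimal sketch (not the authors' released code) of image-tabular contrastive
# pretraining. All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def scarf_corrupt(x: torch.Tensor, corruption_rate: float = 0.3) -> torch.Tensor:
    """SCARF-style augmentation: replace a random subset of each row's features
    with values taken from the same column in other, randomly chosen rows."""
    mask = torch.rand_like(x) < corruption_rate
    idx = torch.randint(0, x.size(0), x.shape, device=x.device)
    shuffled = torch.gather(x, 0, idx)
    return torch.where(mask, shuffled, x)


class ImageTabularContrastive(nn.Module):
    """SimCLR-style image branch plus SCARF-style tabular branch, trained with
    a symmetric InfoNCE loss between the two modalities."""

    def __init__(self, image_encoder: nn.Module, img_feat_dim: int,
                 tab_dim: int, embed_dim: int = 128, temperature: float = 0.1):
        super().__init__()
        self.image_encoder = image_encoder            # e.g. a ResNet backbone
        self.image_proj = nn.Linear(img_feat_dim, embed_dim)
        self.tabular_encoder = nn.Sequential(         # simple MLP over clinical features
            nn.Linear(tab_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))
        self.temperature = temperature

    def forward(self, images: torch.Tensor, tabular: torch.Tensor) -> torch.Tensor:
        # Image augmentations (crops, flips, etc.) are assumed to happen in the
        # data loader; the tabular view is corrupted SCARF-style here.
        z_img = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
        z_tab = F.normalize(self.tabular_encoder(scarf_corrupt(tabular)), dim=-1)

        # Matching (image, tabular) pairs from the same subject are positives;
        # every other pair in the batch is a negative.
        logits = z_img @ z_tab.t() / self.temperature
        targets = torch.arange(z_img.size(0), device=logits.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))


def append_label_as_feature(tabular: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Label as a feature (LaaF): append the ground-truth label as an extra
    tabular column during multimodal pretraining (the tabular encoder's input
    dimension grows by one accordingly)."""
    return torch.cat([tabular, labels.float().unsqueeze(1)], dim=1)
```

After pretraining, either encoder can be used on its own, which is what allows multimodal pretraining with unimodal prediction at deployment time.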
Related papers
- Barttender: An approachable & interpretable way to compare medical imaging and non-imaging data [0.13406576408866772]
Barttender is an interpretable framework that uses deep learning to compare the utility of imaging and non-imaging data for tasks such as disease prediction.
Our framework allows researchers to evaluate differences in utility through performance measures, as well as local (sample-level) and global (population-level) explanations.
arXiv Detail & Related papers (2024-11-19T18:22:25Z)
- Predicting Stroke through Retinal Graphs and Multimodal Self-supervised Learning [0.46835339362676565]
Early identification of stroke is crucial for intervention, requiring reliable models.
We propose an efficient retinal image representation that, combined with clinical information, captures a comprehensive overview of cardiovascular health.
arXiv Detail & Related papers (2024-11-08T14:40:56Z)
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show that LoGra-Med matches LLaVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z)
- ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations, and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z)
- Contrast Everything: A Hierarchical Contrastive Framework for Medical Time-Series [12.469204999759965]
We present COMET, an innovative hierarchical framework that leverages data consistencies at all inherent levels in medical time series.
Our meticulously designed model systematically captures data consistency at four levels: observation, sample, trial, and patient.
We compare COMET against six baselines using three diverse datasets, which include ECG signals for myocardial infarction and EEG signals for Alzheimer's and Parkinson's diseases.
arXiv Detail & Related papers (2023-10-21T13:59:31Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information, achieving the same performance with as little as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
- Metadata-enhanced contrastive learning from retinal optical coherence tomography images [7.932410831191909]
We extend conventional contrastive frameworks with a novel metadata-enhanced strategy.
Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships.
Our approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks.
arXiv Detail & Related papers (2022-08-04T08:53:15Z)
- DSAL: Deeply Supervised Active Learning from Strong and Weak Labelers for Biomedical Image Segmentation [13.707848142719424]
We propose a deep active semi-supervised learning framework, DSAL, combining active learning and semi-supervised learning strategies.
In DSAL, a new criterion based on a deep supervision mechanism is proposed to select informative samples with high uncertainty.
We use the proposed criteria to select samples for strong and weak labelers to produce oracle labels and pseudo labels simultaneously at each active learning iteration.
arXiv Detail & Related papers (2021-01-22T11:31:33Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use, at train time, multiple views of the same information that might not always be available at test time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.