Best of Both Worlds: Multimodal Contrastive Learning with Tabular and
Imaging Data
- URL: http://arxiv.org/abs/2303.14080v3
- Date: Thu, 30 Mar 2023 12:40:35 GMT
- Title: Best of Both Worlds: Multimodal Contrastive Learning with Tabular and
Imaging Data
- Authors: Paul Hager, Martin J. Menten, Daniel Rueckert
- Abstract summary: We propose the first self-supervised contrastive learning framework that uses both images and tabular data to train unimodal encoders.
Our solution combines SimCLR and SCARF, two leading contrastive learning strategies.
We show the generalizability of our approach to natural images using the DVM car advertisement dataset.
- Score: 7.49320945341034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical datasets, and especially biobanks, often contain extensive tabular
data with rich clinical information in addition to images. In practice,
clinicians typically have less data, both in terms of diversity and scale, but
still wish to deploy deep learning solutions. Combined with increasing medical
dataset sizes and expensive annotation costs, the necessity for unsupervised
methods that can pretrain multimodally and predict unimodally has risen.
To address these needs, we propose the first self-supervised contrastive
learning framework that takes advantage of images and tabular data to train
unimodal encoders. Our solution combines SimCLR and SCARF, two leading
contrastive learning strategies, and is simple and effective. In our
experiments, we demonstrate the strength of our framework by predicting risks
of myocardial infarction and coronary artery disease (CAD) using cardiac MR
images and 120 clinical features from 40,000 UK Biobank subjects. Furthermore,
we show the generalizability of our approach to natural images using the DVM
car advertisement dataset.
We take advantage of the high interpretability of tabular data and through
attribution and ablation experiments find that morphometric tabular features,
describing size and shape, have outsized importance during the contrastive
learning process and improve the quality of the learned embeddings. Finally, we
introduce a novel form of supervised contrastive learning, label as a feature
(LaaF), by appending the ground truth label as a tabular feature during
multimodal pretraining, outperforming all supervised contrastive baselines.
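As a rough illustration (not the authors' released code), the recipe described above — a symmetric cross-modal InfoNCE loss between image and tabular embeddings, a SCARF-style corruption as the tabular augmentation, and the LaaF trick of appending the ground-truth label as one extra tabular column — might be sketched as follows; encoders are omitted and the arrays below stand in for their outputs:

```python
# Illustrative sketch only: shapes, temperature, and corruption rate are
# assumptions, not the paper's exact configuration.
import numpy as np

def l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def cross_modal_info_nce(z_img, z_tab, temperature=0.1):
    """Matching (image, tabular) rows are positives; every other row in
    the batch is a negative. Averaged over both directions, CLIP-style."""
    logits = l2_normalize(z_img) @ l2_normalize(z_tab).T / temperature
    def ce_diag(l):  # cross-entropy with the diagonal as targets
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))
    return 0.5 * (ce_diag(logits) + ce_diag(logits.T))

def scarf_corrupt(x, rate=0.3, rng=None):
    """SCARF-style augmentation: replace a random subset of each row's
    features with values taken from other rows (approximating the
    per-feature marginal distributions)."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) < rate
    return np.where(mask, x[rng.permutation(len(x))], x)

rng = np.random.default_rng(42)
tabular = rng.normal(size=(8, 120))        # stand-in for 120 clinical features
labels = rng.integers(0, 2, size=(8, 1))   # e.g. CAD yes/no
tab_laaf = np.hstack([tabular, labels])    # LaaF: label appended as a feature
corrupted = scarf_corrupt(tab_laaf)

# Aligned (image, tabular) pairs should score a lower loss than mismatched ones.
z = rng.normal(size=(8, 16))
print(cross_modal_info_nce(z, z) < cross_modal_info_nce(z, z[::-1]))
```

After pretraining with such a loss, either encoder can be used on its own, which is what makes unimodal deployment possible.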
Related papers
- Contrast Everything: A Hierarchical Contrastive Framework for Medical Time-Series [12.469204999759965]
We present COMET, an innovative hierarchical framework that leverages data consistency at all inherent levels of medical time series.
Our meticulously designed model systematically captures data consistency from four potential levels: observation, sample, trial, and patient levels.
We compare COMET against six baselines using three diverse datasets, which include ECG signals for myocardial infarction and EEG signals for Alzheimer's and Parkinson's diseases.
arXiv Detail & Related papers (2023-10-21T13:59:31Z)
- Multi-Scale Cross Contrastive Learning for Semi-Supervised Medical Image Segmentation [14.536384387956527]
We develop a novel Multi-Scale Cross Supervised Contrastive Learning framework to segment structures in medical images.
Our approach contrasts multi-scale features based on ground-truth and cross-predicted labels, in order to extract robust feature representations.
It outperforms state-of-the-art semi-supervised methods by more than 3.0% in Dice.
arXiv Detail & Related papers (2023-06-25T16:55:32Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
Training an efficacious deep learning model requires large datasets with diverse styles and qualities.
A novel contrastive learning scheme is developed to equip deep learning models with better style-generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images.
AMIGO uses the cellular graph within the tissue to provide a single representation for a patient.
We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
- Metadata-enhanced contrastive learning from retinal optical coherence tomography images [9.618704558885069]
We extend conventional contrastive frameworks with a novel metadata-enhanced strategy.
Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships.
Our approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks.
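A minimal sketch of the metadata-enhanced idea described in this summary — using widely available metadata (here, a hypothetical patient-ID column) to mark scans of the same patient as positives in a supervised-contrastive-style loss. The field choice and loss form are my assumptions, not the paper's exact method:

```python
# Illustrative sketch: metadata-defined positives in a SupCon-style loss.
import numpy as np

def metadata_positive_mask(patient_ids):
    """True where two scans share a patient, excluding self-pairs."""
    ids = np.asarray(patient_ids)
    mask = ids[:, None] == ids[None, :]
    np.fill_diagonal(mask, False)
    return mask

def metadata_supcon_loss(z, patient_ids, temperature=0.1):
    """Mean over anchors of the average negative log-probability assigned
    to that anchor's metadata-positives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)          # never contrast with self
    m = logits.max(axis=1, keepdims=True)
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    pos = metadata_positive_mask(patient_ids)
    per_anchor = np.where(pos, -log_probs, 0.0).sum(1) / np.maximum(pos.sum(1), 1)
    return per_anchor[pos.any(1)].mean()       # anchors with >= 1 positive

rng = np.random.default_rng(0)
ids = np.array([0, 0, 1, 1, 2, 2])             # two scans per patient
base = rng.normal(size=(3, 8))
z = base[ids] + 0.1 * rng.normal(size=(6, 8))  # same-patient scans are similar
z_rand = rng.normal(size=(6, 8))               # no structure at all
print(metadata_supcon_loss(z, ids) < metadata_supcon_loss(z_rand, ids))
```

Embeddings that already cluster by patient incur a lower loss than unstructured ones, which is the signal such a metadata-derived objective exploits.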
arXiv Detail & Related papers (2022-08-04T08:53:15Z)
- Positional Contrastive Learning for Volumetric Medical Image Segmentation [13.086140606803408]
We propose a novel positional contrastive learning framework to generate contrastive data pairs.
The proposed PCL method can substantially improve the segmentation performance compared to existing methods in both semi-supervised setting and transfer learning setting.
arXiv Detail & Related papers (2021-06-16T22:15:28Z)
- DSAL: Deeply Supervised Active Learning from Strong and Weak Labelers for Biomedical Image Segmentation [13.707848142719424]
We propose a deep active semi-supervised learning framework, DSAL, combining active learning and semi-supervised learning strategies.
In DSAL, a new criterion based on deep supervision mechanism is proposed to select informative samples with high uncertainties.
We use the proposed criteria to select samples for strong and weak labelers to produce oracle labels and pseudo labels simultaneously at each active learning iteration.
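One hypothetical way to realize such a selection round: treat disagreement across the network's deep-supervision heads as an uncertainty score, send the most uncertain samples to the oracle (strong labeler), and pseudo-label the most agreed-upon ones (weak labeler). The score and thresholds below are illustrative assumptions, not the paper's exact criterion:

```python
# Illustrative sketch of uncertainty-based sample routing.
import numpy as np

def select_for_labeling(head_probs, n_oracle=2, n_pseudo=2):
    """head_probs: (n_heads, n_samples) foreground probabilities from the
    auxiliary deep-supervision outputs of a segmentation network."""
    uncertainty = head_probs.std(axis=0)       # disagreement across heads
    order = np.argsort(uncertainty)
    oracle_idx = order[-n_oracle:]             # most uncertain -> human labeler
    pseudo_idx = order[:n_pseudo]              # most agreed-upon -> pseudo-label
    pseudo_labels = (head_probs.mean(axis=0)[pseudo_idx] > 0.5).astype(int)
    return oracle_idx, pseudo_idx, pseudo_labels

# Toy scores from three heads on five samples.
heads = np.array([[0.9, 0.2, 0.5, 0.9, 0.15],
                  [0.9, 0.4, 0.1, 0.5, 0.10],
                  [0.9, 0.3, 0.9, 0.7, 0.20]])
oracle_idx, pseudo_idx, pseudo_labels = select_for_labeling(heads)
print(sorted(oracle_idx.tolist()), sorted(pseudo_idx.tolist()))
```

Samples the heads agree on strongly (columns 0 and 4 here) get cheap pseudo-labels, while the contested ones go to the expensive oracle, which is the cost trade-off active learning aims for.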
arXiv Detail & Related papers (2021-01-22T11:31:33Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that, using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.