Tables Guide Vision: Learning to See the Heart through Tabular Data
- URL: http://arxiv.org/abs/2503.14998v2
- Date: Mon, 06 Oct 2025 20:19:20 GMT
- Title: Tables Guide Vision: Learning to See the Heart through Tabular Data
- Authors: Marta Hasny, Maxime Di Folco, Keno Bressem, Julia Schnabel
- Abstract summary: Contrastive learning methods in computer vision typically rely on augmented views of the same image or multimodal pretraining strategies that align paired modalities. This limitation is especially critical in medical imaging domains such as cardiology, where demographic and clinical attributes play a critical role in assessing disease risk and patient outcomes.
- Score: 0.43748379918040853
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Contrastive learning methods in computer vision typically rely on augmented views of the same image or multimodal pretraining strategies that align paired modalities. However, these approaches often overlook semantic relationships between distinct instances, leading to false negatives when semantically similar samples are treated as negatives. This limitation is especially critical in medical imaging domains such as cardiology, where demographic and clinical attributes play a critical role in assessing disease risk and patient outcomes. We introduce a tabular-guided contrastive learning framework that leverages clinically relevant tabular data to identify patient-level similarities and construct more meaningful pairs, enabling semantically aligned representation learning without requiring joint embeddings across modalities. Additionally, we adapt the k-NN algorithm for zero-shot prediction to overcome the lack of zero-shot capability in unimodal representations. We demonstrate the strength of our methods using a large cohort of short-axis cardiac MR images and clinical attributes, where tabular data helps to more effectively distinguish between patient subgroups. Evaluation on downstream tasks, including fine-tuning, linear probing, and zero-shot prediction of cardiovascular artery diseases and cardiac phenotypes, shows that incorporating tabular data guidance yields stronger visual representations than conventional methods that rely solely on image augmentation or combined image-tabular embeddings. Further, we show that our method can generalize to natural images by evaluating it on a car advertisement dataset. The code will be available on GitHub upon acceptance.
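The two core ideas in the abstract can be illustrated with a minimal sketch: (1) mining positive pairs from tabular similarity rather than from image augmentations alone, and (2) adapting k-NN over learned embeddings for zero-shot-style prediction. This is a hedged illustration, not the paper's implementation; the cosine-similarity threshold and Euclidean embedding distance are assumptions for the example.

```python
import numpy as np

def positive_pair_mask(tabular, threshold=0.9):
    """Mark pairs of patients as contrastive positives when their
    standardized tabular attribute vectors have cosine similarity above
    `threshold` (an illustrative hyperparameter, not from the paper)."""
    z = (tabular - tabular.mean(axis=0)) / (tabular.std(axis=0) + 1e-8)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)
    mask = (z @ z.T) >= threshold
    np.fill_diagonal(mask, False)  # a sample is never its own positive
    return mask

def knn_zero_shot(train_emb, train_labels, query_emb, k=3):
    """Zero-shot-style prediction for a unimodal encoder: label each query
    by majority vote over its k nearest training embeddings (Euclidean)."""
    dists = np.linalg.norm(train_emb[None, :, :] - query_emb[:, None, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_labels[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])
```

The mask could then weight a supervised-contrastive-style loss, so that patients with similar clinical profiles attract each other in embedding space even though their images were never paired.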
Related papers
- Barttender: An approachable & interpretable way to compare medical imaging and non-imaging data [0.13406576408866772]
Barttender is an interpretable framework that uses deep learning for the comparison of the utility of imaging versus non-imaging data for tasks like disease prediction.
Our framework allows researchers to evaluate differences in utility through performance measures, as well as local (sample-level) and global (population-level) explanations.
arXiv Detail & Related papers (2024-11-19T18:22:25Z) - Predicting Stroke through Retinal Graphs and Multimodal Self-supervised Learning [0.46835339362676565]
Early identification of stroke is crucial for intervention, requiring reliable models.
We proposed an efficient retinal image representation together with clinical information to capture a comprehensive overview of cardiovascular health.
arXiv Detail & Related papers (2024-11-08T14:40:56Z) - Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach organizes various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z) - Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a CounterFactual Explanations-based framework (CoFE) for radiology report generation.
Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking "what if" scenarios.
Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - Multimodal brain age estimation using interpretable adaptive population-graph learning [58.99653132076496]
We propose a framework that learns a population graph structure optimized for the downstream task.
An attention mechanism assigns weights to a set of imaging and non-imaging features.
By visualizing the attention weights that were the most important for the graph construction, we increase the interpretability of the graph.
arXiv Detail & Related papers (2023-07-10T15:35:31Z) - Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data [7.49320945341034]
We propose the first self-supervised contrastive learning framework to train unimodal encoders.
Our solution combines SimCLR and SCARF, two leading contrastive learning strategies.
We show the generalizability of our approach to natural images using the DVM car advertisement dataset.
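The SCARF half of the combination above augments tabular data by corrupting a random subset of each sample's features with values drawn from the empirical marginal distributions. A minimal sketch of that idea (the corruption rate is an illustrative parameter; this is not the authors' code):

```python
import numpy as np

def scarf_corrupt(batch, corruption_rate=0.3, rng=None):
    """SCARF-style tabular augmentation: each cell is, with probability
    `corruption_rate`, replaced by the same feature taken from a randomly
    drawn other row, i.e. resampled from that feature's empirical marginal."""
    rng = np.random.default_rng(rng)
    n, d = batch.shape
    mask = rng.random((n, d)) < corruption_rate       # which cells to corrupt
    donors = rng.integers(0, n, size=(n, d))          # random donor row per cell
    return np.where(mask, batch[donors, np.arange(d)], batch)
```

Two such corrupted views of the same row can then play the role that two augmented crops of the same image play in SimCLR.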
arXiv Detail & Related papers (2023-03-24T15:44:42Z) - Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - Lesion-based Contrastive Learning for Diabetic Retinopathy Grading from Fundus Images [2.498907460918493]
We propose a self-supervised framework, namely lesion-based contrastive learning for automated diabetic retinopathy grading.
Our proposed framework performs outstandingly on DR grading in terms of both linear evaluation and transfer capacity evaluation.
arXiv Detail & Related papers (2021-07-17T16:30:30Z) - MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation [8.403653472706822]
We develop a method to select positive pairs coming from views of possibly different images through the use of patient metadata.
We compare strategies for selecting positive pairs for chest X-ray interpretation including requiring them to be from the same patient, imaging study or laterality.
Our best performing positive pair selection strategy, which involves using images from the same patient from the same study across all lateralities, achieves a performance increase of 3.4% and 14.4% in mean AUC.
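The best-performing strategy above (positives are images from the same patient and same study, across all lateralities) might be sketched as follows; the record field names are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def metadata_positive_pairs(records):
    """Group image records by (patient_id, study_id) and emit every
    within-group index pair as a contrastive positive, regardless of
    laterality. Field names are illustrative, not from the paper."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[(rec["patient_id"], rec["study_id"])].append(idx)
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(members, 2))  # all pairs within a group
    return pairs
```

Restricting groups further (e.g. same laterality only) would simply mean adding that field to the grouping key.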
arXiv Detail & Related papers (2021-02-21T18:39:04Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z) - Context Matters: Graph-based Self-supervised Representation Learning for Medical Images [21.23065972218941]
We introduce a novel approach with two levels of self-supervised representation learning objectives.
We use graph neural networks to incorporate the relationship between different anatomical regions.
Our model can identify clinically relevant regions in the images.
arXiv Detail & Related papers (2020-12-11T16:26:07Z) - Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z) - Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z) - Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels [66.57101219176275]
Disease diagnosis on chest X-ray images is a challenging multi-label classification task.
We propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases.
Our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning.
arXiv Detail & Related papers (2020-02-26T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.