Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers
- URL: http://arxiv.org/abs/2501.18237v1
- Date: Thu, 30 Jan 2025 09:52:15 GMT
- Title: Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers
- Authors: Malte Tölle, Mohamad Scharaf, Samantha Fischer, Christoph Reich, Silav Zeid, Christoph Dieterich, Benjamin Meder, Norbert Frey, Philipp Wild, Sandy Engelhardt
- Abstract summary: Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM)
Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting in-hospital mortality and phenotyping, as evaluated on 6,175 patients from the MIMIC-IV dataset.
We hope our work inspires advancements in multi-modal medical AI by reducing the training complexity to (visual) prompt engineering, thus lowering entry barriers and enabling no-code solutions for training.
- Score: 1.194275822303467
- Abstract: A patient undergoes multiple examinations during each hospital stay, each providing different facets of the health status. These assessments include temporal data with varying sampling rates, discrete single-point measurements, therapeutic interventions such as medication administration, and images. While physicians can process and integrate diverse modalities intuitively, neural networks need specific modeling for each modality, which complicates the training procedure. We demonstrate that this complexity can be significantly reduced by visualizing all information as images along with unstructured text and subsequently training a conventional vision-text transformer. Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting in-hospital mortality and phenotyping, as evaluated on 6,175 patients from the MIMIC-IV dataset. The modalities include patients' clinical measurements, medications, X-ray images, and electrocardiography scans. We hope our work inspires advancements in multi-modal medical AI by reducing the training complexity to (visual) prompt engineering, thus lowering entry barriers and enabling no-code solutions for training. The source code will be made publicly available.
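The core idea, rendering every modality as an image that a standard vision-text model can consume, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released code: the rendering function, figure layout, and the suggested CLIP call are assumptions.

```python
# Minimal sketch (not the authors' code): render an irregularly sampled
# vital-sign series as an RGB image so a standard vision-text transformer
# can consume it alongside unstructured text.
import io

import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def render_series_as_image(times, values, label, size=(224, 224)):
    """Plot one irregularly sampled measurement series as a fixed-size image."""
    fig, ax = plt.subplots(figsize=(3, 3), dpi=100)
    ax.plot(times, values, marker="o", linewidth=1)
    ax.set_title(label, fontsize=8)
    ax.set_xlabel("hours since admission", fontsize=6)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf).convert("RGB").resize(size)


# Synthetic heart-rate measurements taken at irregular intervals.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 48, size=20))            # uneven sampling times
hr = 80 + 10 * np.sin(t / 8) + rng.normal(0, 3, 20)

img = render_series_as_image(t, hr, "heart rate")
# `img` (plus an unstructured-text prompt such as the medication list) can
# now be fed to any pretrained vision-text transformer, e.g. CLIP via
# HuggingFace:
#   inputs = processor(text=[prompt], images=img, return_tensors="pt")
```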
Related papers
- Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis [9.355513913682794]
Current biomedical foundation models struggle to generalize as public 3D datasets are small.
We propose a data engine that synthesizes highly variable training samples that enable generalization to new biomedical contexts.
To then train a single 3D network for any voxel-level task, we develop a contrastive learning method that pretrains the network to be stable against nuisance imaging variation simulated by the data engine.
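A minimal sketch of this contrastive objective, assuming an InfoNCE-style loss between two nuisance-augmented views of the same volume; the toy 3D encoder and the `nuisance_augment` stand-in for the data engine are hypothetical, not the paper's components.

```python
# Hedged sketch: make embeddings of two randomly perturbed views of the
# same synthetic volume agree (InfoNCE), so the network becomes stable
# against nuisance imaging variation.
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(          # toy 3D encoder
    torch.nn.Conv3d(1, 8, 3, stride=2, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool3d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 32),
)

def nuisance_augment(x):
    """Stand-in for the data engine: random intensity scaling plus noise."""
    scale = torch.empty(x.shape[0], 1, 1, 1, 1).uniform_(0.8, 1.2)
    return x * scale + 0.05 * torch.randn_like(x)

x = torch.randn(4, 1, 16, 16, 16)                # batch of synthetic volumes
z1 = F.normalize(encoder(nuisance_augment(x)), dim=1)
z2 = F.normalize(encoder(nuisance_augment(x)), dim=1)
logits = z1 @ z2.t() / 0.1                       # cosine sims / temperature
loss = F.cross_entropy(logits, torch.arange(4))  # matching views are positives
loss.backward()
```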
arXiv Detail & Related papers (2024-11-04T18:40:46Z)
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach correlates various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
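A minimal sketch of autoregressive pretraining over visual tokens, assuming a causal transformer with a next-token cross-entropy loss; the random token ids stand in for the patchify-and-quantize tokenizer, which this snippet does not implement.

```python
# Hedged sketch: a causal transformer predicts each visual token from its
# predecessors; the image tokenizer is reduced to random token ids here.
import torch
import torch.nn as nn

vocab, dim, seq_len = 512, 64, 32
tok_emb = nn.Embedding(vocab, dim)
block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
head = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (2, seq_len))       # stand-in visual tokens
causal = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)
h = block(tok_emb(tokens[:, :-1]), src_mask=causal)  # causal self-attention
logits = head(h)                                     # (2, seq_len-1, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)
loss.backward()
```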
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
- Overcoming challenges of translating deep-learning models for glioblastoma: the ZGBM consortium [0.9338156173462939]
Methods: MR data were analysed from a random sample of five patients from the prospective cohort across five participating sites of the ZGBM consortium.
Reported clinical and treatment data alongside DICOM header information were analysed to understand treatment pathway imaging schedules.
All sites perform all structural imaging at every stage in the pathway except for the presurgical study, where in some sites only contrast-enhanced T1-weighted imaging is performed.
Diffusion MRI is the most common non-structural imaging type, performed at every site.
arXiv Detail & Related papers (2024-05-07T10:04:08Z)
- VISION: Toward a Standardized Process for Radiology Image Management at the National Level [3.793492459789475]
We describe our experiences in establishing a trusted collection of radiology images linked to the United States Department of Veterans Affairs (VA) electronic health record database.
Key insights include uncovering the specific procedures required for transferring images from a clinical to a research-ready environment.
arXiv Detail & Related papers (2024-04-29T16:30:24Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
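The fusion pattern described here can be sketched as follows; module sizes and the toy CNN are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: CNN features from a chest X-ray are concatenated with
# embedded demographic tokens and fed to a transformer encoder-decoder
# that generates report tokens.
import torch
import torch.nn as nn

dim, vocab = 64, 1000
cnn = nn.Sequential(nn.Conv2d(1, dim, 7, stride=4), nn.AdaptiveAvgPool2d(4))
embed = nn.Embedding(vocab, dim)
fusion = nn.Transformer(d_model=dim, nhead=4, num_encoder_layers=2,
                        num_decoder_layers=2, batch_first=True)
out_head = nn.Linear(dim, vocab)

cxr = torch.randn(2, 1, 224, 224)                # chest X-ray batch
demo = torch.randint(0, vocab, (2, 8))           # tokenized "age 63, male, ..."
report_in = torch.randint(0, vocab, (2, 20))     # shifted report tokens

img_feats = cnn(cxr).flatten(2).transpose(1, 2)  # (2, 16, dim) visual tokens
src = torch.cat([img_feats, embed(demo)], dim=1) # fuse image + demographics
logits = out_head(fusion(src, embed(report_in))) # decode report tokens
print(logits.shape)                              # torch.Size([2, 20, 1000])
```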
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- CheXstray: Real-time Multi-Modal Data Concordance for Drift Detection in Medical Imaging AI [1.359138408203412]
We build and test a medical imaging AI drift monitoring workflow that tracks data and model drift without contemporaneous ground truth.
Key contributions include a proof-of-concept for medical imaging drift detection, including the use of variational autoencoders (VAEs) and domain-specific statistical methods.
This work has important implications for addressing the translation gap related to continuous medical imaging AI model monitoring in dynamic healthcare environments.
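One way to realize drift monitoring without contemporaneous ground truth is to compare latent statistics of incoming images against a reference window with a two-sample test. The sketch below is a simplified stand-in (a mean-intensity "encoder" in place of a trained VAE), not the CheXstray pipeline itself.

```python
# Hedged sketch: embed images, then compare a live window of latent
# statistics against a reference window with a two-sample KS test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def encode(images):
    """Placeholder encoder: mean intensity per image (stand-in for VAE latents)."""
    return images.mean(axis=(1, 2))

reference = rng.normal(0.50, 0.10, size=(500, 64, 64))  # deployment baseline
live = rng.normal(0.58, 0.10, size=(200, 64, 64))       # drifted acquisition

stat, p_value = stats.ks_2samp(encode(reference), encode(live))
if p_value < 0.01:
    print(f"drift alarm: KS={stat:.3f}, p={p_value:.2e}")
```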
arXiv Detail & Related papers (2022-02-06T18:58:35Z)
- Solving Inverse Problems in Medical Imaging with Score-Based Generative Models [87.48867245544106]
Reconstructing medical images from partial measurements is an important inverse problem in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI).
Existing solutions based on machine learning typically train a model to directly map measurements to medical images.
We propose a fully unsupervised technique for inverse problem solving, leveraging the recently introduced score-based generative models.
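A toy sketch of the idea follows, assuming an analytic Gaussian prior score in place of a learned score network and a pseudoinverse projection for data consistency; real samplers are considerably more involved.

```python
# Hedged, toy sketch of score-based inverse problem solving: alternate
# Langevin-style updates under a (here analytic) prior score with
# projection onto the measurement constraint y = A x.
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 16
A = rng.normal(size=(m, n)) / np.sqrt(m)     # measurement operator (e.g. CT)
x_true = rng.normal(size=n)
y = A @ x_true                               # partial measurements

score = lambda x: -x                         # analytic score of N(0, I) prior
A_pinv = np.linalg.pinv(A)

x = rng.normal(size=n)
step = 1e-2
for _ in range(500):
    # Langevin step under the prior score ...
    x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=n)
    # ... then enforce data consistency: project onto {x : A x = y}.
    x = x + A_pinv @ (y - A @ x)

print("residual:", np.linalg.norm(A @ x - y))  # ~0 after projection
```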
arXiv Detail & Related papers (2021-11-15T05:41:12Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers a unique opportunity to obtain and use, at training time, multiple views of the same information that might not always be available at test time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
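A simple stand-in for this resilience property is modality dropout at training time. The sketch below is not CMIM's mutual-information objective, just an illustration of training a fused representation that survives a missing modality.

```python
# Hedged sketch: randomly hide a modality during training so the fused
# embedding still supports the downstream task when one view is missing.
import torch
import torch.nn as nn

img_enc = nn.Linear(128, 32)    # toy encoders for two "views" of a patient
txt_enc = nn.Linear(64, 32)
classifier = nn.Linear(32, 2)

img = torch.randn(8, 128)
txt = torch.randn(8, 64)
labels = torch.randint(0, 2, (8,))

# Randomly drop each modality for some samples during training.
keep_img = (torch.rand(8, 1) > 0.3).float()
keep_txt = (torch.rand(8, 1) > 0.3).float()
denom = (keep_img + keep_txt).clamp(min=1.0)
fused = (keep_img * img_enc(img) + keep_txt * txt_enc(txt)) / denom

loss = nn.functional.cross_entropy(classifier(fused), labels)
loss.backward()
```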
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
- Convolutional-LSTM for Multi-Image to Single Output Medical Prediction [55.41644538483948]
A common scenario in developing countries is that volume metadata is lost for a variety of reasons.
It is possible to build a multi-image-to-single-output diagnostic model that mimics a human doctor's diagnostic process.
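A minimal sketch of such a model: a shared CNN encodes each slice and an LSTM aggregates the sequence into one prediction. PyTorch ships no built-in ConvLSTM cell, so a CNN-then-LSTM stand-in is used here; all sizes are illustrative.

```python
# Hedged sketch of multi-image-to-single-output prediction: a shared CNN
# encodes each slice, an LSTM aggregates the sequence, and the final
# hidden state yields one diagnosis per study.
import torch
import torch.nn as nn

cnn = nn.Sequential(nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())   # slice -> 16-d
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
head = nn.Linear(32, 2)                                      # binary diagnosis

slices = torch.randn(4, 10, 1, 64, 64)          # 4 studies x 10 slices each
b, t = slices.shape[:2]
feats = cnn(slices.reshape(b * t, 1, 64, 64)).reshape(b, t, 16)
_, (h_n, _) = lstm(feats)                       # h_n: (1, 4, 32)
logits = head(h_n[-1])                          # one prediction per study
print(logits.shape)                             # torch.Size([4, 2])
```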
arXiv Detail & Related papers (2020-10-20T04:30:09Z)