Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers
- URL: http://arxiv.org/abs/2501.18237v1
- Date: Thu, 30 Jan 2025 09:52:15 GMT
- Title: Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers
- Authors: Malte Tölle, Mohamad Scharaf, Samantha Fischer, Christoph Reich, Silav Zeid, Christoph Dieterich, Benjamin Meder, Norbert Frey, Philipp Wild, Sandy Engelhardt
- Abstract summary: Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM)
Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting in-hospital mortality and phenotyping, as evaluated on 6,175 patients from the MIMIC-IV dataset.
We hope our work inspires advancements in multi-modal medical AI by reducing the training complexity to (visual) prompt engineering, thus lowering entry barriers and enabling no-code solutions for training.
- Score: 1.194275822303467
- Abstract: A patient undergoes multiple examinations during each hospital stay, each providing different facets of the health status. These assessments include temporal data with varying sampling rates, discrete single-point measurements, therapeutic interventions such as medication administration, and images. While physicians can process and integrate diverse modalities intuitively, neural networks need specific modeling for each modality, which complicates the training procedure. We demonstrate that this complexity can be significantly reduced by visualizing all information as images along with unstructured text and subsequently training a conventional vision-text transformer. Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting in-hospital mortality and phenotyping, as evaluated on 6,175 patients from the MIMIC-IV dataset. The modalities include patients' clinical measurements, medications, X-ray images, and electrocardiography scans. We hope our work inspires advancements in multi-modal medical AI by reducing the training complexity to (visual) prompt engineering, thus lowering entry barriers and enabling no-code solutions for training. The source code will be made publicly available.
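The core idea, rendering every modality as an image that a standard vision-text model can consume, can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' released code: the rendering function, figure layout, and the suggested CLIP call are assumptions.

```python
# Minimal sketch (not the authors' code): render an irregularly sampled
# vital-sign series as an RGB image so a standard vision-text transformer
# can consume it alongside unstructured text.
import io

import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def render_series_as_image(times, values, label, size=(224, 224)):
    """Plot one irregularly sampled measurement series as a fixed-size image."""
    fig, ax = plt.subplots(figsize=(3, 3), dpi=100)
    ax.plot(times, values, marker="o", linewidth=1)
    ax.set_title(label, fontsize=8)
    ax.set_xlabel("hours since admission", fontsize=6)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return Image.open(buf).convert("RGB").resize(size)


# Synthetic heart-rate measurements taken at irregular intervals.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 48, size=20))            # uneven sampling times
hr = 80 + 10 * np.sin(t / 8) + rng.normal(0, 3, 20)

img = render_series_as_image(t, hr, "heart rate")
# `img` (plus an unstructured-text prompt such as the medication list) can
# now be fed to any pretrained vision-text transformer, e.g. CLIP via
# HuggingFace:
#   inputs = processor(text=[prompt], images=img, return_tensors="pt")
```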
Related papers
- Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis [9.355513913682794]
Current biomedical foundation models struggle to generalize as public 3D datasets are small.
We propose a data engine that synthesizes highly variable training samples that enable generalization to new biomedical contexts.
To then train a single 3D network for any voxel-level task, we develop a contrastive learning method that pretrains the network to be stable against nuisance imaging variation simulated by the data engine.
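A minimal sketch of this contrastive objective, assuming an InfoNCE-style loss between two nuisance-augmented views of the same volume; the toy 3D encoder and the `nuisance_augment` stand-in for the data engine are hypothetical, not the paper's components.

```python
# Hedged sketch: make embeddings of two randomly perturbed views of the
# same synthetic volume agree (InfoNCE), so the network becomes stable
# against nuisance imaging variation.
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(          # toy 3D encoder
    torch.nn.Conv3d(1, 8, 3, stride=2, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool3d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 32),
)

def nuisance_augment(x):
    """Stand-in for the data engine: random intensity scaling plus noise."""
    scale = torch.empty(x.shape[0], 1, 1, 1, 1).uniform_(0.8, 1.2)
    return x * scale + 0.05 * torch.randn_like(x)

x = torch.randn(4, 1, 16, 16, 16)                # batch of synthetic volumes
z1 = F.normalize(encoder(nuisance_augment(x)), dim=1)
z2 = F.normalize(encoder(nuisance_augment(x)), dim=1)
logits = z1 @ z2.t() / 0.1                       # cosine sims / temperature
loss = F.cross_entropy(logits, torch.arange(4))  # matching views are positives
loss.backward()
```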
arXiv Detail & Related papers (2024-11-04T18:40:46Z)
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach correlates various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
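A minimal sketch of autoregressive pretraining over visual tokens, assuming a causal transformer with a next-token cross-entropy loss; the random token ids stand in for the patchify-and-quantize tokenizer, which this snippet does not implement.

```python
# Hedged sketch: a causal transformer predicts each visual token from its
# predecessors; the image tokenizer is reduced to random token ids here.
import torch
import torch.nn as nn

vocab, dim, seq_len = 512, 64, 32
tok_emb = nn.Embedding(vocab, dim)
block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
head = nn.Linear(dim, vocab)

tokens = torch.randint(0, vocab, (2, seq_len))       # stand-in visual tokens
causal = nn.Transformer.generate_square_subsequent_mask(seq_len - 1)
h = block(tok_emb(tokens[:, :-1]), src_mask=causal)  # causal self-attention
logits = head(h)                                     # (2, seq_len-1, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1)
)
loss.backward()
```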
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
- Overcoming challenges of translating deep-learning models for glioblastoma: the ZGBM consortium [0.9338156173462939]
Methods: MR data were analysed from a random sample of five patients from the prospective cohort across five participating sites of the ZGBM consortium.
Reported clinical and treatment data alongside DICOM header information were analysed to understand treatment pathway imaging schedules.
All sites perform all structural imaging at every stage in the pathway except for the presurgical study, where in some sites only contrast-enhanced T1-weighted imaging is performed.
Diffusion MRI is the most common non-structural imaging type, performed at every site.
arXiv Detail & Related papers (2024-05-07T10:04:08Z)
- VISION: Toward a Standardized Process for Radiology Image Management at the National Level [3.793492459789475]
We describe our experiences in establishing a trusted collection of radiology images linked to the United States Department of Veterans Affairs (VA) electronic health record database.
Key insights include uncovering the specific procedures required for transferring images from a clinical to a research-ready environment.
arXiv Detail & Related papers (2024-04-29T16:30:24Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
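The fusion pattern described here can be sketched as follows; module sizes and the toy CNN are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: CNN features from a chest X-ray are concatenated with
# embedded demographic tokens and fed to a transformer encoder-decoder
# that generates report tokens.
import torch
import torch.nn as nn

dim, vocab = 64, 1000
cnn = nn.Sequential(nn.Conv2d(1, dim, 7, stride=4), nn.AdaptiveAvgPool2d(4))
embed = nn.Embedding(vocab, dim)
fusion = nn.Transformer(d_model=dim, nhead=4, num_encoder_layers=2,
                        num_decoder_layers=2, batch_first=True)
out_head = nn.Linear(dim, vocab)

cxr = torch.randn(2, 1, 224, 224)                # chest X-ray batch
demo = torch.randint(0, vocab, (2, 8))           # tokenized "age 63, male, ..."
report_in = torch.randint(0, vocab, (2, 20))     # shifted report tokens

img_feats = cnn(cxr).flatten(2).transpose(1, 2)  # (2, 16, dim) visual tokens
src = torch.cat([img_feats, embed(demo)], dim=1) # fuse image + demographics
logits = out_head(fusion(src, embed(report_in))) # decode report tokens
print(logits.shape)                              # torch.Size([2, 20, 1000])
```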
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- CheXstray: Real-time Multi-Modal Data Concordance for Drift Detection in Medical Imaging AI [1.359138408203412]
We build and test a medical imaging AI drift monitoring workflow that tracks data and model drift without contemporaneous ground truth.
Key contributions include a proof-of-concept for medical imaging drift detection, including the use of variational autoencoders (VAEs) and domain-specific statistical methods.
This work has important implications for addressing the translation gap related to continuous medical imaging AI model monitoring in dynamic healthcare environments.
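One way to realize drift monitoring without contemporaneous ground truth is to compare latent statistics of incoming images against a reference window with a two-sample test. The sketch below is a simplified stand-in (a mean-intensity "encoder" in place of a trained VAE), not the CheXstray pipeline itself.

```python
# Hedged sketch: embed images, then compare a live window of latent
# statistics against a reference window with a two-sample KS test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def encode(images):
    """Placeholder encoder: mean intensity per image (stand-in for VAE latents)."""
    return images.mean(axis=(1, 2))

reference = rng.normal(0.50, 0.10, size=(500, 64, 64))  # deployment baseline
live = rng.normal(0.58, 0.10, size=(200, 64, 64))       # drifted acquisition

stat, p_value = stats.ks_2samp(encode(reference), encode(live))
if p_value < 0.01:
    print(f"drift alarm: KS={stat:.3f}, p={p_value:.2e}")
```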
arXiv Detail & Related papers (2022-02-06T18:58:35Z)
- Solving Inverse Problems in Medical Imaging with Score-Based Generative Models [87.48867245544106]
Reconstructing medical images from partial measurements is an important inverse problem in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI).
Existing solutions based on machine learning typically train a model to directly map measurements to medical images.
We propose a fully unsupervised technique for inverse problem solving, leveraging the recently introduced score-based generative models.
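A toy sketch of the idea follows, assuming an analytic Gaussian prior score in place of a learned score network and a pseudoinverse projection for data consistency; real samplers are considerably more involved.

```python
# Hedged, toy sketch of score-based inverse problem solving: alternate
# Langevin-style updates under a (here analytic) prior score with
# projection onto the measurement constraint y = A x.
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 16
A = rng.normal(size=(m, n)) / np.sqrt(m)     # measurement operator (e.g. CT)
x_true = rng.normal(size=n)
y = A @ x_true                               # partial measurements

score = lambda x: -x                         # analytic score of N(0, I) prior
A_pinv = np.linalg.pinv(A)

x = rng.normal(size=n)
step = 1e-2
for _ in range(500):
    # Langevin step under the prior score ...
    x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=n)
    # ... then enforce data consistency: project onto {x : A x = y}.
    x = x + A_pinv @ (y - A @ x)

print("residual:", np.linalg.norm(A @ x - y))  # ~0 after projection
```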
arXiv Detail & Related papers (2021-11-15T05:41:12Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed to specific information systems that make the same information available under different modalities.
This offers a unique opportunity to obtain and use, at training time, multiple views of the same information that might not always be available at test time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time.
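A simple stand-in for this resilience property is modality dropout at training time. The sketch below is not CMIM's mutual-information objective, just an illustration of training a fused representation that survives a missing modality.

```python
# Hedged sketch: randomly hide a modality during training so the fused
# embedding still supports the downstream task when one view is missing.
import torch
import torch.nn as nn

img_enc = nn.Linear(128, 32)    # toy encoders for two "views" of a patient
txt_enc = nn.Linear(64, 32)
classifier = nn.Linear(32, 2)

img = torch.randn(8, 128)
txt = torch.randn(8, 64)
labels = torch.randint(0, 2, (8,))

# Randomly drop each modality for some samples during training.
keep_img = (torch.rand(8, 1) > 0.3).float()
keep_txt = (torch.rand(8, 1) > 0.3).float()
denom = (keep_img + keep_txt).clamp(min=1.0)
fused = (keep_img * img_enc(img) + keep_txt * txt_enc(txt)) / denom

loss = nn.functional.cross_entropy(classifier(fused), labels)
loss.backward()
```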
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
- Convolutional-LSTM for Multi-Image to Single Output Medical Prediction [55.41644538483948]
A common scenario in developing countries is that volume metadata is lost for a variety of reasons.
It is possible to build a multi-image-to-single-output diagnostic model that mimics a human doctor's diagnostic process.
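A minimal sketch of such a model: a shared CNN encodes each slice and an LSTM aggregates the sequence into one prediction. PyTorch ships no built-in ConvLSTM cell, so a CNN-then-LSTM stand-in is used here; all sizes are illustrative.

```python
# Hedged sketch of multi-image-to-single-output prediction: a shared CNN
# encodes each slice, an LSTM aggregates the sequence, and the final
# hidden state yields one diagnosis per study.
import torch
import torch.nn as nn

cnn = nn.Sequential(nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())   # slice -> 16-d
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
head = nn.Linear(32, 2)                                      # binary diagnosis

slices = torch.randn(4, 10, 1, 64, 64)          # 4 studies x 10 slices each
b, t = slices.shape[:2]
feats = cnn(slices.reshape(b * t, 1, 64, 64)).reshape(b, t, 16)
_, (h_n, _) = lstm(feats)                       # h_n: (1, 4, 32)
logits = head(h_n[-1])                          # one prediction per study
print(logits.shape)                             # torch.Size([4, 2])
```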
arXiv Detail & Related papers (2020-10-20T04:30:09Z)