Multimodal Foundation Models For Echocardiogram Interpretation
- URL: http://arxiv.org/abs/2308.15670v2
- Date: Sat, 2 Sep 2023 17:47:47 GMT
- Title: Multimodal Foundation Models For Echocardiogram Interpretation
- Authors: Matthew Christensen, Milos Vukadinovic, Neal Yuan, David Ouyang
- Abstract summary: We leverage 1,032,975 cardiac ultrasound videos and corresponding expert interpretations to develop EchoCLIP.
EchoCLIP displays strong zero-shot (not explicitly trained) performance in cardiac function assessment.
We also developed a long-context variant (EchoCLIP-R) with a custom echocardiography report text tokenizer.
- Score: 0.24578723416255746
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multimodal deep learning foundation models can learn the relationship between
images and text. In the context of medical imaging, mapping images to language
concepts reflects the clinical task of diagnostic image interpretation; however,
current general-purpose foundation models do not perform well in this context
because their training corpora contain limited medical text and images. To address
this challenge and account for the range of cardiac physiology, we leverage
1,032,975 cardiac ultrasound videos and corresponding expert interpretations to
develop EchoCLIP, a multimodal foundation model for echocardiography. EchoCLIP
displays strong zero-shot (not explicitly trained) performance in cardiac
function assessment (external validation left ventricular ejection fraction
mean absolute error (MAE) of 7.1%) and identification of implanted intracardiac
devices (areas under the curve (AUC) between 0.84 and 0.98 for pacemakers and
artificial heart valves). We also developed a long-context variant (EchoCLIP-R)
with a custom echocardiography report text tokenizer which can accurately
identify unique patients across multiple videos (AUC of 0.86), identify
clinical changes such as orthotopic heart transplants (AUC of 0.79) or cardiac
surgery (AUC 0.77), and enable robust image-to-text search (mean cross-modal
retrieval rank in the top 1% of candidate text reports). These emergent
capabilities can be used for preliminary assessment and summarization of
echocardiographic findings.
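As a concrete illustration of the training objective behind CLIP-style models such as EchoCLIP, the sketch below implements the symmetric contrastive (InfoNCE) loss over a batch of paired video/report embeddings. The random stand-in embeddings, the width of 512, and the temperature are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the CLIP-style contrastive objective EchoCLIP builds on.
# Embeddings here are random stand-ins for encoder outputs.
import torch
import torch.nn.functional as F

def clip_loss(video_emb: torch.Tensor, text_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired video/report embeddings."""
    v = F.normalize(video_emb, dim=-1)               # unit-normalize each embedding
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                   # scaled cosine similarities
    targets = torch.arange(len(v))                   # the i-th video matches the i-th report
    return (F.cross_entropy(logits, targets) +       # video -> text direction
            F.cross_entropy(logits.T, targets)) / 2  # text -> video direction

# Toy usage: 8 paired embeddings of width 512 (shapes are illustrative only).
video_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
print(clip_loss(video_emb, text_emb).item())
```

Minimizing this loss aligns each video embedding with its own report while pushing apart mismatched pairs, which is what later enables zero-shot assessment and cross-modal retrieval.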
Related papers
- Integrating Deep Learning with Fundus and Optical Coherence Tomography for Cardiovascular Disease Prediction [47.7045293755736]
Early identification of patients at risk of cardiovascular diseases (CVD) is crucial for effective preventive care, reducing healthcare burden, and improving patients' quality of life.
This study demonstrates the potential of retinal optical coherence tomography (OCT) imaging combined with fundus photographs for identifying future adverse cardiac events.
We propose a novel binary classification network based on a Multi-channel Variational Autoencoder (MCVAE), which learns a latent embedding of patients' fundus and OCT images to classify individuals into two groups: those likely to develop CVD in the future and those who are not.
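As a rough illustration of classifying from a latent embedding shared across imaging channels, here is a minimal two-channel VAE-style sketch. The layer sizes, the fusion by averaging the per-channel posteriors, and the single-logit head are assumptions for illustration; the paper's MCVAE is more elaborate.

```python
# A minimal sketch of classifying from a shared latent learned by a two-channel VAE.
# Dimensions and the fusion-by-averaging step are illustrative assumptions.
import torch
import torch.nn as nn

class TwoChannelVAEClassifier(nn.Module):
    def __init__(self, fundus_dim=256, oct_dim=256, latent_dim=32):
        super().__init__()
        # One encoder per imaging channel, each predicting a Gaussian posterior.
        self.enc_fundus = nn.Linear(fundus_dim, 2 * latent_dim)
        self.enc_oct = nn.Linear(oct_dim, 2 * latent_dim)
        self.classifier = nn.Linear(latent_dim, 1)  # CVD-risk logit

    def forward(self, fundus, oct_feat):
        mu_f, logvar_f = self.enc_fundus(fundus).chunk(2, dim=-1)
        mu_o, logvar_o = self.enc_oct(oct_feat).chunk(2, dim=-1)
        mu = (mu_f + mu_o) / 2                                  # fuse channel posteriors
        logvar = (logvar_f + logvar_o) / 2
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization trick
        return self.classifier(z), mu, logvar

model = TwoChannelVAEClassifier()
logit, mu, logvar = model(torch.randn(4, 256), torch.randn(4, 256))
print(logit.shape)  # torch.Size([4, 1])
```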
arXiv Detail & Related papers (2024-10-18T12:37:51Z) - EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation [1.0840985826142429]
We introduce EchoPrime, a multi-view, view-informed, video-based vision-language foundation model trained on over 12 million video-report pairs.
With retrieval-augmented interpretation, EchoPrime integrates information from all echocardiogram videos in a comprehensive study.
In datasets from two independent healthcare systems, EchoPrime achieves state-of-the-art performance on 23 diverse benchmarks of cardiac form and function.
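A minimal sketch of the retrieval step in retrieval-augmented interpretation: embed every video in a study, pool them into one study-level vector, and rank candidate report texts by similarity. Mean pooling and plain cosine retrieval are illustrative assumptions, not EchoPrime's exact procedure.

```python
# A minimal sketch of retrieval-augmented interpretation over a multi-video study.
import torch
import torch.nn.functional as F

def retrieve_report(video_embs: torch.Tensor, report_embs: torch.Tensor, k: int = 3):
    """video_embs: (n_videos, d) for one study; report_embs: (n_candidates, d)."""
    study = F.normalize(video_embs.mean(dim=0, keepdim=True), dim=-1)  # pool the study
    sims = study @ F.normalize(report_embs, dim=-1).T                  # cosine similarity
    return sims.topk(k).indices.squeeze(0)                             # top-k report indices

study_videos = torch.randn(12, 512)   # e.g., 12 views from one echo study (toy values)
candidates = torch.randn(1000, 512)   # precomputed candidate report embeddings
print(retrieve_report(study_videos, candidates))
```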
arXiv Detail & Related papers (2024-10-13T03:04:22Z) - CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates that it can identify organs and abnormalities in a zero-shot manner using natural language.
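Zero-shot identification with a contrastively trained model typically reduces to scoring an image embedding against a set of text-prompt embeddings. The sketch below shows that pattern with placeholder prompts and random stand-in embeddings rather than CT-GLIP's actual encoders.

```python
# A minimal sketch of zero-shot classification via text prompts:
# pick the prompt whose embedding is most similar to the image embedding.
import torch
import torch.nn.functional as F

prompts = ["a CT scan of the liver", "a CT scan of the kidney", "a CT scan of the spleen"]
text_embs = F.normalize(torch.randn(len(prompts), 512), dim=-1)  # stand-in for a text encoder
image_emb = F.normalize(torch.randn(1, 512), dim=-1)             # stand-in for an image encoder

scores = (image_emb @ text_embs.T).squeeze(0)   # cosine similarity per prompt
print(prompts[scores.argmax().item()])          # predicted organ, zero-shot
```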
arXiv Detail & Related papers (2024-04-23T17:59:01Z) - Echocardiogram Foundation Model -- Application 1: Estimating Ejection Fraction [2.4164193358532438]
We introduce EchoAI, an echocardiogram foundation model trained using self-supervised learning (SSL) on 1.5 million echocardiograms.
We evaluate our approach by fine-tuning EchoAI to estimate ejection fraction, achieving a mean absolute percentage error of 9.40%.
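A minimal sketch of the fine-tuning step described here: attach a small regression head to a pretrained backbone and train it against ejection-fraction labels, reporting mean absolute percentage error (MAPE). The placeholder backbone, head, and toy data are assumptions; EchoAI's architecture and training recipe are not shown.

```python
# A minimal sketch of fine-tuning a pretrained encoder for EF regression.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 512), nn.ReLU())  # placeholder backbone
head = nn.Linear(512, 1)                                                   # EF in percent
opt = torch.optim.Adam(head.parameters(), lr=1e-4)                         # fine-tune head only

frames = torch.randn(16, 1, 32, 32)       # toy echo frames
ef_true = torch.rand(16, 1) * 50 + 25     # toy EF labels in [25, 75]

ef_pred = head(encoder(frames))
loss = nn.functional.l1_loss(ef_pred, ef_true)
loss.backward(); opt.step()

mape = ((ef_pred - ef_true).abs() / ef_true).mean() * 100  # the reported metric
print(f"MAPE: {mape.item():.2f}%")
```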
arXiv Detail & Related papers (2023-11-21T13:00:03Z) - Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
arXiv Detail & Related papers (2023-11-18T14:37:53Z) - M(otion)-mode Based Prediction of Ejection Fraction using Echocardiograms [13.112371567924802]
We propose using the M(otion)-mode of echocardiograms for estimating the left ventricular ejection fraction (EF) and classifying cardiomyopathy.
We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures.
Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method.
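Generating an artificial M-mode image from a B-mode clip amounts to sampling pixel intensities along one scan line in every frame and stacking the samples over time. A minimal sketch follows, with an arbitrary line placement; the paper samples multiple lines per echocardiogram.

```python
# A minimal sketch of extracting an artificial M-mode image from an echo clip.
import numpy as np

def mmode_from_video(video: np.ndarray, x0, y0, x1, y1, n_samples=128) -> np.ndarray:
    """video: (T, H, W) grayscale clip -> (n_samples, T) M-mode image."""
    t = np.linspace(0, 1, n_samples)
    ys = np.round(y0 + t * (y1 - y0)).astype(int)  # pixel coordinates along the scan line
    xs = np.round(x0 + t * (x1 - x0)).astype(int)
    return video[:, ys, xs].T                      # one column per frame

clip = np.random.rand(64, 112, 112)                # toy 64-frame echo clip
mmode = mmode_from_video(clip, x0=56, y0=10, x1=56, y1=100)
print(mmode.shape)                                 # (128, 64)
```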
arXiv Detail & Related papers (2023-09-07T15:00:58Z) - Multi-scale, Data-driven and Anatomically Constrained Deep Learning Image Registration for Adult and Fetal Echocardiography [4.923733944174007]
We propose a framework that combines three strategies for deep learning image registration in both fetal and adult echo.
Our tests show that good anatomical topology and image textures are strongly linked to shape-encoded and data-driven adversarial losses.
Our approach outperforms traditional non-DL gold standard registration approaches, including Optical Flow and Elastix.
arXiv Detail & Related papers (2023-09-02T05:33:31Z) - GEMTrans: A General, Echocardiography-based, Multi-Level Transformer Framework for Cardiovascular Diagnosis [14.737295160286939]
Vision-based machine learning (ML) methods have gained popularity as secondary layers of verification.
We propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability.
We show the flexibility of our framework on two critical tasks: ejection fraction (EF) estimation and aortic stenosis (AS) severity detection.
arXiv Detail & Related papers (2023-08-25T07:30:18Z) - Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis [48.64462717254158]
We developed a self-supervised contrastive learning approach, EchoCLR, tailored to echocardiogram videos.
When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS).
EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small, labeled datasets.
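A minimal sketch of the SimCLR-style contrastive objective behind approaches like EchoCLR: embeddings of two clips treated as positives are pulled together while all other clips in the batch serve as negatives. The temperature and the assumption that z1[i] and z2[i] come from the same patient/study are illustrative, not the paper's exact pairing scheme.

```python
# A minimal sketch of an NT-Xent (SimCLR-style) contrastive loss for video clips.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] are embeddings of two clips forming a positive pair."""
    z = F.normalize(torch.cat([z1, z2]), dim=-1)
    sim = z @ z.T / temperature
    sim.fill_diagonal_(float("-inf"))  # a clip is not its own positive
    n = len(z1)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])  # index of each positive
    return F.cross_entropy(sim, targets)

print(nt_xent(torch.randn(8, 128), torch.randn(8, 128)).item())
```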
arXiv Detail & Related papers (2022-07-23T19:17:26Z) - Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z) - Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by $4.2\% \sim 9.4\%$.
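For reference, the Dice-Sorensen coefficient quoted above measures the overlap between a predicted and a ground-truth binary mask; a minimal sketch with toy masks:

```python
# A minimal sketch of the Dice-Sorensen coefficient for binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """pred, target: binary masks of the same shape."""
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

pred = np.zeros((64, 64), dtype=bool); pred[8:40, 8:40] = True      # toy predicted mask
target = np.zeros((64, 64), dtype=bool); target[16:48, 16:48] = True  # toy ground truth
print(f"Dice: {dice(pred, target):.3f}")
```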
arXiv Detail & Related papers (2020-05-27T06:58:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.