Multimodal Foundation Models For Echocardiogram Interpretation
- URL: http://arxiv.org/abs/2308.15670v2
- Date: Sat, 2 Sep 2023 17:47:47 GMT
- Title: Multimodal Foundation Models For Echocardiogram Interpretation
- Authors: Matthew Christensen, Milos Vukadinovic, Neal Yuan, David Ouyang
- Abstract summary: We leverage 1,032,975 cardiac ultrasound videos and corresponding expert interpretations to develop EchoCLIP.
EchoCLIP displays strong zero-shot (not explicitly trained) performance in cardiac function assessment.
We also developed a long-context variant (EchoCLIP-R) with a custom echocardiography report text tokenizer.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multimodal deep learning foundation models can learn the relationship between
images and text. In the context of medical imaging, mapping images to language
concepts reflects the clinical task of diagnostic image interpretation; however,
current general-purpose foundation models do not perform well in this context
because their training corpora contain limited medical text and images. To address
this challenge and account for the range of cardiac physiology, we leverage
1,032,975 cardiac ultrasound videos and corresponding expert interpretations to
develop EchoCLIP, a multimodal foundation model for echocardiography. EchoCLIP
displays strong zero-shot (not explicitly trained) performance in cardiac
function assessment (external validation left ventricular ejection fraction
mean absolute error (MAE) of 7.1%) and identification of implanted intracardiac
devices (areas under the curve (AUC) between 0.84 and 0.98 for pacemakers and
artificial heart valves). We also developed a long-context variant (EchoCLIP-R)
with a custom echocardiography report text tokenizer which can accurately
identify unique patients across multiple videos (AUC of 0.86), identify
clinical changes such as orthotopic heart transplants (AUC of 0.79) or cardiac
surgery (AUC 0.77), and enable robust image-to-text search (mean cross-modal
retrieval rank in the top 1% of candidate text reports). These emergent
capabilities can be used for preliminary assessment and summarization of
echocardiographic findings.
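The zero-shot assessments described above follow the standard CLIP recipe: encode an image and a set of candidate text prompts into a shared embedding space, then score each prompt by cosine similarity. The sketch below illustrates that scoring step with toy numpy embeddings; the prompt wording, embedding dimension, and function names are illustrative assumptions, not EchoCLIP's actual API or weights.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and each candidate text embedding."""
    return l2_normalize(text_embs) @ l2_normalize(image_emb)

# Toy embeddings standing in for the image and text encoder outputs
# (dimension 4 for brevity; a real model would use hundreds of dimensions).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=4)
prompts = ["pacemaker present", "artificial heart valve", "no implanted device"]
text_embs = rng.normal(size=(3, 4))

scores = zero_shot_scores(image_emb, text_embs)
prediction = prompts[int(np.argmax(scores))]
```

The same similarity matrix, computed over a whole batch of reports, is what drives the cross-modal retrieval ranking the abstract reports.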
Related papers
- CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates it can identify organs and abnormalities in a zero-shot manner using natural language.
arXiv Detail & Related papers (2024-04-23T17:59:01Z) - Echocardiogram Foundation Model -- Application 1: Estimating Ejection
Fraction [2.4164193358532438]
We introduce EchoAI, an echocardiogram foundation model trained using self-supervised learning (SSL) on 1.5 million echocardiograms.
We evaluate our approach by fine-tuning EchoAI to estimate the ejection fraction achieving a mean absolute percentage error of 9.40%.
arXiv Detail & Related papers (2023-11-21T13:00:03Z) - M(otion)-mode Based Prediction of Ejection Fraction using
Echocardiograms [13.112371567924802]
We propose using the M(otion)-mode of echocardiograms for estimating the left ventricular ejection fraction (EF) and classifying cardiomyopathy.
We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures.
Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method.
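The artificial M-mode images mentioned above can be produced by sampling pixel intensities along one fixed scan line in every frame of a video and stacking those samples over time. The sketch below shows that construction; the array shapes and scan-line choice are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def artificial_m_mode(video, column):
    """video: (T, H, W) grayscale frames -> (H, T) M-mode image.

    Each output column is the chosen vertical scan line at one time step,
    so motion of structures along the line appears as horizontal traces.
    """
    t, h, w = video.shape
    assert 0 <= column < w, "scan line must lie inside the frame"
    return video[:, :, column].T  # transpose so time runs left to right

# Toy video: 8 random 16x16 frames standing in for an echo clip.
rng = np.random.default_rng(1)
video = rng.random((8, 16, 16))
m_mode = artificial_m_mode(video, column=5)
```

Generating several such images from different scan lines is what yields the "multiple artificial M-mode images" the summary describes.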
arXiv Detail & Related papers (2023-09-07T15:00:58Z) - Multi-scale, Data-driven and Anatomically Constrained Deep Learning
Image Registration for Adult and Fetal Echocardiography [4.923733944174007]
We propose a framework that combines three strategies for deep learning image registration in both fetal and adult echo.
Our tests show that good anatomical topology and image textures are strongly linked to shape-encoded and data-driven adversarial losses.
Our approach outperforms traditional non-DL gold standard registration approaches, including Optical Flow and Elastix.
arXiv Detail & Related papers (2023-09-02T05:33:31Z) - GEMTrans: A General, Echocardiography-based, Multi-Level Transformer
Framework for Cardiovascular Diagnosis [14.737295160286939]
Vision-based machine learning (ML) methods have gained popularity to act as secondary layers of verification.
We propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability.
We show the flexibility of our framework by considering two critical tasks including ejection fraction (EF) and aortic stenosis (AS) severity detection.
arXiv Detail & Related papers (2023-08-25T07:30:18Z) - Learning to diagnose cirrhosis from radiological and histological labels
with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of the cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z) - Self-supervised contrastive learning of echocardiogram videos enables
label-efficient cardiac disease diagnosis [48.64462717254158]
We developed a self-supervised contrastive learning approach, EchoCLR, tailored to echocardiogram videos.
When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS).
EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small, labeled datasets.
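Contrastive pretraining of this kind typically optimizes an InfoNCE-style objective: embeddings of two clips from the same patient or study should score higher against each other than against clips from other studies. The numpy sketch below shows that loss on toy embeddings; all names, shapes, and the temperature value are illustrative assumptions, not EchoCLR's exact formulation.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """z_a, z_b: (N, D) paired embeddings; returns the mean cross-entropy loss."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; everything else is a negative.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(2)
z_a = rng.normal(size=(4, 8))
# Unrelated embeddings give a high loss; near-identical pairs give a low one.
loss_random = info_nce_loss(z_a, rng.normal(size=(4, 8)))
loss_aligned = info_nce_loss(z_a, z_a + 0.01 * rng.normal(size=(4, 8)))
```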
arXiv Detail & Related papers (2022-07-23T19:17:26Z) - Preservation of High Frequency Content for Deep Learning-Based Medical
Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
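The core operation behind a DWT-based encoding is a wavelet decomposition that separates an image into a low-frequency approximation and high-frequency detail subbands, so fine detail can be encoded explicitly rather than discarded by downsampling. Below is a minimal single-level 2D Haar transform, a hedged sketch of the general idea rather than the paper's exact pipeline.

```python
import numpy as np

def haar_dwt2(image):
    """image: (H, W) with even H, W -> (LL, (LH, HL, HH)) subbands."""
    a = image[0::2, :]              # even rows
    b = image[1::2, :]              # odd rows
    lo_rows = (a + b) / 2.0         # vertical average (low frequency)
    hi_rows = (a - b) / 2.0         # vertical difference (high frequency)

    def split_cols(x):
        return (x[:, 0::2] + x[:, 1::2]) / 2.0, (x[:, 0::2] - x[:, 1::2]) / 2.0

    ll, lh = split_cols(lo_rows)    # approximation and horizontal detail
    hl, hh = split_cols(hi_rows)    # vertical and diagonal detail
    return ll, (lh, hl, hh)

rng = np.random.default_rng(3)
img = rng.random((8, 8))
ll, (lh, hl, hh) = haar_dwt2(img)   # each subband is half-resolution: (4, 4)
```

On a perfectly smooth image the three detail subbands are zero, which is exactly why they isolate the high-frequency content the paper aims to preserve.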
arXiv Detail & Related papers (2022-05-08T15:29:54Z) - Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report
Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z) - Co-Heterogeneous and Adaptive Segmentation from Multi-Source and
Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion
Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogenous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by 4.2% to 9.4%.
arXiv Detail & Related papers (2020-05-27T06:58:39Z) - Uncertainty Estimation in Deep 2D Echocardiography Segmentation [0.2062593640149623]
Uncertainty estimates can be important when testing on data coming from a distribution further away from that of the training data.
We show how uncertainty estimation can be used to automatically reject poor quality images and improve state-of-the-art segmentation results.
arXiv Detail & Related papers (2020-05-19T10:19:23Z)
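A common way to operationalize the automatic rejection of poor-quality images described in the last entry is to compute a scalar uncertainty measure, such as mean predictive entropy over the per-pixel class probabilities, and discard images above a threshold. The sketch below shows that gating step; the threshold and array shapes are assumptions for demonstration, not values from the paper.

```python
import numpy as np

def mean_predictive_entropy(probs, eps=1e-12):
    """probs: (H, W, C) per-pixel class probabilities -> mean entropy in nats."""
    return float(np.mean(-np.sum(probs * np.log(probs + eps), axis=-1)))

def reject_if_uncertain(probs, threshold=0.5):
    """True means the segmentation is too uncertain and the image is rejected."""
    return mean_predictive_entropy(probs) > threshold

# Confident prediction: one class gets nearly all the mass at every pixel.
confident = np.full((4, 4, 2), 0.02)
confident[..., 0] = 0.98
# Uncertain prediction: a uniform distribution over classes (entropy ln 2).
uncertain = np.full((4, 4, 2), 0.5)
```

Only images that pass the gate proceed to downstream measurement, which is how rejecting uncertain inputs can raise aggregate segmentation quality.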
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.