GEMTrans: A General, Echocardiography-based, Multi-Level Transformer
Framework for Cardiovascular Diagnosis
- URL: http://arxiv.org/abs/2308.13217v1
- Date: Fri, 25 Aug 2023 07:30:18 GMT
- Title: GEMTrans: A General, Echocardiography-based, Multi-Level Transformer
Framework for Cardiovascular Diagnosis
- Authors: Masoud Mokhtari, Neda Ahmadi, Teresa S. M. Tsang, Purang Abolmaesumi,
Renjie Liao
- Abstract summary: Vision-based machine learning (ML) methods have gained popularity to act as secondary layers of verification.
We propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability.
We show the flexibility of our framework by considering two critical tasks: ejection fraction (EF) estimation and aortic stenosis (AS) severity detection.
- Score: 14.737295160286939
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Echocardiography (echo) is an ultrasound imaging modality that is widely used
for various cardiovascular diagnosis tasks. Due to inter-observer variability
in echo-based diagnosis, which arises from the variability in echo image
acquisition and the interpretation of echo images based on clinical experience,
vision-based machine learning (ML) methods have gained popularity to act as
secondary layers of verification. For such safety-critical applications, it is
essential for any proposed ML method to present a level of explainability along
with good accuracy. In addition, such methods must be able to process several
echo videos obtained from various heart views and the interactions among them
to properly produce predictions for a variety of cardiovascular measurements or
interpretation tasks. Prior work lacks explainability or is limited in scope by
focusing on a single cardiovascular task. To remedy this, we propose a General,
Echo-based, Multi-Level Transformer (GEMTrans) framework that provides
explainability, while simultaneously enabling multi-video training, where the
interplay among echo image patches in the same frame, among all frames in the
same video, and across videos is captured according to the downstream task.
We show the flexibility of our framework by considering two critical tasks:
ejection fraction (EF) estimation and aortic stenosis (AS) severity detection.
Our model achieves mean absolute errors of 4.15 and 4.84 for single- and
dual-video EF estimation and an accuracy of 96.5% for AS detection, while
providing informative task-specific attention maps and prototypical
explainability.
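The multi-level design described in the abstract (attention over image patches within a frame, over frames within a video, and across videos in a study) can be outlined with a short PyTorch sketch. This is a minimal illustration, not the authors' GEMTrans implementation; the embedding size, encoder depths, learnable summary tokens, and the scalar regression head (e.g. for EF) are assumptions made for the example.

```python
# Minimal sketch of a patch -> frame -> video attention hierarchy (hypothetical).
import torch
import torch.nn as nn


class LevelEncoder(nn.Module):
    """Transformer encoder with a learnable summary token for one level."""

    def __init__(self, dim=128, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, tokens):                      # tokens: (B, N, dim)
        cls = self.cls.expand(tokens.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, tokens], dim=1))
        return out[:, 0]                            # one summary vector per sequence


class MultiLevelEchoNet(nn.Module):
    """Patch-level, frame-level, then video-level attention ending in a scalar output."""

    def __init__(self, patch=16, dim=128):
        super().__init__()
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.patch_enc = LevelEncoder(dim)          # patches within a frame
        self.frame_enc = LevelEncoder(dim)          # frames within a video
        self.video_enc = LevelEncoder(dim)          # videos within a study
        self.head = nn.Linear(dim, 1)               # e.g. EF regression

    def forward(self, x):                           # x: (B, views, frames, 1, H, W)
        B, V, T, C, H, W = x.shape
        p = self.patch_embed(x.view(B * V * T, C, H, W))
        p = p.flatten(2).transpose(1, 2)            # (B*V*T, n_patches, dim)
        frames = self.patch_enc(p).view(B * V, T, -1)
        videos = self.frame_enc(frames).view(B, V, -1)
        study = self.video_enc(videos)              # (B, dim)
        return self.head(study).squeeze(-1)


# Example: a batch of 2 studies, each with 2 views of 8 frames at 64x64 pixels.
model = MultiLevelEchoNet()
print(model(torch.randn(2, 2, 8, 1, 64, 64)).shape)  # torch.Size([2])
```

For a classification task such as AS severity, the same study-level token would feed a softmax head instead of the scalar regressor.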
Related papers
- EchoFM: Foundation Model for Generalizable Echocardiogram Analysis [22.585990526913246]
We introduce EchoFM, a foundation model specifically designed to represent and analyze echocardiography videos.
In EchoFM, we propose a self-supervised learning framework that captures both spatial and temporal variability.
We pre-train our model on an extensive dataset comprising over 290,000 echocardiography videos, amounting to up to 20 million image frames.
arXiv Detail & Related papers (2024-10-30T19:32:02Z) - A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT [0.62914438169038]
This paper presents an advanced approach for fine-tuning BiomedCLIP-PubMedBERT, a multimodal model, to classify abnormalities in Video Capsule Endoscopy frames.
Our method categorizes images into ten specific classes: angioectasia, bleeding, erosion, erythema, foreign body, lymphangiectasia, polyp, ulcer, worms, and normal.
Performance metrics, including classification accuracy, recall, and F1 score, indicate the model's strong ability to accurately identify abnormalities in endoscopic frames.
arXiv Detail & Related papers (2024-10-25T19:42:57Z) - EchoApex: A General-Purpose Vision Foundation Model for Echocardiography [9.202542805578432]
We introduce EchoApex, the first general-purpose vision foundation model for echocardiography, with applications across a variety of clinical practices.
Leveraging self-supervised learning, EchoApex is pretrained on over 20 million echo images from 11 clinical centres.
Compared to state-of-the-art task-specific models, EchoApex attains improved performance with a unified image encoding architecture.
arXiv Detail & Related papers (2024-10-14T21:10:56Z) - EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation [1.0840985826142429]
We introduce EchoPrime, a multi-view, view-informed, video-based vision-language foundation model trained on over 12 million video-report pairs.
With retrieval-augmented interpretation, EchoPrime integrates information from all echocardiogram videos in a comprehensive study.
In datasets from two independent healthcare systems, EchoPrime achieves state-of-the-art performance on 23 diverse benchmarks of cardiac form and function.
arXiv Detail & Related papers (2024-10-13T03:04:22Z) - Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals [4.519437028632205]
Deep learning has facilitated automatic multi-view fusion (MVF) of cardiovascular system (CVS) signals.
MVF model architectures often amalgamate CVS signals from the same temporal step but different views into a unified representation.
We introduce prompt techniques to aid pretrained MVF models in flexibly adapting to various missing-view scenarios.
arXiv Detail & Related papers (2024-06-13T08:58:59Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis [61.089776864520594]
We propose eye-tracking as an alternative to text reports for medical images.
By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning.
We introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks.
arXiv Detail & Related papers (2023-12-11T02:27:45Z) - On Sensitivity and Robustness of Normalization Schemes to Input
Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts, as these lead to changes in the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts (a minimal sketch of this swap appears after the related-papers list below).
arXiv Detail & Related papers (2023-06-23T03:09:03Z) - Preservation of High Frequency Content for Deep Learning-Based Medical
Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
arXiv Detail & Related papers (2022-05-08T15:29:54Z) - Factored Attention and Embedding for Unstructured-view Topic-related
Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation.
The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z) - Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis [102.40869566439514]
We seek to exploit rich labeled data from relevant domains to help learning in the target task via Unsupervised Domain Adaptation (UDA).
Unlike most UDA methods that rely on clean labeled data or assume samples are equally transferable, we innovatively propose a Collaborative Unsupervised Domain Adaptation algorithm.
We theoretically analyze the generalization performance of the proposed method, and also empirically evaluate it on both medical and general images.
arXiv Detail & Related papers (2020-07-05T11:49:17Z)