GEMTrans: A General, Echocardiography-based, Multi-Level Transformer
Framework for Cardiovascular Diagnosis
- URL: http://arxiv.org/abs/2308.13217v1
- Date: Fri, 25 Aug 2023 07:30:18 GMT
- Authors: Masoud Mokhtari, Neda Ahmadi, Teresa S. M. Tsang, Purang Abolmaesumi,
Renjie Liao
- Score: 14.737295160286939
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Echocardiography (echo) is an ultrasound imaging modality that is widely used
for various cardiovascular diagnosis tasks. Due to inter-observer variability
in echo-based diagnosis, which arises from the variability in echo image
acquisition and the interpretation of echo images based on clinical experience,
vision-based machine learning (ML) methods have gained popularity to act as
secondary layers of verification. For such safety-critical applications, it is
essential for any proposed ML method to present a level of explainability along
with good accuracy. In addition, such methods must be able to process several
echo videos obtained from various heart views and the interactions among them
to properly produce predictions for a variety of cardiovascular measurements or
interpretation tasks. Prior work lacks explainability or is limited in scope by
focusing on a single cardiovascular task. To remedy this, we propose a General,
Echo-based, Multi-Level Transformer (GEMTrans) framework that provides
explainability, while simultaneously enabling multi-video training where the
interplay among echo image patches in the same frame, all frames in the same
video, and inter-video relationships are captured based on a downstream task.
We show the flexibility of our framework by considering two critical tasks:
ejection fraction (EF) estimation and aortic stenosis (AS) severity detection.
Our model achieves mean absolute errors of 4.15 and 4.84 for single- and
dual-video EF estimation and an accuracy of 96.5% for AS detection, while
providing informative task-specific attention maps and prototypical
explainability.
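The three-level attention described in the abstract (patches within a frame, frames within a video, videos within a study) can be illustrated with a toy NumPy sketch. This is not the authors' implementation; the function names, dimensions, and mean-pooling of attention outputs are illustrative assumptions, and a real model would use learned, trained projections and classification/regression heads.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(tokens, d_k=16):
    """Toy single-head self-attention over a token set; tokens: (n, d).
    Projections are random stand-ins for learned weights."""
    n, d = tokens.shape
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(d_k)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # attention map (source of explainability)
    return w @ v

def encode_video(patch_tokens):
    """patch_tokens: (frames, patches_per_frame, d) for one echo video."""
    # Level 1: attention among patches of each frame -> one token per frame
    frame_tokens = np.stack([self_attention(f).mean(axis=0) for f in patch_tokens])
    # Level 2: attention among frames of the video -> one token per video
    return self_attention(frame_tokens).mean(axis=0)

# Level 3: attention among videos from different heart views -> study embedding
videos = [rng.standard_normal((8, 16, 32)) for _ in range(2)]  # 2 views (toy)
video_tokens = np.stack([encode_video(v) for v in videos])
study_embedding = self_attention(video_tokens).mean(axis=0)
```

A downstream head would map `study_embedding` to an EF value or an AS severity class; the per-level attention weights are what yields task-specific attention maps.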
Related papers
- Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals [4.519437028632205]
Deep learning has facilitated automatic multi-view fusion (MVF) of cardiovascular system (CVS) signals.
MVF model architectures often amalgamate CVS signals from the same temporal step but different views into a unified representation.
We introduce prompt techniques to aid pretrained MVF models in flexibly adapting to various missing-view scenarios.
arXiv Detail & Related papers (2024-06-13T08:58:59Z)
- Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
- VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics [0.0]
Visual attribution in medical imaging seeks to make evident the diagnostically relevant components of a medical image.
We here present a novel generative visual attribution technique, one that leverages latent diffusion models in combination with domain-specific large language models.
The resulting system also exhibits a range of latent capabilities including zero-shot localized disease induction.
arXiv Detail & Related papers (2024-01-02T19:51:49Z)
- Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis [61.089776864520594]
We propose eye-tracking as an alternative to text reports for medical images.
By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning.
We introduce the Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks.
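The idea of using gaze instead of text reports to define positive pairs can be sketched as follows. This is a hedged illustration, not the McGIP code: the similarity threshold, heatmap representation, and InfoNCE-style loss are assumptions about how such a plug-and-play module could work.

```python
import numpy as np

def gaze_positive_mask(gaze_maps, threshold=0.8):
    """Mark image pairs as positives when their (hypothetical) gaze heatmaps
    are similar enough; gaze_maps: (n, h*w) flattened fixation heatmaps."""
    g = gaze_maps / np.linalg.norm(gaze_maps, axis=1, keepdims=True)
    sim = g @ g.T                          # cosine similarity of gaze patterns
    mask = (sim > threshold).astype(float)
    np.fill_diagonal(mask, 0.0)            # a sample is not its own positive
    return mask

def contrastive_loss(z, pos_mask, tau=0.1):
    """InfoNCE-style loss using gaze-derived positives instead of text labels."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -1e9)            # exclude self-similarity
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.where(pos_mask > 0, logp, 0.0).sum() / pos_mask.sum()

rng = np.random.default_rng(0)
gaze = np.abs(rng.standard_normal((4, 64)))
gaze[1] = gaze[0] + 0.01   # two images the radiologist scanned similarly
z = rng.standard_normal((4, 32))           # toy image embeddings
loss = contrastive_loss(z, gaze_positive_mask(gaze))
```

Because the mask is computed from gaze alone, the module can be dropped into any contrastive framework that accepts an external positive-pair definition.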
arXiv Detail & Related papers (2023-12-11T02:27:45Z)
- A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography [3.7775754350457746]
We introduce a novel unified framework using a convolutional neural network for comprehensive analysis of spectral and tissue Doppler echocardiography images.
The network automatically recognizes key features across various Doppler views, with novel Doppler shape embedding and anti-aliasing modules.
Empirical results indicate consistent gains in performance metrics, including Dice similarity coefficient (DSC) and intersection over union (IoU).
arXiv Detail & Related papers (2023-11-14T15:10:05Z)
- BAAF: A Benchmark Attention Adaptive Framework for Medical Ultrasound Image Segmentation Tasks [15.998631461609968]
We propose a Benchmark Attention Adaptive Framework (BAAF) to assist doctors in segmenting or diagnosing lesions and tissues in ultrasound images.
BAAF consists of a parallel hybrid attention module (PHAM) and an adaptive calibration mechanism (ACM).
The design of BAAF further optimizes the "what" and "where" focus-and-selection problems in CNNs and seeks to improve the segmentation accuracy of lesions and tissues in medical ultrasound images.
arXiv Detail & Related papers (2023-10-02T06:15:50Z)
- Multimodal Foundation Models For Echocardiogram Interpretation [0.24578723416255746]
We leverage 1,032,975 cardiac ultrasound videos and corresponding expert interpretations to develop EchoCLIP.
EchoCLIP displays strong zero-shot (not explicitly trained) performance in cardiac function assessment.
We also developed a long-context variant (EchoCLIP-R) with a custom echocardiography report text tokenizer.
arXiv Detail & Related papers (2023-08-29T23:45:54Z)
- On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts, as these lead to changes in the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
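The robustness argument for Group Normalization is that its statistics are computed per sample rather than per batch, so a distribution shift at test time cannot corrupt them via batch statistics. A minimal NumPy sketch (a toy re-implementation, not the paper's code) shows that a constant intensity shift, a simple stand-in for an acquisition artifact, is removed entirely:

```python
import numpy as np

def group_norm(x, num_groups=4, eps=1e-5):
    """Group Normalization over (N, C, H, W): each sample's channels are split
    into groups and normalized with that sample's own mean and variance."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

x = np.random.default_rng(1).standard_normal((2, 8, 4, 4))
y_shifted = group_norm(x + 10.0)  # constant intensity shift (toy artifact)
y_clean = group_norm(x)
# Per-sample normalization cancels the shift: outputs are (numerically) equal.
```

Batch Normalization in inference mode, by contrast, subtracts train-time running statistics, so the same shift would propagate into the features.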
arXiv Detail & Related papers (2023-06-23T03:09:03Z)
- Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
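As a rough illustration of DWT-based encoding (not the paper's method), a one-level 2-D Haar transform splits an image into a low-frequency approximation and three high-frequency detail sub-bands; the detail bands carry the fine structure that downsampling would otherwise discard:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT; img: (H, W) with even dimensions.
    Returns the approximation (LL) and detail (LH, HL, HH) sub-bands."""
    a = (img[0::2] + img[1::2]) / 2.0      # row-pair averages
    d = (img[0::2] - img[1::2]) / 2.0      # row-pair differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0   # low-low: coarse approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0   # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0   # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0   # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16.0).reshape(4, 4)        # smooth linear ramp (toy image)
ll, lh, hl, hh = haar_dwt2(img)
# For a smooth ramp, the diagonal detail band is exactly zero.
```

Feeding sub-bands (rather than a plainly resized image) to a classifier is one way to preserve high-frequency content at reduced resolution.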
arXiv Detail & Related papers (2022-05-08T15:29:54Z)
- Factored Attention and Embedding for Unstructured-view Topic-related Ultrasound Report Generation [70.7778938191405]
We propose a novel factored attention and embedding model (termed FAE-Gen) for the unstructured-view topic-related ultrasound report generation.
The proposed FAE-Gen mainly consists of two modules, i.e., view-guided factored attention and topic-oriented factored embedding, which capture the homogeneous and heterogeneous morphological characteristics across different views.
arXiv Detail & Related papers (2022-03-12T15:24:03Z)
- Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis [102.40869566439514]
We seek to exploit rich labeled data from relevant domains to help learning in the target task via Unsupervised Domain Adaptation (UDA).
Unlike most UDA methods that rely on clean labeled data or assume samples are equally transferable, we innovatively propose a Collaborative Unsupervised Domain Adaptation algorithm.
We theoretically analyze the generalization performance of the proposed method, and also empirically evaluate it on both medical and general images.
arXiv Detail & Related papers (2020-07-05T11:49:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.