Cyclical Self-Supervision for Semi-Supervised Ejection Fraction
Prediction from Echocardiogram Videos
- URL: http://arxiv.org/abs/2210.11291v1
- Date: Thu, 20 Oct 2022 14:23:40 GMT
- Title: Cyclical Self-Supervision for Semi-Supervised Ejection Fraction
Prediction from Echocardiogram Videos
- Authors: Weihang Dai, Xiaomeng Li, Xinpeng Ding, Kwang-Ting Cheng
- Abstract summary: Left-ventricular ejection fraction (LVEF) is an important indicator of heart failure.
Current methods for LVEF estimation from video require large amounts of annotated data to achieve high performance.
This paper presents the first semi-supervised approach for LVEF prediction.
- Score: 32.62593401120131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Left-ventricular ejection fraction (LVEF) is an important indicator of heart
failure. Existing methods for LVEF estimation from video require large amounts
of annotated data to achieve high performance, e.g. using 10,030 labeled
echocardiogram videos to achieve a mean absolute error (MAE) of 4.10. Labeling
these videos is time-consuming, however, and limits potential downstream
applications to other heart diseases. This paper presents the first
semi-supervised approach for LVEF prediction. Unlike general video prediction
tasks, LVEF prediction is specifically related to changes in the left ventricle
(LV) in echocardiogram videos. By incorporating knowledge learned from
predicting LV segmentations into LVEF regression, we can provide additional
context to the model for better predictions. To this end, we propose a novel
Cyclical Self-Supervision (CSS) method for learning video-based LV
segmentation, which is motivated by the observation that the heartbeat is a
cyclical process with temporal repetition. Prediction masks from our
segmentation model can then be used as additional input for LVEF regression to
provide spatial context for the LV region. We also introduce teacher-student
distillation to distill the information from LV segmentation masks into an
end-to-end LVEF regression model that only requires video inputs. Results show
our method outperforms alternative semi-supervised methods and can achieve MAE
of 4.17, which is competitive with state-of-the-art supervised performance,
using half the number of labels. Validation on an external dataset also shows
improved generalization ability from using our method.
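The core observation behind Cyclical Self-Supervision is that the heartbeat repeats, so predictions made one cardiac cycle apart should agree. A minimal toy sketch of that signal, not the paper's implementation: the `estimate_period` and `cyclical_consistency_loss` functions and the synthetic LV-area trace are illustrative assumptions, standing in for the per-pixel segmentation outputs the actual method supervises.

```python
import numpy as np

def estimate_period(areas):
    """Estimate the cardiac cycle length (in frames) from a 1-D signal of
    predicted LV areas, using the first autocorrelation peak after the
    autocorrelation first dips negative."""
    x = areas - areas.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    neg = np.where(ac < 0)[0]
    start = int(neg[0]) if len(neg) else 1  # skip the trivial lag-0 peak
    return start + int(np.argmax(ac[start:]))

def cyclical_consistency_loss(areas, period):
    """Mean squared difference between predictions one cycle apart: a
    proxy for the temporal-repetition signal exploited by the method."""
    a, b = areas[:-period], areas[period:]
    return float(np.mean((a - b) ** 2))

# A synthetic periodic "LV area" trace with a 20-frame cycle.
t = np.arange(100)
areas = 50.0 + 10.0 * np.sin(2 * np.pi * t / 20)
p = estimate_period(areas)
loss = cyclical_consistency_loss(areas, p)
```

On this clean synthetic trace the autocorrelation peak recovers the 20-frame cycle, and the consistency loss is near zero; for an imperfect segmentation model, minimizing this loss on unlabeled videos would push predictions toward temporal repetition.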
Related papers
- EchoNarrator: Generating natural text explanations for ejection fraction predictions [1.3082208571657106]
Ejection fraction (EF) of the left ventricle (LV) is considered as one of the most important measurements for diagnosing acute heart failure.
Recent successes in deep learning research successfully estimate EF values, but the proposed models often lack an explanation for the prediction.
We propose a model that combines estimation of the LV contour over multiple frames, together with a set of modules and routines for computing various motion and shape attributes.
It then feeds the attributes into a large language model to generate text that helps to explain the network's outcome in a human-like manner.
arXiv Detail & Related papers (2024-10-31T08:59:34Z)
- Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic Segmentation has evolved into In-context Segmentation tasks, becoming a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
arXiv Detail & Related papers (2024-10-03T10:33:49Z)
- Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following [74.30960564603917]
Training gaze following models requires a large number of images with gaze target coordinates annotated by human annotators.
We propose the first semi-supervised method for gaze following by introducing two novel priors to the task.
Our method outperforms simple pseudo-annotation generation baselines on the GazeFollow image dataset.
arXiv Detail & Related papers (2024-06-04T20:43:26Z)
- Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models [96.97910688908956]
We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models.
We propose a framework tailored for VSS based on pre-trained image and video diffusion models.
Experiments show that our proposed approach outperforms existing zero-shot image semantic segmentation approaches.
arXiv Detail & Related papers (2024-05-27T08:39:38Z)
- Weakly Supervised Video Individual Counting [126.75545291243142]
Video Individual Counting aims to predict the number of unique individuals in a single video.
We introduce a weakly supervised VIC task, wherein trajectory labels are not provided.
In doing so, we devise an end-to-end trainable soft contrastive loss to drive the network to distinguish inflow, outflow, and the remaining individuals.
arXiv Detail & Related papers (2023-12-10T16:12:13Z)
- Semantic-aware Temporal Channel-wise Attention for Cardiac Function Assessment [69.02116920364311]
Existing video-based methods do not pay much attention to the left ventricular region, nor the left ventricular changes caused by motion.
We propose a semi-supervised auxiliary learning paradigm with a left ventricular segmentation task, which contributes to the representation learning for the left ventricular region.
Our approach achieves state-of-the-art performance on the Stanford dataset, with improvements of 0.22 in MAE, 0.26 in RMSE, and 1.9% in $R^2$.
arXiv Detail & Related papers (2023-10-09T05:57:01Z)
- SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning [0.8672882547905405]
We develop SimLVSeg, a video-based network for consistent left ventricular (LV) segmentation from sparsely annotated echocardiogram videos.
SimLVSeg consists of self-supervised pre-training with temporal masking, followed by weakly supervised learning tailored for LV segmentation from sparse annotations.
We demonstrate how SimLVSeg outperforms state-of-the-art solutions by achieving a 93.32% Dice score on the largest 2D+time echocardiography dataset.
arXiv Detail & Related papers (2023-09-30T18:13:41Z)
- EchoGNN: Explainable Ejection Fraction Estimation with Graph Neural Networks [26.817931695658583]
Ejection fraction (EF) is a key indicator of cardiac function, allowing identification of patients prone to heart dysfunctions such as heart failure.
EF is estimated from cardiac ultrasound videos known as echocardiograms (echo) by manually tracing the left ventricle and estimating its volume on certain frames.
These estimations exhibit high inter-observer variability due to the manual process and varying video quality.
We introduce EchoGNN, a model based on graph neural networks (GNNs) to estimate EF from echo videos.
arXiv Detail & Related papers (2022-08-30T05:59:57Z)
- Adaptive Contrast for Image Regression in Computer-Aided Disease Assessment [22.717658723840255]
We propose the first contrastive learning framework for deep image regression, namely AdaCon.
AdaCon consists of a feature learning branch via a novel adaptive-margin contrastive loss and a regression prediction branch.
We demonstrate the effectiveness of AdaCon on two medical image regression tasks.
arXiv Detail & Related papers (2021-12-22T07:13:02Z)
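The adaptive-margin idea behind AdaCon — that pairs of samples should be separated in feature space in proportion to the distance between their regression labels — can be sketched as a pairwise hinge loss. This is a plain illustrative sketch, not the paper's exact formulation: the function name, the squared-hinge form, and the `scale` parameter are assumptions made for demonstration.

```python
import numpy as np

def adaptive_margin_contrastive_loss(feats, labels, scale=1.0):
    """Pairwise hinge loss whose margin grows with label distance, so the
    learned embedding mirrors the ordering of the regression targets."""
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_feat = np.linalg.norm(feats[i] - feats[j])
            margin = scale * abs(labels[i] - labels[j])  # adaptive margin
            # Penalize pairs whose feature distance falls short of their margin.
            total += max(0.0, margin - d_feat) ** 2
            count += 1
    return total / count

labels = [10.0, 60.0]  # e.g. two hypothetical LVEF values
# Features already separated beyond the margin incur no penalty ...
loss_good = adaptive_margin_contrastive_loss(np.array([[0.0], [10.0]]), labels, scale=0.1)
# ... while collapsed features for distant labels are penalized.
loss_bad = adaptive_margin_contrastive_loss(np.array([[0.0], [0.0]]), labels, scale=0.1)
```

The design point is that a fixed margin treats all unequal-label pairs alike, whereas scaling the margin by label distance lets the loss encode how different two samples' targets are.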
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.