TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification
- URL: http://arxiv.org/abs/2508.15298v5
- Date: Fri, 05 Sep 2025 15:35:54 GMT
- Title: TPA: Temporal Prompt Alignment for Fetal Congenital Heart Defect Classification
- Authors: Darya Taratynova, Alya Almsouti, Beknur Kalmakhanbet, Numan Saeed, Mohammad Yaqub
- Abstract summary: Congenital heart defect (CHD) detection in ultrasound videos is hindered by image noise and probe positioning variability. We propose Temporal Prompt Alignment (TPA), a method leveraging a foundation image-text model and prompt-aware contrastive learning. TPA extracts features from each frame of video subclips using an image encoder, aggregates them with a trainable temporal extractor, and aligns the video representation with class-specific text prompts.
- Score: 2.3974223785103166
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Congenital heart defect (CHD) detection in ultrasound videos is hindered by image noise and probe positioning variability. While automated methods can reduce operator dependence, current machine learning approaches often neglect temporal information, limit themselves to binary classification, and do not account for prediction calibration. We propose Temporal Prompt Alignment (TPA), a method leveraging a foundation image-text model and prompt-aware contrastive learning to classify fetal CHD on cardiac ultrasound videos. TPA extracts features from each frame of video subclips using an image encoder, aggregates them with a trainable temporal extractor to capture heart motion, and aligns the video representation with class-specific text prompts via a margin-hinge contrastive loss. To enhance calibration for clinical reliability, we introduce a Conditional Variational Autoencoder Style Modulation (CVAESM) module, which learns a latent style vector to modulate embeddings and quantifies classification uncertainty. Evaluated on a private dataset for CHD detection and on a large public dataset, EchoNet-Dynamic, for systolic dysfunction, TPA achieves a state-of-the-art macro F1 score of 85.40% for CHD diagnosis, while also reducing expected calibration error (ECE) by 5.38% and adaptive ECE by 6.8%. On EchoNet-Dynamic's three-class task, it boosts macro F1 by 4.73% (from 53.89% to 58.62%). Temporal Prompt Alignment (TPA) is a framework for fetal congenital heart defect (CHD) classification in ultrasound videos that integrates temporal modeling, prompt-aware contrastive learning, and uncertainty quantification.
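The abstract names two quantities that are easy to confuse without a concrete form: the margin-hinge contrastive loss that aligns a video embedding with class prompts, and the expected calibration error (ECE). The following is a minimal sketch in plain Python, not the authors' implementation. Several things here are assumptions for illustration only: mean pooling stands in for the trainable temporal extractor, the margin value of 0.2 is hypothetical, the hinge is summed over all non-matching prompts, and ECE uses the common equal-width binning definition. The abstract does not specify any of these details.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pool(frame_features):
    """Assumed stand-in for the temporal extractor: average over frames."""
    n_frames = len(frame_features)
    return [sum(column) / n_frames for column in zip(*frame_features)]


def margin_hinge_loss(video_emb, prompt_embs, label, margin=0.2):
    """Hinge loss pushing the true-class prompt above every other prompt.

    The margin of 0.2 is a hypothetical value, not taken from the paper.
    """
    sims = [cosine(video_emb, prompt) for prompt in prompt_embs]
    positive = sims[label]
    return sum(
        max(0.0, margin - positive + sim)
        for k, sim in enumerate(sims)
        if k != label
    )


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: weighted mean of |accuracy - confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece


# Toy usage: two frames, two class prompts, 2-D embeddings.
frames = [[1.0, 0.0], [1.0, 0.0]]
prompts = [[1.0, 0.0], [0.0, 1.0]]
video = mean_pool(frames)
loss_when_right = margin_hinge_loss(video, prompts, label=0)
loss_when_wrong = margin_hinge_loss(video, prompts, label=1)
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
```

In the toy usage the video embedding coincides with the first prompt, so the loss is zero when the label is class 0 and positive when the label is class 1. A lower ECE means the predicted confidences track the observed accuracy more closely, which is the property the CVAESM module is reported to improve.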
Related papers
- A Hybrid Deep Learning Model for Robust Biometric Authentication from Low-Frame-Rate PPG Signals [0.34376560669160394]
Photoplethysmography (PPG) signals, which measure changes in blood volume in the skin using light, have recently gained attention in biometric authentication. PPG signal quality is challenged by motion artifacts, illumination changes, and inter-subject physiological variability. This study proposes a lightweight and cost-effective biometric authentication framework based on PPG signals extracted from low-frame-rate fingertip videos.
arXiv Detail & Related papers (2025-11-06T04:16:13Z) - Automated Cervical Os Segmentation for Camera-Guided, Speculum-Free Screening [38.85521544870542]
This study evaluates deep learning methods for real-time segmentation of the cervical os in transvaginal endoscopic images. EndoViT/DPT, a vision transformer pre-trained on surgical video, achieved the highest DICE (0.50 ± 0.31) and detection rate (0.87 ± 0.33). These results establish a foundation for integrating automated os recognition into speculum-free cervical screening devices to support non-expert use.
arXiv Detail & Related papers (2025-09-12T14:19:27Z) - End to End Autoencoder MLP Framework for Sepsis Prediction [10.151360630975482]
Sepsis is a life-threatening condition that requires timely detection in intensive care settings. Traditional machine learning approaches, including Naive Bayes, struggle with irregular, incomplete time-series data. We introduce an end-to-end deep learning framework integrating an unsupervised autoencoder for automatic feature extraction.
arXiv Detail & Related papers (2025-08-26T05:22:48Z) - ECG Latent Feature Extraction with Autoencoders for Downstream Prediction Tasks [2.2616169634370076]
The electrocardiogram (ECG) is an inexpensive and widely available tool for cardiac assessment. Despite its standardized format and small file size, the high complexity and inter-individual variability of ECG signals make it challenging to use in deep learning models. This study addresses these challenges by exploring feature generation methods from representative beat ECGs. We introduce three novel Variational Autoencoder (VAE) variants, Stochastic Autoencoder (SAE), Annealed beta-VAE (A beta-VAE), and Cyclical beta-VAE (C beta-VAE), and compare their effectiveness in maintaining
arXiv Detail & Related papers (2025-07-31T19:37:05Z) - UltraAD: Fine-Grained Ultrasound Anomaly Classification via Few-Shot CLIP Adaptation [39.48115172323913]
We propose UltraAD, a vision-language model (VLM)-based approach for anomaly localization and fine-grained classification. UltraAD has been extensively evaluated on three breast US datasets, outperforming state-of-the-art methods in both lesion datasets and fine-grained medical classification.
arXiv Detail & Related papers (2025-06-24T15:00:38Z) - Reliable Multi-View Learning with Conformal Prediction for Aortic Stenosis Classification in Echocardiography [6.540741143328299]
The acquired images are often 2-D cross-sections of a 3-D anatomy, potentially missing important anatomical details.
We propose Re-Training for Uncertainty (RT4U), a data-centric method to introduce uncertainty to weakly informative inputs in the training set.
When combined with conformal prediction techniques, RT4U can yield adaptively sized prediction sets which are guaranteed to contain the ground truth class to a high accuracy.
arXiv Detail & Related papers (2024-09-15T10:06:06Z) - SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals [37.788535094404644]
Atrial fibrillation (AF) significantly increases the risk of stroke, heart disease, and mortality.
Photoplethysmography (PPG) signals are susceptible to corruption from motion artifacts and other factors often encountered in ambulatory settings.
We propose a novel deep learning model, designed to learn how to retain accurate predictions from partially corrupted PPG.
arXiv Detail & Related papers (2024-04-15T01:07:08Z) - Interpretable cancer cell detection with phonon microscopy using multi-task conditional neural networks for inter-batch calibration [39.759100498329275]
We present a conditional neural network framework to simultaneously achieve inter-batch calibration.
We validate our approach by training and validating on different experimental batches.
We extend our model to reconstruct denoised signals, enabling physical interpretation of salient features indicating disease state.
arXiv Detail & Related papers (2024-03-26T12:20:10Z) - Automated interpretation of congenital heart disease from multi-view echocardiograms [10.238433789459624]
Congenital heart disease (CHD) is the most common birth defect and the leading cause of neonate death in China.
This study proposes to automatically analyze the multi-view echocardiograms with a practical end-to-end framework.
arXiv Detail & Related papers (2023-11-30T18:37:21Z) - Improving Diffusion Models for ECG Imputation with an Augmented Template Prior [43.6099225257178]
Noisy and poor-quality recordings are a major issue for signals collected using mobile health systems.
Recent studies have explored the imputation of missing values in ECG with probabilistic time-series models.
We present a template-guided denoising diffusion probabilistic model (DDPM), PulseDiff, which is conditioned on an informative prior for a range of health conditions.
arXiv Detail & Related papers (2023-10-24T11:34:15Z) - Unsupervised sequence-to-sequence learning for automatic signal quality assessment in multi-channel electrical impedance-based hemodynamic monitoring [0.6875312133832077]
This study proposes an unsupervised sequence-to-sequence learning approach that automatically assesses the motion-induced reliability of the cardiac volume signal (CVS) in hemodynamic monitoring.
An encoder-decoder model is trained not only to self-reproduce an input sequence of the CVS but also to extrapolate the future in a parallel fashion.
A low-quality, motion-influenced CVS is detected based on the residual between the input sequence and its neural representation, with a cut-off value determined by the two-sigma rule of thumb over the training set.
arXiv Detail & Related papers (2023-05-16T11:52:06Z) - DopUS-Net: Quality-Aware Robotic Ultrasound Imaging based on Doppler Signal [48.97719097435527]
DopUS-Net combines the Doppler images with B-mode images to increase the segmentation accuracy and robustness of small blood vessels.
An artery re-identification module qualitatively evaluates the real-time segmentation results and automatically optimizes the probe pose for enhanced Doppler images.
arXiv Detail & Related papers (2023-05-15T18:19:29Z) - Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
arXiv Detail & Related papers (2022-05-08T15:29:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.