Anatomically Constrained Transformers for Echocardiogram Analysis
- URL: http://arxiv.org/abs/2511.01109v1
- Date: Sun, 02 Nov 2025 22:52:30 GMT
- Title: Anatomically Constrained Transformers for Echocardiogram Analysis
- Authors: Alexander Thorley, Agis Chartsias, Jordan Strom, Jeremy Slivnick, Dipak Kotecha, Alberto Gomez, Jinming Duan
- Abstract summary: ViACT represents a deforming anatomical structure as a point set and encodes both its spatial geometry and corresponding image patches into transformer tokens. During pre-training, ViACT follows a masked autoencoding strategy that masks and reconstructs only anatomical patches. ViACTs generalize to myocardium point tracking without requiring task-specific components.
- Score: 38.280536446335056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video transformers have recently demonstrated strong potential for echocardiogram (echo) analysis, leveraging self-supervised pre-training and flexible adaptation across diverse tasks. However, like other models operating on videos, they are prone to learning spurious correlations from non-diagnostic regions such as image backgrounds. To overcome this limitation, we propose the Video Anatomically Constrained Transformer (ViACT), a novel framework that integrates anatomical priors directly into the transformer architecture. ViACT represents a deforming anatomical structure as a point set and encodes both its spatial geometry and corresponding image patches into transformer tokens. During pre-training, ViACT follows a masked autoencoding strategy that masks and reconstructs only anatomical patches, enforcing that representation learning is focused on the anatomical region. The pre-trained model can then be fine-tuned for tasks localized to this region. In this work we focus on the myocardium, demonstrating the framework on echo analysis tasks such as left ventricular ejection fraction (EF) regression and cardiac amyloidosis (CA) detection. The anatomical constraint focuses transformer attention within the myocardium, yielding interpretable attention maps aligned with regions of known CA pathology. Moreover, ViACT generalizes to myocardium point tracking without requiring task-specific components such as correlation volumes used in specialized tracking networks.
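The tokenization and masking idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the patch size, the mask ratio, and the choice to append normalized coordinates to the flattened patch are all assumptions made for illustration. It only shows how tokens might be built from image patches sampled at anatomical points, so that any masking is confined to the anatomical region by construction.

```python
import numpy as np


def extract_point_patches(frame, points, patch=8):
    """Crop a patch x patch window around each (row, col) point.

    Hypothetical helper: the frame is edge-padded so that points near
    the border still yield full-size patches.
    """
    half = patch // 2
    padded = np.pad(frame, half, mode="edge")
    patches = []
    for r, c in points:
        r, c = int(round(float(r))), int(round(float(c)))
        # After padding by `half`, the window starting at (r, c) in the
        # padded frame is approximately centred on (r, c) in the original.
        patches.append(padded[r:r + patch, c:c + patch])
    return np.stack(patches)


def make_tokens(frame, points, patch=8):
    """Build one token per anatomical point: flattened patch + (row, col).

    Assumption: geometry is encoded by simply appending coordinates
    normalized to [0, 1]; a real model would likely use a learned embedding.
    """
    points = np.asarray(points, dtype=float)
    flat = extract_point_patches(frame, points, patch).reshape(len(points), -1)
    h, w = frame.shape
    coords = points / np.array([h - 1, w - 1], dtype=float)
    return np.concatenate([flat, coords], axis=1)


def random_mask(num_tokens, mask_ratio, rng):
    """Boolean mask over anatomical tokens; True marks a masked token."""
    num_masked = int(round(mask_ratio * num_tokens))
    mask = np.zeros(num_tokens, dtype=bool)
    mask[rng.permutation(num_tokens)[:num_masked]] = True
    return mask


rng = np.random.default_rng(0)
frame = rng.random((64, 64))  # stand-in for one echo frame

# Hypothetical point set: 16 points on a ring, loosely mimicking a contour.
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
points = np.stack([32 + 20 * np.sin(angles), 32 + 20 * np.cos(angles)], axis=1)

tokens = make_tokens(frame, points, patch=8)
mask = random_mask(len(tokens), mask_ratio=0.75, rng=rng)

visible_tokens = tokens[~mask]           # what an encoder would see
reconstruction_targets = tokens[mask, :-2]  # pixel content to reconstruct
print(tokens.shape, visible_tokens.shape, reconstruction_targets.shape)
```

Because every token corresponds to an anatomical point, background pixels never enter the reconstruction target in this sketch, which is the property the abstract attributes to the anatomical constraint.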
Related papers
- Prior-AttUNet: Retinal OCT Fluid Segmentation Based on Normal Anatomical Priors and Attention Gating [6.013762133627291]
This study introduces Prior-AttUNet, a segmentation model augmented with generative anatomical priors. The framework adopts a hybrid dual-path architecture that integrates a generative prior pathway with a segmentation network. The model maintains a low computational cost of 0.37 TFLOPs, striking an effective balance between segmentation precision and inference efficiency.
arXiv Detail & Related papers (2025-12-25T14:37:04Z)
- PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical Adaptation [0.27998963147546135]
We introduce PULSE, a multi-task vision-language framework built on self-supervised representations and optimized through a composite supervision strategy. A multi-scale token reconstruction decoder enables anatomical segmentation, while shared global representations support disease classification and clinically grounded text output. Unlike prior task-specific pipelines, PULSE learns task-invariant cardiac priors, generalizes robustly across datasets, and can be adapted to new imaging modalities with minimal supervision.
arXiv Detail & Related papers (2025-12-03T14:49:01Z)
- Anatomically Constrained Transformers for Cardiac Amyloidosis Classification [34.67313275621695]
Cardiac amyloidosis (CA) is a rare cardiomyopathy with typical abnormalities in clinical measurements from echocardiograms. An alternative approach for detecting CA is via neural networks, using video classification models such as convolutional neural networks. We show that our anatomical constraint can also be applied to the popular self-supervised learning masked autoencoder pre-training.
arXiv Detail & Related papers (2025-09-24T02:00:34Z)
- GRASPing Anatomy to Improve Pathology Segmentation [67.98147643529309]
We introduce GRASP, a modular plug-and-play framework that enhances pathology segmentation models. We evaluate GRASP on two PET/CT datasets, conduct systematic ablation studies, and investigate the framework's inner workings.
arXiv Detail & Related papers (2025-08-05T12:26:36Z)
- Semantic Segmentation for Preoperative Planning in Transcatheter Aortic Valve Replacement [61.573750959726475]
We consider medical guidelines for preoperative planning of the transcatheter aortic valve replacement (TAVR) and identify tasks that may be supported via semantic segmentation models. We first derive fine-grained TAVR-relevant pseudo-labels from coarse-grained anatomical information, in order to train segmentation models and quantify how well they are able to find these structures in the scans.
arXiv Detail & Related papers (2025-07-22T13:24:45Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- Crop and Couple: cardiac image segmentation using interlinked specialist networks [0.5452923068355806]
We propose a novel strategy that performs segmentation using specialist networks that focus on a single anatomy.
Given an input long-axis cardiac MR image, our method performs a ternary segmentation in the first stage to identify these anatomical regions.
The specialist networks are coupled through an attention mechanism that performs cross-attention to interlink features from different anatomies.
arXiv Detail & Related papers (2024-02-14T13:14:04Z)
- A Multi-Scale Spatial Transformer U-Net for Simultaneously Automatic Reorientation and Segmentation of 3D Nuclear Cardiac Images [6.347837887930855]
Small-scale LV myocardium (LV-MY) region detection and the diverse cardiac structures of individual patients pose challenges to LV segmentation operation.
We propose an end-to-end model, named as multi-scale spatial transformer UNet (MS-ST-UNet), that involves the multi-scale spatial transformer network (MSSTN) and multi-scale UNet (MSUNet) modules.
The proposed method is trained and tested using two different nuclear cardiac image modalities: 13N-ammonia PET and 99mTc-sestamibi SPECT.
arXiv Detail & Related papers (2023-10-16T05:56:53Z)
- Shape of my heart: Cardiac models through learned signed distance functions [33.29148402516714]
In this work, the cardiac shape is reconstructed by means of three-dimensional deep signed distance functions with Lipschitz regularity.
For this purpose, the shapes of cardiac MRI reconstructions are learned to model the spatial relation of multiple chambers.
We demonstrate that this approach is also capable of reconstructing anatomical models from partial data, such as point clouds from a single ventricle.
arXiv Detail & Related papers (2023-08-31T09:02:53Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstruct the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection [76.01333073259677]
We propose the use of Space-aware Memory Queues for In-painting and Detecting anomalies from radiography images (abbreviated as SQUID).
We show that SQUID can taxonomize the ingrained anatomical structures into recurrent patterns; and in the inference, it can identify anomalies (unseen/modified patterns) in the image.
arXiv Detail & Related papers (2021-11-26T13:47:34Z)