Motion-enhanced Cardiac Anatomy Segmentation via an Insertable Temporal Attention Module
- URL: http://arxiv.org/abs/2501.14929v2
- Date: Sat, 06 Sep 2025 19:37:24 GMT
- Title: Motion-enhanced Cardiac Anatomy Segmentation via an Insertable Temporal Attention Module
- Authors: Md. Kamrul Hasan, Guang Yang, Choon Hwai Yap
- Abstract summary: We propose a novel, computationally efficient Temporal Attention Module (TAM) that offers robust motion enhancement. TAM's uniqueness is that it is a lightweight, plug-and-play module that can be inserted into a broad range of segmentation networks. Experiments on multiple 2D and 3D cardiac ultrasound and MRI datasets confirm that TAM consistently improves segmentation across a range of networks.
- Score: 5.796175310950299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cardiac anatomy segmentation is useful for clinical assessment of cardiac morphology to inform diagnosis and intervention. Deep learning (DL), especially with motion information, has improved segmentation accuracy. However, existing techniques for motion enhancement are not yet optimal, and they have high computational costs due to increased dimensionality or reduced robustness due to suboptimal approaches that use non-DL motion registration, non-attention models, or single-headed attention. They further have limited adaptability and are inconvenient for incorporation into existing networks where motion awareness is desired. Here, we propose a novel, computationally efficient Temporal Attention Module (TAM) that offers robust motion enhancement, modeled as a small, multi-headed, cross-temporal attention module. TAM's uniqueness is that it is a lightweight, plug-and-play module that can be inserted into a broad range of segmentation networks (CNN-based, Transformer-based, or hybrid) for motion enhancement without requiring substantial changes in the network's backbone. This feature enables high adaptability and ease of integration for enhancing both existing and future networks. Extensive experiments on multiple 2D and 3D cardiac ultrasound and MRI datasets confirm that TAM consistently improves segmentation across a range of networks while maintaining computational efficiency and improving on currently reported performance. The evidence demonstrates that it is a robust, generalizable solution for motion-awareness enhancement that is scalable (such as from 2D to 3D).
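The abstract describes TAM as a small, multi-headed, cross-temporal attention module whose output is fused back into an existing segmentation backbone. As a rough illustration of that general idea only (the function names, the per-head slicing, and the residual fusion are assumptions of this sketch, not the authors' implementation), cross-temporal attention can be written as queries from the current frame attending to keys/values from a neighboring frame:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_temporal_attention(query_feat, context_feat, num_heads=4):
    """Multi-headed cross-attention: the current frame's features (queries)
    attend to a neighboring frame's features (keys/values).
    Both inputs: (tokens, channels); channels must divide by num_heads."""
    n, d = query_feat.shape
    assert d % num_heads == 0
    dh = d // num_heads
    out = np.empty_like(query_feat)
    for h in range(num_heads):
        q = query_feat[:, h * dh:(h + 1) * dh]
        k = context_feat[:, h * dh:(h + 1) * dh]
        v = context_feat[:, h * dh:(h + 1) * dh]
        attn = softmax(q @ k.T / np.sqrt(dh))   # (n, n) temporal attention
        out[:, h * dh:(h + 1) * dh] = attn @ v
    return out

def tam_enhance(feat_t, feat_neighbor, num_heads=4):
    # residual fusion: motion context is added onto the frame's own features,
    # so the module can be inserted without changing the backbone's shapes
    return feat_t + cross_temporal_attention(feat_t, feat_neighbor, num_heads)
```

Because the output has the same shape as the input features, such a module can in principle be dropped between layers of a CNN, Transformer, or hybrid backbone, which is the plug-and-play property the paper emphasizes.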
Related papers
- Enhancing Fitness Movement Recognition with Attention Mechanism and Pre-Trained Feature Extractors [1.7619303397097408]
Fitness movement recognition plays a vital role in health monitoring, rehabilitation, and personalized fitness training. We present a framework that integrates pre-trained 2D Convolutional Neural Networks (CNNs) with a Long Short-Term Memory (LSTM) network enhanced by spatial attention. We evaluate the framework on a curated subset of the UCF101 dataset, achieving a peak accuracy of 93.34% with the ResNet50-based configuration.
arXiv Detail & Related papers (2025-09-02T17:04:42Z) - Latent Interpolation Learning Using Diffusion Models for Cardiac Volume Reconstruction [26.7771170972558]
Existing methods face challenges, including reliance on predefined schemes, computational inefficiency, and dependence on additional semantic inputs. We present a data-driven cardiac Latent Interpolation Diffusion (CaLID) framework that can capture complex, non-temporal relationships between sparse slices. Second, we design a computationally efficient method that operates in the latent space and speeds up 3D-heart upsampling by a factor of 24, reducing computational time. Third, we extend our method to 2D+T data, enabling the effective modeling of temporal coherence.
arXiv Detail & Related papers (2025-08-19T13:36:16Z) - U-R-VEDA: Integrating UNET, Residual Links, Edge and Dual Attention, and Vision Transformer for Accurate Semantic Segmentation of CMRs [0.0]
We propose a deep learning based enhanced UNet model, U-R-Veda, which integrates convolution transformations, vision transformer, residual links, channel attention, and spatial attention. The model significantly improves the semantic segmentation of cardiac magnetic resonance (CMR) images. Performance results show that U-R-Veda achieves an average accuracy of 95.2%, based on DSC.
arXiv Detail & Related papers (2025-06-25T04:10:09Z) - Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images [43.73298205923969]
We present a novel PolarNet+ that uses retinal optical coherence tomography angiography (OCTA) to discriminate early-onset Alzheimer's disease (AD) and mild cognitive impairment (MCI) subjects from controls. Our method first maps OCTA images from Cartesian coordinates to polar coordinates, allowing approximate sub-region calculation. We then introduce a multi-view module to serialize and analyze the images along three dimensions for comprehensive, clinically useful information extraction.
arXiv Detail & Related papers (2024-08-09T15:10:34Z) - Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction [89.53963284958037]
We propose a novel motion-aware enhancement framework for dynamic scene reconstruction.
Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow.
For the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed.
arXiv Detail & Related papers (2024-03-18T03:46:26Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images [2.2265536092123006]
We propose the 3D cerebrovascular attention UNet method, named CV-AttentionUNet, for precise extraction of brain vessel images.
To combine the low and high semantics, we applied the attention mechanism.
We believe that the novelty of this algorithm lies in its ability to perform well on both labeled and unlabeled data.
arXiv Detail & Related papers (2023-11-16T22:31:05Z) - Inflated 3D Convolution-Transformer for Weakly-supervised Carotid
Stenosis Grading with Ultrasound Videos [12.780908780402516]
We present the first video classification framework for automatic carotid stenosis grading (CSG).
We propose a novel and effective video classification network for weakly-supervised CSG.
Our approach is extensively validated on a large clinically collected carotid US video dataset.
arXiv Detail & Related papers (2023-06-05T02:50:06Z) - Can SAM Boost Video Super-Resolution? [78.29033914169025]
We propose a simple yet effective module -- SAM-guidEd refinEment Module (SEEM).
This lightweight plug-in module is specifically designed to leverage the attention mechanism for the generation of semantic-aware features.
We apply our SEEM to two representative methods, EDVR and BasicVSR, resulting in consistently improved performance with minimal implementation effort.
arXiv Detail & Related papers (2023-05-11T02:02:53Z) - Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images [55.83984261827332]
In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network.
We develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module.
Our proposed method achieves better segmentation accuracy with a high degree of reliability as compared to other state-of-the-art segmentation approaches.
arXiv Detail & Related papers (2022-12-01T07:32:56Z) - RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional
Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
New convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
Model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z) - Large-Kernel Attention for 3D Medical Image Segmentation [14.76728117630242]
In this paper, a novel large-kernel (LK) attention module is proposed to achieve accurate multi-organ segmentation and tumor segmentation.
The advantages of convolution and self-attention are combined in the proposed LK attention module, including local contextual information, long-range dependence, and channel adaptation.
The module also decomposes the LK convolution to optimize the computational cost and can be easily incorporated into FCNs such as U-Net.
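The cost saving from decomposing a large-kernel convolution can be made concrete with a parameter count. The sketch below follows the common large-kernel attention decomposition (a small depth-wise convolution, a depth-wise dilated convolution, and a 1x1 channel-mixing convolution); the specific kernel size and dilation in the test are illustrative assumptions, not values from the paper:

```python
from math import ceil

def lk_params_direct(C, K, dims=3):
    # depth-wise large-kernel convolution: one K^dims filter per channel
    return C * K**dims

def lk_params_decomposed(C, K, d, dims=3):
    # depth-wise (2d-1)^dims local conv
    # + depth-wise dilated ceil(K/d)^dims conv (dilation d)
    # + 1x1 channel-mixing conv (C*C weights)
    return C * (2 * d - 1)**dims + C * ceil(K / d)**dims + C * C
```

For example, with 32 channels and an effective 21x21x21 kernel, the direct depth-wise form needs 296,352 weights, while the decomposed form with dilation 3 needs 16,000 -- roughly an 18x reduction, which is why such modules stay cheap enough to drop into FCNs like U-Net.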
arXiv Detail & Related papers (2022-07-19T16:32:55Z) - Real-time landmark detection for precise endoscopic submucosal
dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
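A relation keypoint heatmap of the kind described above can be pictured as a Gaussian ridge along the segment joining two landmarks. The following is a hypothetical sketch of that idea only (the exact generation algorithm in the paper is not specified in this summary):

```python
import numpy as np

def relation_heatmap(p, q, shape, sigma=2.0):
    """Gaussian ridge along the segment joining landmarks p and q,
    each given as (row, col); returns an array of the given shape."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    p, q = np.asarray(p, float), np.asarray(q, float)
    seg = q - p
    L2 = seg @ seg + 1e-12
    # projection parameter of each pixel onto the segment, clipped to [0, 1]
    t = np.clip(((xs - p[1]) * seg[1] + (ys - p[0]) * seg[0]) / L2, 0, 1)
    # distance from each pixel to its nearest point on the segment
    dx = xs - (p[1] + t * seg[1])
    dy = ys - (p[0] + t * seg[0])
    return np.exp(-(dx**2 + dy**2) / (2 * sigma**2))
```

Such a map peaks at 1 everywhere on the segment and decays with distance from it, encoding the spatial relation between the two landmarks as a dense supervision target.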
arXiv Detail & Related papers (2021-11-08T07:57:30Z) - CT-Net: Channel Tensorization Network for Video Classification [48.4482794950675]
3D convolution is powerful for video classification but often computationally expensive.
Most approaches fail to achieve a preferable balance between convolutional efficiency and feature-interaction sufficiency.
We propose a concise and novel Channel Tensorization Network (CT-Net).
Our CT-Net outperforms a number of recent SOTA approaches, in terms of accuracy and/or efficiency.
arXiv Detail & Related papers (2021-06-03T05:35:43Z) - DFENet: A Novel Dimension Fusion Edge Guided Network for Brain MRI
Segmentation [0.0]
We propose a novel Dimension Fusion Edge-guided network (DFENet) that can meet both of these requirements by fusing the features of 2D and 3D CNNs.
The proposed model is robust, accurate, superior to the existing methods, and can be relied upon for biomedical applications.
arXiv Detail & Related papers (2021-05-17T15:43:59Z) - Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections while still suffering the limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
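The channel-wise gating idea in CME can be sketched as pooling the inter-frame feature difference into a vector and using it to re-weight channels. This is an illustrative sketch only; the weight matrix `W` stands in for whatever learned mapping the paper uses:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_motion_gate(feat_t, feat_t1, W):
    """Channel-wise motion gate (illustrative). feat_t, feat_t1: (C, H, W)
    feature maps of adjacent frames; W: (C, C) learned mapping (assumed)."""
    diff = feat_t1 - feat_t                  # inter-frame difference
    pooled = diff.mean(axis=(1, 2))          # (C,) global average pooling
    gate = sigmoid(W @ pooled)               # (C,) gate vector in (0, 1)
    return feat_t * gate[:, None, None]      # emphasize motion-related channels
```

Because the gate lies in (0, 1) per channel, channels whose features change little between frames are attenuated while motion-related channels pass through, at the cost of only a pooled vector and one matrix multiply.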
arXiv Detail & Related papers (2021-03-23T03:06:26Z) - A Multi-Stage Attentive Transfer Learning Framework for Improving
COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis.
Our proposed framework consists of three stages to train accurate diagnosis models through learning knowledge from multiple source tasks and data of different domains.
Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z) - Learning Tubule-Sensitive CNNs for Pulmonary Airway and Artery-Vein
Segmentation in CT [45.93021999366973]
Training convolutional neural networks (CNNs) for segmentation of pulmonary airway, artery, and vein is challenging.
We present a CNNs-based method for accurate airway and artery-vein segmentation in non-contrast computed tomography.
It enjoys superior sensitivity to tenuous peripheral bronchioles, arterioles, and venules.
arXiv Detail & Related papers (2020-12-10T15:56:08Z) - Few-shot Medical Image Segmentation using a Global Correlation Network
with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z) - Knowing What, Where and When to Look: Efficient Video Action Modeling
with Attention [84.83632045374155]
Attentive video modeling is essential for action recognition in unconstrained videos.
What-Where-When (W3) video attention module models all three facets of video attention jointly.
Experiments show that our attention model brings significant improvements to existing action recognition models.
arXiv Detail & Related papers (2020-04-02T21:48:11Z) - Heart Sound Segmentation using Bidirectional LSTMs with Attention [37.62160903348547]
We propose a novel framework for the segmentation of phonocardiogram (PCG) signals into heart states.
We exploit recent advancements in attention based learning to segment the PCG signal.
The proposed method attains state-of-the-art performance on multiple benchmarks including both human and animal heart recordings.
arXiv Detail & Related papers (2020-04-02T02:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.