Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank
- URL: http://arxiv.org/abs/2512.15066v1
- Date: Wed, 17 Dec 2025 04:11:05 GMT
- Title: Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank
- Authors: Chenxiao Zhang, Runshi Zhang, Junchen Wang,
- Abstract summary: Low contrast levels and noisy backgrounds of ultrasound videos cause missegmentation of organ boundary.<n>We propose a memory bank-based wavelet filtering and fusion network to extract detailed spatial features.<n>Our method can more accurately segment small thyroid nodules, demonstrating its effectiveness for cases involving small ultrasound objects in long video.
- Score: 5.015880223095155
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Medical ultrasound videos are widely used for medical inspections, disease diagnosis and surgical planning. High-fidelity lesion area and target organ segmentation constitutes a key component of the computer-assisted surgery workflow. The low contrast levels and noisy backgrounds of ultrasound videos cause missegmentation of organ boundary, which may lead to small object losses and increase boundary segmentation errors. Object tracking in long videos also remains a significant research challenge. To overcome these challenges, we propose a memory bank-based wavelet filtering and fusion network, which adopts an encoder-decoder structure to effectively extract fine-grained detailed spatial features and integrate high-frequency (HF) information. Specifically, memory-based wavelet convolution is presented to simultaneously capture category, detailed information and utilize adjacent information in the encoder. Cascaded wavelet compression is used to fuse multiscale frequency-domain features and expand the receptive field within each convolutional layer. A long short-term memory bank using cross-attention and memory compression mechanisms is designed to track objects in long video. To fully utilize the boundary-sensitive HF details of feature maps, an HF-aware feature fusion module is designed via adaptive wavelet filters in the decoder. In extensive benchmark tests conducted on four ultrasound video datasets (two thyroid nodule, the thyroid gland, the heart datasets) compared with the state-of-the-art methods, our method demonstrates marked improvements in segmentation metrics. In particular, our method can more accurately segment small thyroid nodules, demonstrating its effectiveness for cases involving small ultrasound objects in long video. The code is available at https://github.com/XiAooZ/MWNet.
Related papers
- FreqDINO: Frequency-Guided Adaptation for Generalized Boundary-Aware Ultrasound Image Segmentation [18.491659279102993]
We propose FreqDINO, a frequency-guided segmentation framework that enhances boundary perception and structural consistency.<n>Specifically, we devise a Multi-scale Frequency Extraction and Alignment strategy to separate low-frequency structures and multi-scale high-frequency boundary details.<n>We also introduce a Frequency-Guided Boundary Refinement (FGBR) module that extracts boundary prototypes from high-frequency components and refines spatial features.
arXiv Detail & Related papers (2025-12-12T07:15:38Z) - UESA-Net: U-Shaped Embedded Multidirectional Shrinkage Attention Network for Ultrasound Nodule Segmentation [12.967178888045728]
Existing networks struggle to reconcile high-level semantics with low-level spatial details.<n>We propose UESA-Net, a U-shaped network with multidirectional shrinkage attention.<n>On two public datasets, UESA-Net achieved state-of-the-art performance with intersection-over-union (IoU) scores of 0.8487 and 0.6495, respectively.
arXiv Detail & Related papers (2025-09-26T14:54:38Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - Frequency-Aware Deepfake Detection: Improving Generalizability through
Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z) - AiAReSeg: Catheter Detection and Segmentation in Interventional
Ultrasound using Transformers [75.20925220246689]
endovascular surgeries are performed using the golden standard of Fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z) - A Spatial-Temporal Deformable Attention based Framework for Breast
Lesion Detection in Videos [107.96514633713034]
We propose a spatial-temporal deformable attention based framework, named STNet.
Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion.
Experiments on the public breast lesion ultrasound video dataset show that our STNet obtains a state-of-the-art detection performance.
arXiv Detail & Related papers (2023-09-09T07:00:10Z) - Unlocking Fine-Grained Details with Wavelet-based High-Frequency
Enhancement in Transformers [4.208461204572879]
Medical image segmentation is a critical task that plays a vital role in diagnosis, treatment planning, and disease monitoring.
We address the local feature deficiency of the Transformer model by carefully re-designing the self-attention map.
We propose a multi-scale context enhancement block within skip connections to adaptively model inter-scale dependencies.
arXiv Detail & Related papers (2023-08-25T15:42:19Z) - Preservation of High Frequency Content for Deep Learning-Based Medical
Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
arXiv Detail & Related papers (2022-05-08T15:29:54Z) - Unsupervised multi-latent space reinforcement learning framework for
video summarization in ultrasound imaging [0.0]
The COVID-19 pandemic has highlighted the need for a tool to speed up triage in ultrasound scans.
The proposed video-summarization technique is a step in this direction.
We propose a new unsupervised reinforcement learning framework with novel rewards.
arXiv Detail & Related papers (2021-09-03T04:50:35Z) - Weakly-supervised Learning For Catheter Segmentation in 3D Frustum
Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound based catheter segmentation method.
The proposed method achieved the state-of-the-art performance with an efficiency of 0.25 second per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.