Inflated 3D Convolution-Transformer for Weakly-supervised Carotid
Stenosis Grading with Ultrasound Videos
- URL: http://arxiv.org/abs/2306.02548v3
- Date: Mon, 12 Jun 2023 09:28:07 GMT
- Authors: Xinrui Zhou, Yuhao Huang, Wufeng Xue, Xin Yang, Yuxin Zou, Qilong
Ying, Yuanji Zhang, Jia Liu, Jie Ren, Dong Ni
- Abstract summary: We present the first video classification framework for automatic carotid stenosis grading (CSG).
We propose a novel and effective video classification network for weakly-supervised CSG.
Our approach is extensively validated on a large clinically collected carotid US video dataset.
- Score: 12.780908780402516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Localization of the narrowest position of the vessel and corresponding vessel
and remnant vessel delineation in carotid ultrasound (US) are essential for
carotid stenosis grading (CSG) in clinical practice. However, this pipeline is
time-consuming and challenging due to the ambiguous boundaries of the plaque
and temporal variation. To automate this procedure, a large number of manual
delineations are usually required, which is not only laborious but also
unreliable given the annotation difficulty. In this study, we present the first video
classification framework for automatic CSG. Our contribution is three-fold.
First, to avoid the requirement of laborious and unreliable annotation, we
propose a novel and effective video classification network for
weakly-supervised CSG. Second, to ease the model training, we adopt an
inflation strategy for the network, where pre-trained 2D convolution weights
can be adapted into the 3D counterpart in our network for an effective warm
start. Third, to enhance the feature discrimination of the video, we propose a
novel attention-guided multi-dimension fusion (AMDF) transformer encoder to
model and integrate global dependencies within and across spatial and temporal
dimensions, where two lightweight cross-dimensional attention mechanisms are
designed. Our approach is extensively validated on a large clinically collected
carotid US video dataset, demonstrating state-of-the-art performance compared
with strong competitors.
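The inflation strategy mentioned in the abstract follows the I3D idea of bootstrapping 3D convolution filters from pre-trained 2D ones. A minimal sketch of that weight inflation in NumPy (the function name and temporal depth here are illustrative assumptions, not the authors' code):

```python
import numpy as np

def inflate_conv_weight(w2d: np.ndarray, t: int) -> np.ndarray:
    """Inflate a 2D conv kernel of shape (C_out, C_in, kH, kW) into a
    3D kernel of shape (C_out, C_in, t, kH, kW): repeat it t times
    along the new temporal axis and divide by t, so that a static
    ("boring") video yields the same activations as the 2D filter,
    giving the 3D network an effective warm start."""
    return np.repeat(w2d[:, :, None, :, :], t, axis=2) / t

# Example: inflate a 3x3 RGB kernel to temporal depth 3.
w2d = np.random.randn(64, 3, 3, 3).astype(np.float32)
w3d = inflate_conv_weight(w2d, t=3)
print(w3d.shape)  # (64, 3, 3, 3, 3)
```

Dividing by the temporal depth keeps the response magnitude of the inflated filter identical to the original 2D filter on temporally constant input, which is what makes the warm start effective.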
Related papers
- TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video [10.087796410298061]
This paper proposes a novel Tri-Attention Selective Learning Network (TASL-Net) to tackle this challenge.
TASL-Net embeds three types of diagnostic attention of sonographers into a mutual transformer framework for intelligent diagnosis of bimodal ultrasound videos.
We conduct a detailed experimental validation of TASL-Net's performance on three datasets, including lung, breast, and liver.
arXiv Detail & Related papers (2024-09-03T02:50:37Z)
- Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration [50.602074919305636]
This paper introduces a lightweight end-to-end Cardiac Ultrasound frame-to-volume Registration network, termed CU-Reg.
We use epicardium prompt-guided anatomical clues to reinforce the interaction of 2D sparse and 3D dense features, followed by a voxel-wise local-global aggregation of enhanced features.
arXiv Detail & Related papers (2024-06-20T17:47:30Z)
- DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy [5.8722774441994074]
We propose a Depth-based Dual-Loop framework for real-time Visually Navigated Bronchoscopy (DD-VNB)
The DD-VNB framework integrates two key modules: depth estimation and dual-loop localization.
Experiments on phantom and in-vivo data from patients demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2024-03-04T02:29:02Z)
- 3D Vascular Segmentation Supervised by 2D Annotation of Maximum Intensity Projection [33.34240545722551]
Vascular structure segmentation plays a crucial role in medical analysis and clinical applications.
Existing weakly supervised methods have exhibited suboptimal performance when handling sparse vascular structure.
Here, we employ maximum intensity projection (MIP) to decrease the dimensionality of 3D volume to 2D image for efficient annotation.
We introduce a weakly-supervised network that fuses 2D-3D deep features via MIP to further improve segmentation performance.
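The dimensionality reduction described above can be illustrated with a tiny NumPy sketch of maximum intensity projection (the function name and axis choice are illustrative assumptions, not the paper's code):

```python
import numpy as np

def max_intensity_projection(volume: np.ndarray, axis: int = 0) -> np.ndarray:
    """Collapse a 3D volume of shape (D, H, W) into a 2D image by
    keeping the maximum intensity along the chosen axis, so bright
    vascular structures stay visible for cheap 2D annotation."""
    return volume.max(axis=axis)

# Tiny example: a 2x2x2 volume projected along the depth axis.
vol = np.array([[[0, 5], [2, 1]],
                [[3, 4], [7, 0]]], dtype=np.float32)
mip = max_intensity_projection(vol, axis=0)
print(mip)  # [[3. 5.]
            #  [7. 1.]]
```

Annotating the resulting 2D projection is far cheaper than labeling every slice of the volume, which is the source of the weak supervision signal.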
arXiv Detail & Related papers (2024-02-19T13:24:46Z)
- CIS-UNet: Multi-Class Segmentation of the Aorta in Computed Tomography Angiography via Context-Aware Shifted Window Self-Attention [10.335899694123711]
We introduce Context Infused Swin-UNet (CIS-UNet), a deep learning model for aortic segmentation.
CIS-UNet adopts a hierarchical encoder-decoder structure comprising a CNN encoder, symmetric decoder, skip connections, and a novel Context-aware Shifted Window Self-Attention (CSW-SA) as the bottleneck block.
We trained our model on computed tomography (CT) scans from 44 patients and tested it on 15 patients. CIS-UNet outperformed the state-of-the-art SwinUNetR segmentation model, achieving a superior mean Dice coefficient of 0.713.
arXiv Detail & Related papers (2024-01-23T19:17:20Z)
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [94.10861578387443]
We explore the inference process of two mainstream T2V models using transformers and diffusion models.
We propose a training-free and generalized pruning strategy called F3-Pruning to prune redundant temporal attention weights.
Extensive experiments on three datasets using a classic transformer-based model CogVideo and a typical diffusion-based model Tune-A-Video verify the effectiveness of F3-Pruning.
arXiv Detail & Related papers (2023-12-06T12:34:47Z)
- Focused Decoding Enables 3D Anatomical Detection by Transformers [64.36530874341666]
We propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder.
Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view.
We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights.
arXiv Detail & Related papers (2022-07-21T22:17:21Z)
- Agent with Tangent-based Formulation and Anatomical Perception for Standard Plane Localization in 3D Ultrasound [56.7645826576439]
We introduce a novel reinforcement learning framework for automatic SP localization in 3D US.
First, we formulate SP localization in 3D US as a tangent-point-based problem in RL to restructure the action space.
Second, we design an auxiliary task learning strategy to enhance the model's ability to recognize subtle differences between non-SPs and SPs during plane search.
arXiv Detail & Related papers (2022-07-01T14:53:27Z)
- Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound based catheter segmentation method.
The proposed method achieved the state-of-the-art performance with an efficiency of 0.25 second per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)
- Deep Q-Network-Driven Catheter Segmentation in 3D US by Hybrid Constrained Semi-Supervised Learning and Dual-UNet [74.22397862400177]
We propose a novel catheter segmentation approach, which requests fewer annotations than the supervised learning method.
Our scheme considers a deep Q learning as the pre-localization step, which avoids voxel-level annotation.
With the detected catheter, patch-based Dual-UNet is applied to segment the catheter in 3D volumetric data.
arXiv Detail & Related papers (2020-06-25T21:10:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.