Inflated 3D Convolution-Transformer for Weakly-supervised Carotid
Stenosis Grading with Ultrasound Videos
- URL: http://arxiv.org/abs/2306.02548v3
- Date: Mon, 12 Jun 2023 09:28:07 GMT
- Authors: Xinrui Zhou, Yuhao Huang, Wufeng Xue, Xin Yang, Yuxin Zou, Qilong
Ying, Yuanji Zhang, Jia Liu, Jie Ren, Dong Ni
- Abstract summary: We present the first video classification framework for automatic carotid stenosis grading (CSG).
We propose a novel and effective video classification network for weakly-supervised CSG.
Our approach is extensively validated on a large clinically collected carotid US video dataset.
- Score: 12.780908780402516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Localization of the narrowest position of the vessel and corresponding vessel
and remnant vessel delineation in carotid ultrasound (US) are essential for
carotid stenosis grading (CSG) in clinical practice. However, this pipeline is
time-consuming and challenging due to the ambiguous boundaries of the plaque
and temporal variation. To automate this procedure, a large number of manual
delineations are usually required, which is not only laborious but also
unreliable given the annotation difficulty. In this study, we present the first video
classification framework for automatic CSG. Our contribution is three-fold.
First, to avoid the requirement of laborious and unreliable annotation, we
propose a novel and effective video classification network for
weakly-supervised CSG. Second, to ease the model training, we adopt an
inflation strategy for the network, where pre-trained 2D convolution weights
can be adapted into the 3D counterpart in our network for an effective warm
start. Third, to enhance the feature discrimination of the video, we propose a
novel attention-guided multi-dimension fusion (AMDF) transformer encoder to
model and integrate global dependencies within and across spatial and temporal
dimensions, where two lightweight cross-dimensional attention mechanisms are
designed. Our approach is extensively validated on a large clinically collected
carotid US video dataset, demonstrating state-of-the-art performance compared
with strong competitors.
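The inflation strategy mentioned in the abstract follows the I3D idea of bootstrapping 3D convolution filters from pre-trained 2D ones. A minimal sketch of that weight inflation in NumPy (the function name and temporal depth here are illustrative assumptions, not the authors' code):

```python
import numpy as np

def inflate_conv_weight(w2d: np.ndarray, t: int) -> np.ndarray:
    """Inflate a 2D conv kernel of shape (C_out, C_in, kH, kW) into a
    3D kernel of shape (C_out, C_in, t, kH, kW): repeat it t times
    along the new temporal axis and divide by t, so that a static
    ("boring") video yields the same activations as the 2D filter,
    giving the 3D network an effective warm start."""
    return np.repeat(w2d[:, :, None, :, :], t, axis=2) / t

# Example: inflate a 3x3 RGB kernel to temporal depth 3.
w2d = np.random.randn(64, 3, 3, 3).astype(np.float32)
w3d = inflate_conv_weight(w2d, t=3)
print(w3d.shape)  # (64, 3, 3, 3, 3)
```

Dividing by the temporal depth keeps the response magnitude of the inflated filter identical to the original 2D filter on temporally constant input, which is what makes the warm start effective.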
Related papers
- TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video [10.087796410298061]
This paper proposes a novel Tri-Attention Selective Learning Network (TASL-Net) to tackle this challenge.
TASL-Net embeds three types of diagnostic attention of sonographers into a mutual transformer framework for intelligent diagnosis of bimodal ultrasound videos.
We conduct a detailed experimental validation of TASL-Net's performance on three datasets, including lung, breast, and liver.
arXiv Detail & Related papers (2024-09-03T02:50:37Z)
- Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration [50.602074919305636]
This paper introduces a lightweight end-to-end Cardiac Ultrasound frame-to-volume Registration network, termed CU-Reg.
We use epicardium prompt-guided anatomical clues to reinforce the interaction of 2D sparse and 3D dense features, followed by a voxel-wise local-global aggregation of enhanced features.
arXiv Detail & Related papers (2024-06-20T17:47:30Z)
- DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy [5.8722774441994074]
We propose a Depth-based Dual-Loop framework for real-time Visually Navigated Bronchoscopy (DD-VNB)
The DD-VNB framework integrates two key modules: depth estimation and dual-loop localization.
Experiments on phantom and in-vivo data from patients demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2024-03-04T02:29:02Z)
- 3D Vascular Segmentation Supervised by 2D Annotation of Maximum Intensity Projection [33.34240545722551]
Vascular structure segmentation plays a crucial role in medical analysis and clinical applications.
Existing weakly supervised methods have exhibited suboptimal performance when handling sparse vascular structure.
Here, we employ maximum intensity projection (MIP) to decrease the dimensionality of 3D volume to 2D image for efficient annotation.
We introduce a weakly-supervised network that fuses 2D-3D deep features via MIP to further improve segmentation performance.
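The dimensionality reduction described above can be illustrated with a tiny NumPy sketch of maximum intensity projection (the function name and axis choice are illustrative assumptions, not the paper's code):

```python
import numpy as np

def max_intensity_projection(volume: np.ndarray, axis: int = 0) -> np.ndarray:
    """Collapse a 3D volume of shape (D, H, W) into a 2D image by
    keeping the maximum intensity along the chosen axis, so bright
    vascular structures stay visible for cheap 2D annotation."""
    return volume.max(axis=axis)

# Tiny example: a 2x2x2 volume projected along the depth axis.
vol = np.array([[[0, 5], [2, 1]],
                [[3, 4], [7, 0]]], dtype=np.float32)
mip = max_intensity_projection(vol, axis=0)
print(mip)  # [[3. 5.]
            #  [7. 1.]]
```

Annotating the resulting 2D projection is far cheaper than labeling every slice of the volume, which is the source of the weak supervision signal.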
arXiv Detail & Related papers (2024-02-19T13:24:46Z)
- CIS-UNet: Multi-Class Segmentation of the Aorta in Computed Tomography Angiography via Context-Aware Shifted Window Self-Attention [10.335899694123711]
We introduce Context Infused Swin-UNet (CIS-UNet), a deep learning model for aortic segmentation.
CIS-UNet adopts a hierarchical encoder-decoder structure comprising a CNN encoder, symmetric decoder, skip connections, and a novel Context-aware Shifted Window Self-Attention (CSW-SA) as the bottleneck block.
We trained our model on computed tomography (CT) scans from 44 patients and tested it on 15 patients. CIS-UNet outperformed the state-of-the-art SwinUNetR segmentation model, achieving a superior mean Dice coefficient of 0.713.
arXiv Detail & Related papers (2024-01-23T19:17:20Z)
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [94.10861578387443]
We explore the inference process of two mainstream T2V models using transformers and diffusion models.
We propose a training-free and generalized pruning strategy called F3-Pruning to prune redundant temporal attention weights.
Extensive experiments on three datasets using a classic transformer-based model CogVideo and a typical diffusion-based model Tune-A-Video verify the effectiveness of F3-Pruning.
arXiv Detail & Related papers (2023-12-06T12:34:47Z)
- Focused Decoding Enables 3D Anatomical Detection by Transformers [64.36530874341666]
We propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder.
Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view.
We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights.
arXiv Detail & Related papers (2022-07-21T22:17:21Z)
- Agent with Tangent-based Formulation and Anatomical Perception for Standard Plane Localization in 3D Ultrasound [56.7645826576439]
We introduce a novel reinforcement learning framework for automatic SP localization in 3D US.
First, we formulate SP localization in 3D US as a tangent-point-based problem in RL to restructure the action space.
Second, we design an auxiliary task learning strategy to enhance the model's ability to recognize subtle differences between non-SPs and SPs during plane search.
arXiv Detail & Related papers (2022-07-01T14:53:27Z)
- Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound based catheter segmentation method.
The proposed method achieved the state-of-the-art performance with an efficiency of 0.25 second per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)
- Deep Q-Network-Driven Catheter Segmentation in 3D US by Hybrid Constrained Semi-Supervised Learning and Dual-UNet [74.22397862400177]
We propose a novel catheter segmentation approach, which requests fewer annotations than the supervised learning method.
Our scheme considers a deep Q learning as the pre-localization step, which avoids voxel-level annotation.
With the detected catheter, patch-based Dual-UNet is applied to segment the catheter in 3D volumetric data.
arXiv Detail & Related papers (2020-06-25T21:10:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.