AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting
- URL: http://arxiv.org/abs/2504.05966v1
- Date: Tue, 08 Apr 2025 12:24:37 GMT
- Title: AVP-AP: Self-supervised Automatic View Positioning in 3D cardiac CT via Atlas Prompting
- Authors: Xiaolin Fan, Yan Wang, Yingying Zhang, Mingkun Bao, Bosen Jia, Dong Lu, Yifan Gu, Jian Cheng, Haogang Zhu
- Abstract summary: AVP-AP is the first framework to use Atlas Prompting for self-supervised Automatic View Positioning in 3D CT volumes. We identify the coarse positions of slices in the target CT volume using a rigid transformation between the 3D atlas and the target CT volume. Our framework is flexible and efficient, outperforming other methods by 19.8% average structural similarity (SSIM) in arbitrary view positioning.
- Score: 17.710578002931545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic view positioning is crucial for cardiac computed tomography (CT) examinations, including disease diagnosis and surgical planning. However, it is highly challenging due to individual variability and the large 3D search space. Existing work requires labor-intensive, time-consuming manual annotations to train view-specific models, which are limited to predicting only a fixed set of planes. In real clinical scenarios, however, the challenge of positioning semantic 2D slices of arbitrary orientation within the varying coordinate spaces of arbitrary 3D volumes remains unsolved. We thus introduce AVP-AP, a novel framework and the first to use Atlas Prompting for self-supervised Automatic View Positioning in 3D CT volumes. Specifically, this paper first proposes an atlas prompting method, which generates a 3D canonical atlas and trains a network in a self-supervised manner to map slices to their corresponding positions in the atlas space. Then, guided by atlas prompts corresponding to the given query images in a reference CT, we identify the coarse positions of slices in the target CT volume using a rigid transformation between the 3D atlas and the target CT volume, effectively reducing the search space. Finally, we refine the coarse positions by maximizing the similarity between the predicted slices and the query images in the feature space of a given foundation model. Our framework is flexible and efficient, outperforming other methods by 19.8% average structural similarity (SSIM) in arbitrary view positioning and achieving a 9% higher SSIM in the two-chamber view than four radiologists. Meanwhile, experiments on a public dataset validate our framework's generalizability.
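As a rough illustration of the coarse-positioning step described in the abstract, the sketch below applies a rigid transform (rotation R and translation t, assumed to come from atlas-to-target registration) to a slice plane represented by an anchor point and unit normal. Function and variable names are hypothetical, not from the paper.

```python
import numpy as np

def map_plane_to_target(plane_point, plane_normal, R, t):
    """Map a slice plane (anchor point + unit normal) from atlas space
    to target-CT space via the rigid transform x' = R @ x + t."""
    p = R @ plane_point + t   # points are rotated and translated
    n = R @ plane_normal      # normals rotate but do not translate
    return p, n / np.linalg.norm(n)

# toy rigid transform: 90-degree rotation about z, then a shift in x
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([5.0, 0.0, 0.0])

p, n = map_plane_to_target(np.array([1.0, 0.0, 0.0]),
                           np.array([0.0, 0.0, 1.0]), R, t)
# p is now [5, 1, 0]; the z-normal is unchanged by a rotation about z
```

The refinement step would then perturb (p, n) to maximize feature-space similarity between the resampled slice and the query image.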
Related papers
- SliceOcc: Indoor 3D Semantic Occupancy Prediction with Vertical Slice Representation [50.420711084672966]
We present SliceOcc, an RGB camera-based model specifically tailored for indoor 3D semantic occupancy prediction. Experimental results on the EmbodiedScan dataset demonstrate that SliceOcc achieves a mIoU of 15.45% across 81 indoor categories.
arXiv Detail & Related papers (2025-01-28T03:41:24Z) - Automatic view plane prescription for cardiac magnetic resonance imaging via supervision by spatial relationship between views [30.59897595586657]
This work presents a clinic-compatible, annotation-free system for automatic cardiac magnetic resonance (CMR) view planning.
The system mines the spatial relationship between the target planes and source views; more specifically, it locates their intersecting lines.
The interplay of multiple target planes predicted in a source view is utilized in a stacked hourglass architecture to gradually improve the regression.
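The intersecting line between two planes can be computed in closed form; the sketch below (illustrative only, not the paper's code) finds the line shared by two non-parallel planes given in Hesse normal form n·x = d.

```python
import numpy as np

def plane_intersection(n1, d1, n2, d2):
    """Intersection line of planes n1 . x = d1 and n2 . x = d2.
    Returns (point_on_line, unit_direction)."""
    direction = np.cross(n1, n2)  # the line lies in both planes
    if np.linalg.norm(direction) < 1e-9:
        raise ValueError("planes are parallel")
    # solve for one point on the line: the two plane equations plus a
    # third constraint fixing the component along the line direction
    A = np.vstack([n1, n2, direction])
    b = np.array([d1, d2, 0.0])
    point = np.linalg.solve(A, b)
    return point, direction / np.linalg.norm(direction)

# e.g. the x-y plane (z = 0) and the x-z plane (y = 0) meet in the x-axis
p, d = plane_intersection(np.array([0.0, 0.0, 1.0]), 0.0,
                          np.array([0.0, 1.0, 0.0]), 0.0)
```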
arXiv Detail & Related papers (2023-09-22T11:36:42Z) - PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction [72.75478398447396]
We propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively.
Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system.
We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane.
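The coordinate change underlying the cylindrical tri-perspective view is the standard (x, y, z) → (ρ, φ, z) mapping; a minimal sketch with hypothetical names:

```python
import numpy as np

def to_cylindrical(points):
    """Map (x, y, z) LiDAR points to cylindrical (rho, phi, z),
    matching the radial distance distribution of the sensor."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.hypot(x, y)        # distance from the sensor's vertical axis
    phi = np.arctan2(y, x)      # azimuth angle in radians
    return np.stack([rho, phi, z], axis=1)

pts = np.array([[3.0, 4.0, 1.0]])
cyl = to_cylindrical(pts)       # rho = 5, phi = atan2(4, 3), z = 1
```

Each of the three TPV planes is then a 2D projection over one of these cylindrical axes.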
arXiv Detail & Related papers (2023-08-31T17:57:17Z) - Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images.
We show, on all 3D MedMNIST benchmark datasets and two real-world datasets comprising several hundred high-resolution CT or MRI scans, that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z) - Occupancy Planes for Single-view RGB-D Human Reconstruction [120.5818162569105]
Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification.
We propose the occupancy planes (OPlanes) representation, which makes it possible to formulate single-view RGB-D human reconstruction as occupancy prediction on planes that slice through the camera's view frustum.
arXiv Detail & Related papers (2022-08-04T17:59:56Z) - Agent with Tangent-based Formulation and Anatomical Perception for Standard Plane Localization in 3D Ultrasound [56.7645826576439]
We introduce a novel reinforcement learning framework for automatic SP localization in 3D US.
First, we formulate SP localization in 3D US as a tangent-point-based problem in RL to restructure the action space.
Second, we design an auxiliary task learning strategy to enhance the model's ability to recognize subtle differences between non-SPs and SPs during plane search.
arXiv Detail & Related papers (2022-07-01T14:53:27Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Using the Order of Tomographic Slices as a Prior for Neural Networks Pre-Training [1.1470070927586016]
We propose SortingLoss, a pre-training method that operates on slices instead of volumes, so that a model can later be fine-tuned on a sparse set of slices.
We show that the proposed method performs on par with SimCLR, while working 2x faster and requiring 1.5x less memory.
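A slice-ordering pretext task can be illustrated with a toy pairwise ranking loss: the model scores each slice, and pairs whose predicted scores disagree with the true anatomical order are penalized. This sketch only illustrates the general idea; the paper's actual SortingLoss may be defined differently.

```python
import numpy as np

def pairwise_order_loss(scores, true_order):
    """Toy logistic ranking loss for a slice-ordering pretext task:
    penalize every slice pair whose predicted scores disagree with
    the true order along the scan axis."""
    loss, n = 0.0, len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            # slices earlier in the volume should receive lower scores
            sign = 1.0 if true_order[i] < true_order[j] else -1.0
            loss += np.log1p(np.exp(-sign * (scores[j] - scores[i])))
    return loss / (n * (n - 1) / 2)

# correctly ordered scores yield a small loss, reversed scores a large one
good = pairwise_order_loss(np.array([0.0, 1.0, 2.0]), [0, 1, 2])
bad = pairwise_order_loss(np.array([2.0, 1.0, 0.0]), [0, 1, 2])
```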
arXiv Detail & Related papers (2022-03-17T14:58:15Z) - Automated Atlas-based Segmentation of Single Coronal Mouse Brain Slices using Linear 2D-2D Registration [0.0]
This paper proposes a strategy to automatically segment single 2D coronal slices within a 3D atlas volume, using linear registration.
We validated its robustness and performance using an exploratory approach at whole-brain scale.
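Linear 2D-2D registration on matched landmarks reduces to a least-squares fit of an affine matrix; the sketch below is a minimal illustration with a hypothetical helper, not the paper's pipeline.

```python
import numpy as np

def fit_affine_2d(src, dst):
    """Least-squares 2D affine transform mapping the src point set onto
    dst (a linear 2D-2D registration on matched landmarks).
    Returns the 2x3 affine matrix M with dst ~= [x, y, 1] @ M.T."""
    n = len(src)
    A = np.hstack([src, np.ones((n, 1))])       # homogeneous [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None) # solve A @ M.T ~= dst
    return M.T

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src * 2.0 + np.array([3.0, -1.0])         # scale by 2, then shift
M = fit_affine_2d(src, dst)                     # recovers [[2,0,3],[0,2,-1]]
```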
arXiv Detail & Related papers (2021-11-16T12:33:09Z) - Training Automatic View Planner for Cardiac MR Imaging via Self-Supervision by Spatial Relationship between Views [28.27778627797572]
This work presents a clinic-compatible and annotation-free system for automatic cardiac magnetic resonance imaging view planning.
The system mines the spatial relationship -- more specifically, locates and exploits the intersecting lines -- between the source and target views, and trains deep networks to regress heatmaps defined by these intersecting lines.
A multi-view planning strategy is proposed to aggregate information from the predicted heatmaps for all the source views of a target view, for a globally optimal prescription.
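A heatmap "defined by an intersecting line" can be modeled as a Gaussian falloff of each pixel's distance to that line in the source view; the sketch below is an assumed formulation for illustration, not the paper's exact definition.

```python
import numpy as np

def line_heatmap(h, w, point, direction, sigma=2.0):
    """Heatmap over an h x w source view whose value decays with the
    perpendicular distance from the line point + t * direction."""
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.asarray(direction, float)
    d /= np.linalg.norm(d)
    # perpendicular distance via the 2D cross product with the direction
    vx, vy = xs - point[0], ys - point[1]
    dist = np.abs(vx * d[1] - vy * d[0])
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))

# a vertical intersecting line at x = 2 in a 5x5 view
hm = line_heatmap(5, 5, point=(2.0, 0.0), direction=(0.0, 1.0))
```

Aggregating such heatmaps across all source views of a target plane would then support a globally optimal prescription.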
arXiv Detail & Related papers (2021-09-24T02:25:22Z) - 3D Axial-Attention for Lung Nodule Classification [0.11458853556386794]
We propose to use 3D Axial-Attention, which requires a fraction of the computing power of a regular Non-Local network.
We address the position-invariance problem of the Non-Local network by adding 3D positional encoding to shared embeddings.
Our results show that the 3D Axial-Attention model achieves state-of-the-art performance on all evaluation metrics including AUC and Accuracy.
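One common way to make attention position-aware in 3D is to build a standard sinusoidal encoding per axis and concatenate the three; the sketch below is a hypothetical construction, not necessarily the paper's encoding.

```python
import numpy as np

def sincos_encoding_1d(length, dim):
    """Standard sinusoidal positional encoding along one axis."""
    pos = np.arange(length)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / (10000 ** (2 * i / dim))
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angles)   # even channels: sine
    enc[:, 1::2] = np.cos(angles)   # odd channels: cosine
    return enc

def positional_encoding_3d(d, h, w, dim):
    """Concatenate one 1D sinusoidal code per axis so that attention
    becomes position-aware along depth, height and width."""
    pd = sincos_encoding_1d(d, dim)[:, None, None, :]
    ph = sincos_encoding_1d(h, dim)[None, :, None, :]
    pw = sincos_encoding_1d(w, dim)[None, None, :, :]
    return np.concatenate([np.broadcast_to(pd, (d, h, w, dim)),
                           np.broadcast_to(ph, (d, h, w, dim)),
                           np.broadcast_to(pw, (d, h, w, dim))], axis=-1)

pe = positional_encoding_3d(4, 4, 4, 8)   # shape (4, 4, 4, 24)
```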
arXiv Detail & Related papers (2020-12-28T06:49:09Z) - High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.