Axial-Centric Cross-Plane Attention for 3D Medical Image Classification
- URL: http://arxiv.org/abs/2602.21636v1
- Date: Wed, 25 Feb 2026 07:00:24 GMT
- Title: Axial-Centric Cross-Plane Attention for 3D Medical Image Classification
- Authors: Doyoung Park, Jinsoo Kim, Lohendran Baskaran,
- Abstract summary: Clinicians commonly interpret 3D medical images using multiple anatomical planes rather than as a single representation.<n>We propose an axial-centric cross-plane attention architecture for 3D medical image classification.<n>Our architecture incorporates MedDINOv3, a medical vision foundation model pretrained via self-supervised learning on large-scale axial CT images.
- Score: 0.14868610461685575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinicians commonly interpret three-dimensional (3D) medical images, such as computed tomography (CT) scans, using multiple anatomical planes rather than as a single volumetric representation. In this multi-planar approach, the axial plane typically serves as the primary acquisition and diagnostic reference, while the coronal and sagittal planes provide complementary spatial information to increase diagnostic confidence. However, many existing 3D deep learning methods either process volumetric data holistically or assign equal importance to all planes, failing to reflect the axial-centric clinical interpretation workflow. To address this gap, we propose an axial-centric cross-plane attention architecture for 3D medical image classification that captures the inherent asymmetric dependencies between different anatomical planes. Our architecture incorporates MedDINOv3, a medical vision foundation model pretrained via self-supervised learning on large-scale axial CT images, as a frozen feature extractor for the axial, coronal, and sagittal planes. RICA blocks and intra-plane transformer encoders capture plane-specific positional and contextual information within each anatomical plane, while axial-centric cross-plane transformer encoders condition axial features on complementary information from auxiliary planes. Experimental results on six datasets from the MedMNIST3D benchmark demonstrate that the proposed architecture consistently outperforms existing 3D and multi-plane models in terms of accuracy and AUC. Ablation studies further confirm the importance of axial-centric query-key-value allocation and directional cross-plane fusion. These results highlight the importance of aligning architectural design with clinical interpretation workflows for robust and data-efficient 3D medical image analysis.
Related papers
- TRELLIS-Enhanced Surface Features for Comprehensive Intracranial Aneurysm Analysis [2.624902795082451]
Intracranial aneurysms pose a significant clinical risk yet are difficult to detect, delineate and model due to limited annotated 3D data.<n>We propose a cross-domain feature-transfer approach that leverages the latent geometric embeddings learned by TRELLIS, a generative model trained on large-scale non-medical 3D datasets.
arXiv Detail & Related papers (2025-09-03T07:51:17Z) - 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z) - GASA-UNet: Global Axial Self-Attention U-Net for 3D Medical Image Segmentation [8.939740171704388]
We introduce a refined U-Net-like model featuring a novel Global Axial Self-Attention (GASA) block.
This block processes image data as a 3D entity, with each 2D plane representing a different anatomical cross-section.
Our model has demonstrated promising improvements in segmentation performance, particularly for smaller anatomical structures.
arXiv Detail & Related papers (2024-09-20T01:23:53Z) - Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks [5.806035963947936]
We propose a Diffusion-based 3D Vision Transformer (Diff3Dformer) to aggregate repetitive information within 3D CT scans.
Our method exhibits improved performance on two different scales of small datasets of 3D lung CT scans.
arXiv Detail & Related papers (2024-06-24T23:23:18Z) - Automatic view plane prescription for cardiac magnetic resonance imaging
via supervision by spatial relationship between views [30.59897595586657]
This work presents a clinic-compatible, annotation-free system for automatic cardiac magnetic resonance (CMR) view planning.
The system mines the spatial relationship, more specifically, locates the intersecting lines, between the target planes and source views.
The interplay of multiple target planes predicted in a source view is utilized in a stacked hourglass architecture to gradually improve the regression.
arXiv Detail & Related papers (2023-09-22T11:36:42Z) - On the Localization of Ultrasound Image Slices within Point Distribution
Models [84.27083443424408]
Thyroid disorders are most commonly diagnosed using high-resolution Ultrasound (US)
Longitudinal tracking is a pivotal diagnostic protocol for monitoring changes in pathological thyroid morphology.
We present a framework for automated US image slice localization within a 3D shape representation.
arXiv Detail & Related papers (2023-09-01T10:10:46Z) - Affinity Feature Strengthening for Accurate, Complete and Robust Vessel
Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z) - Agent with Tangent-based Formulation and Anatomical Perception for
Standard Plane Localization in 3D Ultrasound [56.7645826576439]
We introduce a novel reinforcement learning framework for automatic SP localization in 3D US.
First, we formulate SP localization in 3D US as a tangent-point-based problem in RL to restructure the action space.
Second, we design an auxiliary task learning strategy to enhance the model's ability to recognize subtle differences crossing Non-SPs and SPs in plane search.
arXiv Detail & Related papers (2022-07-01T14:53:27Z) - A unified 3D framework for Organs at Risk Localization and Segmentation
for Radiation Therapy Planning [56.52933974838905]
Current medical workflow requires manual delineation of organs-at-risk (OAR)
In this work, we aim to introduce a unified 3D pipeline for OAR localization-segmentation.
Our proposed framework fully enables the exploitation of 3D context information inherent in medical imaging.
arXiv Detail & Related papers (2022-03-01T17:08:41Z) - Optimal Transport-based Graph Matching for 3D retinal OCT image
registration [13.93497551507791]
This paper presents a novel but efficient framework involving an optimal transport based graph matching (OT-GM) method for 3D mouse OCT image registration.
Both subjective and objective evaluation results demonstrate that our framework outperforms other well-established methods on mouse OCT images within a reasonable execution time.
arXiv Detail & Related papers (2022-02-28T20:15:12Z) - 3D Reconstruction of Curvilinear Structures with Stereo Matching
DeepConvolutional Neural Networks [52.710012864395246]
We propose a fully automated pipeline for both detection and matching of curvilinear structures in stereo pairs.
We mainly focus on 3D reconstruction of dislocations from stereo pairs of TEM images.
arXiv Detail & Related papers (2021-10-14T23:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.