Spatial-Aware Self-Supervision for Medical 3D Imaging with Multi-Granularity Observable Tasks
- URL: http://arxiv.org/abs/2509.05967v1
- Date: Sun, 07 Sep 2025 08:16:37 GMT
- Title: Spatial-Aware Self-Supervision for Medical 3D Imaging with Multi-Granularity Observable Tasks
- Authors: Yiqin Zhang, Meiling Chen, Zhengjie Zhang
- Abstract summary: We propose a method consisting of three sub-tasks to capture spatially relevant semantics in medical 3D imaging. Their design adheres to observable principles to ensure interpretability, while minimizing the resulting performance loss as far as possible.
- Score: 4.097364225798782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of self-supervised techniques has become increasingly prevalent within medical visualization tasks, primarily due to their capacity to mitigate the data scarcity prevalent in the healthcare sector. The majority of current works are influenced by designs originating in the generic 2D visual domain, and lack an intuitive demonstration of the model's learning process regarding 3D spatial knowledge. Consequently, these methods often fall short in terms of medical interpretability. We propose a method consisting of three sub-tasks to capture the spatially relevant semantics in medical 3D imaging. Their design adheres to observable principles to ensure interpretability, while minimizing the resulting performance loss as far as possible. By leveraging the enhanced semantic depth offered by the extra dimension in 3D imaging, this approach incorporates multi-granularity spatial relationship modeling to maintain training stability. Experimental findings suggest that our approach is capable of delivering performance that is on par with current methodologies, while facilitating an intuitive understanding of the self-supervised learning process.
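The three sub-tasks themselves are not detailed in the abstract. As a hedged illustration of what a multi-granularity, spatially-aware pretext target can look like, the sketch below derives both a coarse direction class and a fine offset vector from a pair of 3D patch centers; the function name, the binning scheme, and the use of patch-pair offsets are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def relative_position_targets(center_a, center_b, shape, n_bins=3):
    """Given the centers of two 3D patches sampled from a volume of the
    given shape, derive self-supervision targets at two granularities:
    a coarse direction class (one of n_bins**3 spatial bins) and the
    fine-grained normalized offset vector."""
    offset = (np.asarray(center_b) - np.asarray(center_a)) / np.asarray(shape, float)
    # Fine-grained target: continuous per-axis offset clipped to [-1, 1].
    fine = np.clip(offset, -1.0, 1.0)
    # Coarse target: quantize each axis into n_bins bins, then flatten
    # the three bin indices into a single class label.
    bins = np.minimum(((fine + 1.0) / 2.0 * n_bins).astype(int), n_bins - 1)
    coarse = int(bins[0] * n_bins**2 + bins[1] * n_bins + bins[2])
    return coarse, fine
```

A model trained on both targets at once sees a stable coarse classification signal alongside a harder fine regression signal, which is one plausible reading of "multi-granularity spatial relationship modeling".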
Related papers
- Does DINOv3 Set a New Medical Vision Standard? [67.33543059306938]
This report investigates whether DINOv3 can serve as a powerful unified encoder for medical vision tasks without domain-specific pre-training. We benchmark DINOv3 across common medical vision tasks, including 2D/3D classification and segmentation. Remarkably, it can even outperform medical-specific foundation models like BiomedCLIP and CT-Net on several tasks.
arXiv Detail & Related papers (2025-09-08T09:28:57Z)
- Medical Semantic Segmentation with Diffusion Pretrain [1.9415817267757087]
Recent advances in deep learning have shown that learning robust feature representations is critical for the success of many computer vision tasks. We propose a novel pretraining strategy using diffusion models with anatomical guidance, tailored to the intricacies of 3D medical image data. We employ an additional model that predicts 3D universal body-part coordinates, providing guidance during the diffusion process.
arXiv Detail & Related papers (2025-01-31T16:25:49Z)
- Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations? [55.99654128127689]
Cross-modal contrastive distillation has recently been explored for learning effective 3D representations. Existing methods focus primarily on modality-shared features, neglecting the modality-specific features during the pre-training process. We propose a new framework, namely CMCR, to address these shortcomings.
arXiv Detail & Related papers (2024-12-12T06:09:49Z)
- MT3DNet: Multi-Task learning Network for 3D Surgical Scene Reconstruction [0.0]
In image-assisted minimally invasive surgeries (MIS), understanding surgical scenes is vital for real-time feedback to surgeons. The challenge lies in accurately detecting, segmenting, and estimating the depth of surgical scenes depicted in high-resolution images. A novel Multi-Task Learning (MTL) network is proposed for performing these tasks concurrently.
arXiv Detail & Related papers (2024-12-05T07:07:35Z)
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework. Our approach organizes various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
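The abstract leaves the sequencing details open. A minimal sketch of the general idea, under the assumption of a simple raster ordering: a volume is cut into non-overlapping patch tokens whose flattened sequence could then be pre-trained with next-token prediction (the patch size and ordering here are assumptions, not the paper's scheme).

```python
import numpy as np

def volume_to_token_sequence(volume, patch=4):
    """Flatten a 3D volume into a raster-ordered sequence of patch
    tokens, the kind of sequence an autoregressive model can be
    pre-trained on with next-token prediction."""
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    tokens = (volume
              # Split each axis into (num_patches, patch) pairs.
              .reshape(d // patch, patch, h // patch, patch, w // patch, patch)
              # Group the patch-grid axes together, then the within-patch axes.
              .transpose(0, 2, 4, 1, 3, 5)
              # One row per patch, patch**3 voxels per row.
              .reshape(-1, patch**3))
    return tokens
```

Each row of the result is one visual token; an embedding layer plus a causal transformer over these rows would complete the autoregressive setup.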
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
- Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject. We show that our model captures the distinct functionalities of each region of the human vision system. Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
- Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning [47.700298779672366]
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning. Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation. We propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging.
arXiv Detail & Related papers (2024-03-05T00:46:53Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- An explainable three dimension framework to uncover learning patterns: A unified look in variable sulci recognition [2.960322639147262]
We develop an explainable artificial intelligence (XAI) 3D-Framework capable of providing accurate, low-complexity global explanations. Our framework integrates statistical features (Shape) and XAI methods (GradCam and SHAP) with dimensionality reduction, ensuring that explanations reflect both model learning and cohort-specific variability. These robust explanations facilitated the identification of critical sub-regions, including the posterior temporal and internal parietal regions, as well as the cingulate region and thalamus.
arXiv Detail & Related papers (2023-09-02T10:46:05Z)
- A Point in the Right Direction: Vector Prediction for Spatially-aware Self-supervised Volumetric Representation Learning [12.369884719068228]
VectorPOSE promotes better spatial understanding with two novel pretext tasks: Vector Prediction and Boundary-Focused Reconstruction.
We evaluate VectorPOSE on three 3D medical image segmentation tasks, showing that it often outperforms state-of-the-art methods.
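The exact geometry of VectorPOSE's Vector Prediction target is not specified in the abstract. As a hypothetical stand-in for what a dense vector-prediction pretext target might look like, the sketch below builds, for every voxel, the target vector pointing to the centroid of a foreground mask; the reference point and the centroid choice are assumptions, not the paper's formulation.

```python
import numpy as np

def vector_field_to_centroid(mask):
    """For each voxel of a 3D binary mask, the target vector pointing
    to the centroid of the foreground region. Regressing such a dense
    field forces a network to reason about global spatial layout."""
    # One (z, y, x) coordinate triple per voxel.
    coords = np.stack(np.meshgrid(*[np.arange(s) for s in mask.shape],
                                  indexing="ij"), axis=-1).astype(float)
    # Mean position of the foreground voxels.
    centroid = coords[mask > 0].mean(axis=0)
    # Target field, shape (*mask.shape, 3): voxel -> centroid offsets.
    return centroid - coords
```

Voxels far from the structure get long target vectors and voxels at its center get near-zero ones, so the regression error is inherently spatially interpretable.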
arXiv Detail & Related papers (2022-11-15T22:10:50Z)
- 3D endoscopic depth estimation using 3D surface-aware constraints [16.161276518580262]
We show that depth estimation can be reformed from a 3D surface perspective.
We propose a loss function for depth estimation that integrates the surface-aware constraints.
Camera parameters are incorporated into the training pipeline to increase the control and transparency of the depth estimation.
arXiv Detail & Related papers (2022-03-04T04:47:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.