Extending 2D foundational DINOv3 representations to 3D segmentation of neonatal brain MR images
- URL: http://arxiv.org/abs/2602.23962v1
- Date: Fri, 27 Feb 2026 12:16:21 GMT
- Title: Extending 2D foundational DINOv3 representations to 3D segmentation of neonatal brain MR images
- Authors: Annayah Usman, Behraj Khan, Tahir Qasim Syed,
- Abstract summary: The global MRI volume is decomposed into non-overlapping 3D windows or sub-cubes, each processed via a separate decoding arm built upon frozen high-fidelity features.<n>The proposed approach achieves a Dice score of 0.65 for a single 3D window.
- Score: 3.186130813218338
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Precise volumetric delineation of hippocampal structures is essential for quantifying neurodevelopmental trajectories in pre-term and term infants, where subtle morphological variations may carry prognostic significance. While foundation encoders trained on large-scale visual data offer discriminative representations, their 2D formulation is a limitation with respect to the $3$D organization of brain anatomy. We propose a volumetric segmentation strategy that reconciles this tension through a structured window-based disassembly-reassembly mechanism: the global MRI volume is decomposed into non-overlapping 3D windows or sub-cubes, each processed via a separate decoding arm built upon frozen high-fidelity features, and subsequently reassembled prior to a ground-truth correspendence using a dense-prediction head. This architecture preserves constant a decoder memory footprint while forcing predictions to lie within an anatomically consistent geometry. Evaluated on the ALBERT dataset for hippocampal segmentation, the proposed approach achieves a Dice score of 0.65 for a single 3D window. The method demonstrates that volumetric anatomical structure could be recovered from frozen 2D foundation representations through structured compositional decoding, and offers a principled and generalizable extension for foundation models for 3D medical applications.
Related papers
- Preoperative-to-intraoperative Liver Registration for Laparoscopic Surgery via Latent-Grounded Correspondence Constraints [51.7011449975586]
Land-Reg is a deformable registration framework that learns latent-grounded 2D-3D landmark correspondences.<n>For rigid registration, Land-Reg embraces a Cross-modal Latent Alignment module.<n>An Uncertainty-enhanced Overlap Landmark Detector with similarity matching is proposed to robustly estimate explicit 2D-3D landmark correspondences.
arXiv Detail & Related papers (2026-03-02T10:44:03Z) - Advanced Geometric Correction Algorithms for 3D Medical Reconstruction: Comparison of Computed Tomography and Macroscopic Imaging [0.9395222766576343]
This paper introduces a hybrid two-stage registration framework for reconstructing 3D kidney anatomy from macroscopic slices.<n>It addresses the data-scarcity and high-distortion challenges typical of macroscopic imaging.<n>The proposed framework generalizes to other soft-tissue organs reconstructed from optical or photographic cross-sections.
arXiv Detail & Related papers (2026-01-30T17:16:17Z) - Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features.<n>MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z) - Towards Generalisable Foundation Models for 3D Brain MRI [5.527537739064968]
We introduce BrainFound, a self-supervised foundation model for brain MRI built by extending DINO-v2.<n>BrainFound adapts DINO-v2 to model full 3D brain anatomy by incorporating information from sequential MRI slices.<n>It supports both single- and multimodal inputs, enabling a broad range of downstream tasks, including disease detection and image segmentation.
arXiv Detail & Related papers (2025-10-27T15:19:46Z) - Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling.<n>MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
arXiv Detail & Related papers (2025-10-25T15:40:07Z) - Bidirectional Mammogram View Translation with Column-Aware and Implicit 3D Conditional Diffusion [17.309030641962]
View-to-view translation can help recover missing views and improve lesion alignment.<n>Unlike natural images, this task in mammography is highly challenging due to large non-rigid deformations and severe tissue overlap in X-ray projections.<n>We propose Column-Aware and Implicit 3D Diffusion (CA3D-Diff), a novel bidirectional mammogram view translation framework.
arXiv Detail & Related papers (2025-10-06T15:48:27Z) - Ov3R: Open-Vocabulary Semantic 3D Reconstruction from RGB Videos [69.21508595833623]
Ov3R is a framework for semantic 3D reconstruction from RGB video streams.<n> CLIP3R predicts dense point maps from overlapping clips while embedding object-level semantics.<n>2D-3D OVS lifts 2D features into 3D by learning fused descriptors integrating spatial, geometric, and semantic cues.
arXiv Detail & Related papers (2025-07-29T17:55:58Z) - Vector Representations of Vessel Trees [12.391128284848135]
We introduce a novel framework for learning vector representations of tree-structured geometric data focusing on 3D vascular networks.<n>Our framework, named VeTTA, offers precise, flexible, and topologically consistent modeling of anatomical tree structures in medical imaging.
arXiv Detail & Related papers (2025-06-11T20:34:08Z) - Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding [58.38294408121273]
We propose Cross-modal and Uncertainty-aware Agglomeration for Open-vocabulary 3D Scene Understanding dubbed CUA-O3D.<n>Our method addresses two key challenges: (1) incorporating semantic priors from VLMs alongside the geometric knowledge of spatially-aware vision foundation models, and (2) using a novel deterministic uncertainty estimation to capture model-specific uncertainties.
arXiv Detail & Related papers (2025-03-20T20:58:48Z) - MedTet: An Online Motion Model for 4D Heart Reconstruction [59.74234226055964]
We present a novel approach to reconstruction of 3D cardiac motion from sparse intraoperative data.<n>Existing methods can accurately reconstruct 3D organ geometries from full 3D volumetric imaging.<n>We propose a versatile framework for reconstructing 3D motion from such partial data.
arXiv Detail & Related papers (2024-12-03T17:18:33Z) - Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images.
We show on all 3D MedMNIST datasets as benchmark and two real-world datasets consisting of several hundred high-resolution CT or MRI scans that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z) - CORPS: Cost-free Rigorous Pseudo-labeling based on Similarity-ranking
for Brain MRI Segmentation [3.1657395760137406]
We propose a semi-supervised segmentation framework built upon a novel atlas-based pseudo-labeling method and a 3D deep convolutional neural network (DCNN) for 3D brain MRI segmentation.
The experimental results demonstrate the superiority of the proposed framework over the baseline method both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-05-19T14:42:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.