Related papers: Extending 2D foundational DINOv3 representations to 3D segmentation of neonatal brain MR images

URL: http://arxiv.org/abs/2602.23962v1
Date: Fri, 27 Feb 2026 12:16:21 GMT
Title: Extending 2D foundational DINOv3 representations to 3D segmentation of neonatal brain MR images
Authors: Annayah Usman, Behraj Khan, Tahir Qasim Syed
Abstract summary: The global MRI volume is decomposed into non-overlapping 3D windows or sub-cubes, each processed via a separate decoding arm built upon frozen high-fidelity features. The proposed approach achieves a Dice score of 0.65 for a single 3D window.
Score: 3.186130813218338
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Precise volumetric delineation of hippocampal structures is essential for quantifying neurodevelopmental trajectories in pre-term and term infants, where subtle morphological variations may carry prognostic significance. While foundation encoders trained on large-scale visual data offer discriminative representations, their 2D formulation is a limitation with respect to the 3D organization of brain anatomy. We propose a volumetric segmentation strategy that reconciles this tension through a structured window-based disassembly-reassembly mechanism: the global MRI volume is decomposed into non-overlapping 3D windows or sub-cubes, each processed via a separate decoding arm built upon frozen high-fidelity features, and subsequently reassembled prior to a ground-truth correspondence using a dense-prediction head. This architecture preserves a constant decoder memory footprint while forcing predictions to lie within an anatomically consistent geometry. Evaluated on the ALBERT dataset for hippocampal segmentation, the proposed approach achieves a Dice score of 0.65 for a single 3D window. The method demonstrates that volumetric anatomical structure can be recovered from frozen 2D foundation representations through structured compositional decoding, and offers a principled and generalizable extension of foundation models to 3D medical applications.
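
The window-based disassembly-reassembly step and the Dice metric mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (split_into_windows, merge_windows, dice_score) are hypothetical, the volume is assumed to be evenly divisible by the window size, and the frozen DINOv3 encoder and per-window decoding arms are omitted entirely.

```python
import numpy as np


def split_into_windows(volume, window):
    """Split a 3D array into non-overlapping sub-cubes of shape `window`.

    Hypothetical helper: assumes each volume axis is divisible by the
    corresponding window size.
    """
    D, H, W = volume.shape
    d, h, w = window
    if D % d or H % h or W % w:
        raise ValueError("volume shape must be divisible by window shape")
    # Reshape to (nd, d, nh, h, nw, w), then move the grid axes to the front.
    blocks = volume.reshape(D // d, d, H // h, h, W // w, w)
    return blocks.transpose(0, 2, 4, 1, 3, 5)  # (nd, nh, nw, d, h, w)


def merge_windows(blocks):
    """Reassemble a (nd, nh, nw, d, h, w) grid of sub-cubes into one volume."""
    nd, nh, nw, d, h, w = blocks.shape
    # Undo the transpose from split_into_windows, then collapse paired axes.
    return blocks.transpose(0, 3, 1, 4, 2, 5).reshape(nd * d, nh * h, nw * w)


def dice_score(pred, target, eps=1e-8):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


if __name__ == "__main__":
    volume = np.arange(4 * 4 * 4).reshape(4, 4, 4)
    blocks = split_into_windows(volume, (2, 2, 2))
    # Each sub-cube could be decoded independently before reassembly.
    restored = merge_windows(blocks)
    print(blocks.shape, np.array_equal(restored, volume))
```

Because the windows do not overlap, reassembly is a pure rearrangement of voxels, so no blending or averaging at window boundaries is needed in this sketch.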

Related papers

Preoperative-to-intraoperative Liver Registration for Laparoscopic Surgery via Latent-Grounded Correspondence Constraints [51.7011449975586]
Land-Reg is a deformable registration framework that learns latent-grounded 2D-3D landmark correspondences. For rigid registration, Land-Reg embraces a Cross-modal Latent Alignment module. An Uncertainty-enhanced Overlap Landmark Detector with similarity matching is proposed to robustly estimate explicit 2D-3D landmark correspondences.
arXiv Detail & Related papers (2026-03-02T10:44:03Z)
Advanced Geometric Correction Algorithms for 3D Medical Reconstruction: Comparison of Computed Tomography and Macroscopic Imaging [0.9395222766576343]
This paper introduces a hybrid two-stage registration framework for reconstructing 3D kidney anatomy from macroscopic slices. It addresses the data-scarcity and high-distortion challenges typical of macroscopic imaging. The proposed framework generalizes to other soft-tissue organs reconstructed from optical or photographic cross-sections.
arXiv Detail & Related papers (2026-01-30T17:16:17Z)
Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z)
Towards Generalisable Foundation Models for 3D Brain MRI [5.527537739064968]
We introduce BrainFound, a self-supervised foundation model for brain MRI built by extending DINO-v2. BrainFound adapts DINO-v2 to model full 3D brain anatomy by incorporating information from sequential MRI slices. It supports both single- and multimodal inputs, enabling a broad range of downstream tasks, including disease detection and image segmentation.
arXiv Detail & Related papers (2025-10-27T15:19:46Z)
Moving Beyond Diffusion: Hierarchy-to-Hierarchy Autoregression for fMRI-to-Image Reconstruction [65.67001243986981]
We propose MindHier, a coarse-to-fine fMRI-to-image reconstruction framework built on scale-wise autoregressive modeling. MindHier achieves superior semantic fidelity, 4.67x faster inference, and more deterministic results than the diffusion-based baselines.
arXiv Detail & Related papers (2025-10-25T15:40:07Z)
Bidirectional Mammogram View Translation with Column-Aware and Implicit 3D Conditional Diffusion [17.309030641962]
View-to-view translation can help recover missing views and improve lesion alignment. Unlike natural images, this task in mammography is highly challenging due to large non-rigid deformations and severe tissue overlap in X-ray projections. We propose Column-Aware and Implicit 3D Diffusion (CA3D-Diff), a novel bidirectional mammogram view translation framework.
arXiv Detail & Related papers (2025-10-06T15:48:27Z)
Ov3R: Open-Vocabulary Semantic 3D Reconstruction from RGB Videos [69.21508595833623]
Ov3R is a framework for semantic 3D reconstruction from RGB video streams. CLIP3R predicts dense point maps from overlapping clips while embedding object-level semantics. 2D-3D OVS lifts 2D features into 3D by learning fused descriptors integrating spatial, geometric, and semantic cues.
arXiv Detail & Related papers (2025-07-29T17:55:58Z)
Vector Representations of Vessel Trees [12.391128284848135]
We introduce a novel framework for learning vector representations of tree-structured geometric data focusing on 3D vascular networks. Our framework, named VeTTA, offers precise, flexible, and topologically consistent modeling of anatomical tree structures in medical imaging.
arXiv Detail & Related papers (2025-06-11T20:34:08Z)
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding [58.38294408121273]
We propose Cross-modal and Uncertainty-aware Agglomeration for Open-vocabulary 3D Scene Understanding dubbed CUA-O3D. Our method addresses two key challenges: (1) incorporating semantic priors from VLMs alongside the geometric knowledge of spatially-aware vision foundation models, and (2) using a novel deterministic uncertainty estimation to capture model-specific uncertainties.
arXiv Detail & Related papers (2025-03-20T20:58:48Z)
MedTet: An Online Motion Model for 4D Heart Reconstruction [59.74234226055964]
We present a novel approach to reconstruction of 3D cardiac motion from sparse intraoperative data. Existing methods can accurately reconstruct 3D organ geometries from full 3D volumetric imaging. We propose a versatile framework for reconstructing 3D motion from such partial data.
arXiv Detail & Related papers (2024-12-03T17:18:33Z)
Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach for adapting 2D networks with an intermediate feature representation to process 3D images. We show, on all 3D MedMNIST benchmark datasets and on two real-world datasets consisting of several hundred high-resolution CT or MRI scans, that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z)
CORPS: Cost-free Rigorous Pseudo-labeling based on Similarity-ranking for Brain MRI Segmentation [3.1657395760137406]
We propose a semi-supervised segmentation framework built upon a novel atlas-based pseudo-labeling method and a 3D deep convolutional neural network (DCNN) for 3D brain MRI segmentation. The experimental results demonstrate the superiority of the proposed framework over the baseline method both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-05-19T14:42:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.