Revisiting 2D Foundation Models for Scalable 3D Medical Image Classification
- URL: http://arxiv.org/abs/2512.12887v1
- Date: Mon, 15 Dec 2025 00:01:19 GMT
- Title: Revisiting 2D Foundation Models for Scalable 3D Medical Image Classification
- Authors: Han Liu, Bogdan Georgescu, Yanbo Zhang, Youngjin Yoo, Michael Baumgartner, Riqiang Gao, Jianing Wang, Gengyan Zhao, Eli Gibson, Dorin Comaniciu, Sasa Grbic
- Abstract summary: We introduce AnyMC3D, a scalable 3D classifier adapted from 2D FMs. Our method scales efficiently to new tasks by adding only lightweight plugins on top of a single frozen backbone. Our analysis reveals key insights: (1) effective adaptation is essential to unlock FM potential, (2) general-purpose FMs can match medical-specific FMs if properly adapted, and (3) 2D-based methods surpass 3D architectures for 3D classification.
- Score: 11.13919196108179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D medical image classification is essential for modern clinical workflows. Medical foundation models (FMs) have emerged as a promising approach for scaling to new tasks, yet current research suffers from three critical pitfalls: data-regime bias, suboptimal adaptation, and insufficient task coverage. In this paper, we address these pitfalls and introduce AnyMC3D, a scalable 3D classifier adapted from 2D FMs. Our method scales efficiently to new tasks by adding only lightweight plugins (about 1M parameters per task) on top of a single frozen backbone. This versatile framework also supports multi-view inputs, auxiliary pixel-level supervision, and interpretable heatmap generation. We establish a comprehensive benchmark of 12 tasks covering diverse pathologies, anatomies, and modalities, and systematically analyze state-of-the-art 3D classification techniques. Our analysis reveals key insights: (1) effective adaptation is essential to unlock FM potential, (2) general-purpose FMs can match medical-specific FMs if properly adapted, and (3) 2D-based methods surpass 3D architectures for 3D classification. For the first time, we demonstrate the feasibility of achieving state-of-the-art performance across diverse applications using a single scalable framework (including 1st place in the VLM3D challenge), eliminating the need for separate task-specific models.
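The adaptation pattern the abstract describes (a single frozen 2D backbone shared across tasks, with a lightweight per-task plugin of roughly 1M parameters on top) can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the authors' implementation: the encoder, plugin design, shapes, and mean-pooling aggregation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class Frozen2DEncoder:
    """Stand-in for a pretrained 2D foundation model; weights are never updated."""
    def __init__(self, in_pixels, feat_dim):
        self.W = rng.standard_normal((in_pixels, feat_dim)) / np.sqrt(in_pixels)

    def encode(self, slice_2d):
        # Map one 2D slice to a feature vector of shape (feat_dim,).
        return np.tanh(slice_2d.ravel() @ self.W)

class TaskPlugin:
    """Lightweight trainable head: pools slice features across depth, then classifies."""
    def __init__(self, feat_dim, n_classes):
        self.W = rng.standard_normal((feat_dim, n_classes)) * 0.01

    def predict(self, slice_feats):
        pooled = slice_feats.mean(axis=0)   # aggregate D slice features into one vector
        logits = pooled @ self.W
        e = np.exp(logits - logits.max())
        return e / e.sum()                  # class probabilities

# One shared frozen encoder; a new task only adds a small plugin.
encoder = Frozen2DEncoder(in_pixels=32 * 32, feat_dim=64)
plugins = {task: TaskPlugin(64, n_classes=2) for task in ("nodule", "fracture")}

volume = rng.standard_normal((16, 32, 32))               # hypothetical D x H x W scan
feats = np.stack([encoder.encode(s) for s in volume])    # (16, 64) slice features
probs = plugins["nodule"].predict(feats)                 # task-specific prediction
```

In this toy sketch each plugin holds only 64 x 2 parameters; the point is the scaling behavior, where adding a task touches only the plugin while the backbone stays fixed and shared.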
Related papers
- Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z)
- TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models [39.00742360251856]
Existing foundation models (FMs) in the medical domain often require extensive fine-tuning or rely on training resource-intensive decoders. We introduce a suite of task-agnostically pretrained CT foundation models (TAP-CT). Our approach incorporates targeted modifications to patch embeddings, positional encodings, and volumetric augmentations, making the architecture depth-aware.
arXiv Detail & Related papers (2025-11-30T12:43:15Z)
- MTMed3D: A Multi-Task Transformer-Based Model for 3D Medical Imaging [5.169719124205838]
We propose MTMed3D, a novel end-to-end multi-task Transformer-based model to address the limitations of single-task models. Our model uses a Transformer as the shared encoder to generate multi-scale features, followed by CNN-based task-specific decoders.
arXiv Detail & Related papers (2025-11-15T22:27:49Z)
- MAFM^3: Modular Adaptation of Foundation Models for Multi-Modal Medical AI [3.1920084309415007]
We propose MAFM^3, a framework that enables a single foundation model to expand into diverse domains, tasks, and modalities. Unlike conventional adaptation methods that treat each new task or modality in isolation, MAFM^3 provides a unified and expandable framework for efficient multitask and multimodality adaptation.
arXiv Detail & Related papers (2025-11-14T12:10:59Z)
- Does DINOv3 Set a New Medical Vision Standard? [67.33543059306938]
This report investigates whether DINOv3 can serve as a powerful unified encoder for medical vision tasks without domain-specific pre-training. We benchmark DINOv3 across common medical vision tasks, including 2D/3D classification and segmentation. Remarkably, it can even outperform medical-specific foundation models such as BiomedCLIP and CT-Net on several tasks.
arXiv Detail & Related papers (2025-09-08T09:28:57Z)
- MedSeqFT: Sequential Fine-tuning Foundation Models for 3D Medical Image Segmentation [55.37355146924576]
MedSeqFT is a sequential fine-tuning framework for medical image analysis. It adapts pre-trained models to new tasks while refining their representational capacity, and consistently outperforms state-of-the-art fine-tuning strategies.
arXiv Detail & Related papers (2025-09-07T15:22:53Z)
- Foundational Models for 3D Point Clouds: A Survey and Outlook [50.61473863985571]
3D point cloud representation plays a crucial role in preserving the geometric fidelity of the physical world. To bridge this gap, it becomes essential to incorporate multiple modalities. Foundation models (FMs) can seamlessly integrate and reason across these modalities.
arXiv Detail & Related papers (2025-01-30T18:59:43Z)
- Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model [17.69323209661274]
We propose Med-2E3, a 3D medical MLLM that integrates a dual 3D-2D encoder architecture. To aggregate 2D features effectively, we design a Text-Guided Inter-Slice (TG-IS) scoring module. Experiments on large-scale, open-source 3D medical multimodal datasets demonstrate that TG-IS exhibits task-specific attention distribution.
arXiv Detail & Related papers (2024-11-19T09:59:59Z)
- Cross-D Conv: Cross-Dimensional Transferable Knowledge Base via Fourier Shifting Operation [3.69758875412828]
The Cross-D Conv operation bridges the dimensional gap by learning phase shifting in the Fourier domain. Our method enables seamless weight transfer between 2D and 3D convolution operations, effectively facilitating cross-dimensional learning.
arXiv Detail & Related papers (2024-11-02T13:03:44Z)
- ProMISe: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models [13.08275555017179]
We propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt.
We evaluate our model on two public datasets for colon and pancreas tumor segmentations.
arXiv Detail & Related papers (2023-10-30T16:49:03Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [111.16358607889609]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach of adapting 2D networks with an intermediate feature representation for processing 3D images.
On all 3D MedMNIST benchmark datasets, and on two real-world datasets consisting of several hundred high-resolution CT or MRI scans, we show that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z)
- Video Pretraining Advances 3D Deep Learning on Chest CT Tasks [63.879848037679224]
Pretraining on large natural image classification datasets has aided model development on data-scarce 2D medical tasks.
These 2D models have been surpassed by 3D models on 3D computer vision benchmarks.
We show video pretraining for 3D models can enable higher performance on smaller datasets for 3D medical tasks.
arXiv Detail & Related papers (2023-04-02T14:46:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.