MedVAR: Towards Scalable and Efficient Medical Image Generation via Next-scale Autoregressive Prediction
- URL: http://arxiv.org/abs/2602.14512v2
- Date: Mon, 23 Feb 2026 10:51:34 GMT
- Title: MedVAR: Towards Scalable and Efficient Medical Image Generation via Next-scale Autoregressive Prediction
- Authors: Zhicheng He, Yunpeng Zhao, Junde Wu, Ziwei Niu, Zijun Li, Bohan Li, Lanfen Lin, Yueming Jin,
- Abstract summary: A scalable generative backbone for medical imaging requires architectural efficiency, sufficient multi-organ data, and principled evaluation. We introduce MedVAR, the first autoregressive-based foundation model that adopts the next-scale prediction paradigm to enable fast and scale-up-friendly medical image synthesis. To support hierarchical generation, we curate a harmonized dataset of around 440,000 CT and MRI images spanning six anatomical regions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical image generation is pivotal in applications like data augmentation for low-resource clinical tasks and privacy-preserving data sharing. However, developing a scalable generative backbone for medical imaging requires architectural efficiency, sufficient multi-organ data, and principled evaluation, yet current approaches leave these aspects unresolved. Therefore, we introduce MedVAR, the first autoregressive-based foundation model that adopts the next-scale prediction paradigm to enable fast and scale-up-friendly medical image synthesis. MedVAR generates images in a coarse-to-fine manner and produces structured multi-scale representations suitable for downstream use. To support hierarchical generation, we curate a harmonized dataset of around 440,000 CT and MRI images spanning six anatomical regions. Comprehensive experiments across fidelity, diversity, and scalability show that MedVAR achieves state-of-the-art generative performance and offers a promising architectural direction for future medical generative foundation models.
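The abstract's coarse-to-fine, next-scale prediction paradigm can be illustrated with a minimal sketch: token maps are predicted at progressively larger resolutions, each conditioned on the upsampled accumulation of all coarser scales, so the model makes one forward pass per scale rather than one per token. This is a generic illustration of the VAR-style idea, not MedVAR's actual implementation; all function names and scale choices here are hypothetical.

```python
# Hedged sketch of next-scale autoregressive generation (VAR-style).
# All names are illustrative; this is not MedVAR's API.
import numpy as np

rng = np.random.default_rng(0)

def predict_scale(context: np.ndarray, size: int) -> np.ndarray:
    """Stand-in for the transformer that predicts the token map at one
    scale, conditioned on all coarser scales (here: noise around the
    upsampled context)."""
    return context + 0.1 * rng.standard_normal((size, size))

def upsample(x: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a token map to the next scale."""
    reps = size // x.shape[0]
    return np.repeat(np.repeat(x, reps, axis=0), reps, axis=1)

def generate(scales=(1, 2, 4, 8, 16)) -> np.ndarray:
    """Coarse-to-fine generation: each scale refines the upsampled
    result of the previous scales -- one forward pass per scale,
    instead of one pass per token as in next-token AR models."""
    canvas = np.zeros((scales[0], scales[0]))
    for s in scales:
        context = upsample(canvas, s)
        canvas = predict_scale(context, s)
    return canvas

img = generate()
print(img.shape)  # final token map at the finest scale: (16, 16)
```

The intermediate canvases are exactly the "structured multi-scale representations" the abstract says are suitable for downstream use: each scale is a complete, usable coarse rendering of the image.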
Related papers
- MedDChest: A Content-Aware Multimodal Foundational Vision Model for Thoracic Imaging [3.0332210076508326]
We propose MedDChest, a new foundational Vision Transformer (ViT) model optimized specifically for thoracic imaging. We pre-trained MedDChest from scratch on a massive, curated, multimodal dataset of over 1.2 million images. We validate our model's effectiveness by fine-tuning it on a diverse set of downstream diagnostic tasks.
arXiv Detail & Related papers (2025-11-06T03:28:56Z)
- Medverse: A Universal Model for Full-Resolution 3D Medical Image Segmentation, Transformation and Enhancement [15.28003304776022]
In-context learning offers a promising paradigm for universal medical image analysis. We present Medverse, a universal ICL model for 3D medical imaging trained on 22 datasets. Medverse employs a next-scale autoregressive in-context learning framework that progressively refines predictions from coarse to fine.
arXiv Detail & Related papers (2025-09-11T08:10:49Z)
- MedIL: Implicit Latent Spaces for Generating Heterogeneous Medical Images at Arbitrary Resolutions [2.2427832125073732]
MedIL is a first-of-its-kind autoencoder built for encoding medical images with heterogeneous sizes and resolutions. We show how MedIL compresses and preserves clinically-relevant features over large multi-site, multi-resolution datasets.
arXiv Detail & Related papers (2025-04-12T19:52:56Z)
- MedLoRD: A Medical Low-Resource Diffusion Model for High-Resolution 3D CT Image Synthesis [1.1741781892171472]
We introduce MedLoRD, a generative diffusion model designed for computational resource-constrained environments. MedLoRD is capable of generating high-dimensional medical volumes with resolutions up to 512×512×256. It is evaluated across multiple modalities, including Coronary Computed Tomography Angiography and Lung Computed Tomography datasets.
arXiv Detail & Related papers (2025-03-17T14:22:49Z)
- RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining [64.66825253356869]
We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities. We construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for Chest X-rays and CTRATE-IR for CT scans. We develop two retrieval systems, RadIR-CXR and model-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
arXiv Detail & Related papers (2025-03-06T17:43:03Z)
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned on medical images, fitting the complex task of counterfactual image generation. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
- HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling [4.44283662576491]
We present a novel framework based on hypernetworks to fuse clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements. This approach aims to leverage the complementary information present in these modalities to enhance the accuracy of various medical applications.
arXiv Detail & Related papers (2024-03-20T05:50:04Z)
- Building Universal Foundation Models for Medical Image Analysis with Spatially Adaptive Networks [5.661631789478932]
We propose a universal foundation model for medical image analysis that processes images with heterogeneous spatial properties using a unified structure.
We pre-train a spatial adaptive visual tokenizer (SPAD-VT) and then a spatial adaptive Vision Transformer (SPAD-ViT) via masked image modeling (MIM) on 55 public medical image datasets.
The experimental results on downstream medical image classification and segmentation tasks demonstrate the superior performance and label efficiency of our model.
arXiv Detail & Related papers (2023-12-12T08:33:45Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
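The gated axial-attention idea in the Medical Transformer entry above can be sketched as follows: full 2D self-attention is factorised into a pass along the height axis and a pass along the width axis, and a learnable gate scales attention terms that may be unreliable when medical training data is scarce. This is a simplified illustration under stated assumptions (identity Q/K/V projections, a single scalar gate on the logits, no relative position embeddings), not the paper's actual architecture.

```python
# Hedged sketch of gated axial attention: attention is applied along one
# spatial axis at a time, with a gating scalar on the attention logits.
# Shapes and the gate placement are illustrative simplifications.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x: np.ndarray, axis: int, gate: float = 1.0) -> np.ndarray:
    """Self-attention along one spatial axis of an (H, W, C) feature map.
    `gate` stands in for the learned gating parameter; identity Q/K/V
    projections are used for brevity."""
    x = np.moveaxis(x, axis, 0)                 # attended axis first: (L, other, C)
    L, other, C = x.shape
    q, k, v = x, x, x
    logits = gate * np.einsum('ioc,joc->oij', q, k) / np.sqrt(C)
    out = np.einsum('oij,joc->ioc', softmax(logits), v)
    return np.moveaxis(out, 0, axis)            # restore original layout

H, W, C = 4, 5, 8
feat = np.random.default_rng(0).standard_normal((H, W, C))
# One axial block: attend along height, then along width.
feat = axial_attention(feat, axis=0, gate=0.5)
feat = axial_attention(feat, axis=1, gate=0.5)
print(feat.shape)  # (4, 5, 8)
```

Factorising attention this way reduces the cost per layer from O((HW)^2) to O(HW·(H+W)), which is what makes full-image self-attention tractable for segmentation-sized feature maps.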
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.