Related papers: MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging

MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging

URL: http://arxiv.org/abs/2512.05571v1
Date: Fri, 05 Dec 2025 09:53:07 GMT
Title: MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging
Authors: Xingyu Zhang, Anna Reithmeir, Fryderyk Kögl, Rickmer Braren, Julia A. Schnabel, Daniel M. Lang,
Abstract summary: We present MedDIFT, a training-free 3D correspondence framework that leverages multi-scale features from a pretrained latent medical diffusion model as voxel descriptors.<n>On a publicly available lung CT dataset, MedDIFT achieves correspondence accuracy comparable to the state-of-the-art UniGradICON model.
Score: 6.520674045578402
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate spatial correspondence between medical images is essential for longitudinal analysis, lesion tracking, and image-guided interventions. Medical image registration methods rely on local intensity-based similarity measures, which fail to capture global semantic structure and often yield mismatches in low-contrast or anatomically variable regions. Recent advances in diffusion models suggest that their intermediate representations encode rich geometric and semantic information. We present MedDIFT, a training-free 3D correspondence framework that leverages multi-scale features from a pretrained latent medical diffusion model as voxel descriptors. MedDIFT fuses diffusion activations into rich voxel-wise descriptors and matches them via cosine similarity, with an optional local-search prior. On a publicly available lung CT dataset, MedDIFT achieves correspondence accuracy comparable to the state-of-the-art learning-based UniGradICON model and surpasses conventional B-spline-based registration, without requiring any task-specific model training. Ablation experiments confirm that multi-level feature fusion and modest diffusion noise improve performance.

Related papers

MedCondDiff: Lightweight, Robust, Semantically Guided Diffusion for Medical Image Segmentation [5.838464931565891]
We introduce MedCondDiff, a diffusion-based framework for medical image segmentation.<n>The model conditions the denoising process on semantic priors extracted by a Pyramid Vision Transformer (PVT) backbone.<n>This design improves robustness while reducing both inference time and VRAM usage.
arXiv Detail & Related papers (2025-11-29T06:43:15Z)
SimCroP: Radiograph Representation Learning with Similarity-driven Cross-granularity Pre-training [25.763109982379703]
We propose a Similarity-Driven Cross-Granularity Pre-training framework on chest CTs.<n>It combines similarity-driven alignment and cross-granularity fusion to improve radiograph interpretation.<n>SimCroP is pre-trained on a large-scale paired CT-reports dataset and validated on image classification and segmentation tasks.
arXiv Detail & Related papers (2025-09-10T06:20:53Z)
FoundDiff: Foundational Diffusion Model for Generalizable Low-Dose CT Denoising [55.04342933312839]
We propose FoundDiff, a foundational diffusion model for unified and generalizable low-dose computed tomography (CT) denoising.<n>FoundDiff employs a two-stage strategy: (i) dose-anatomy perception and (ii) adaptive denoising.<n>First, we develop a dose- and anatomy-aware contrastive language image pre-training model (DA-CLIP) to achieve robust dose and anatomy perception.<n>Second, we design a dose- and anatomy-aware diffusion model (DA-Diff) to perform adaptive and generalizable denoising.
arXiv Detail & Related papers (2025-08-24T11:03:56Z)
UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model [53.34835793648352]
We propose UniSegDiff, a novel diffusion model framework for lesion segmentation.<n>UniSegDiff addresses lesion segmentation in a unified manner across multiple modalities and organs.<n> Comprehensive experimental results demonstrate that UniSegDiff significantly outperforms previous state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2025-07-24T12:33:10Z)
Meta-Entity Driven Triplet Mining for Aligning Medical Vision-Language Models [9.76070837929117]
Existing alignment methods prioritize separation between disease classes over segregation of fine-grained pathology attributes.<n>Here, we propose MedTrim, a novel method that enhances image-text alignment through multimodal triplet learning.<n>Our demonstrations indicate that MedTrim improves performance in downstream retrieval and classification tasks compared to state-of-the-art alignment methods.
arXiv Detail & Related papers (2025-04-22T14:17:51Z)
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.<n>We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning [15.234393268111845]
Non-contrast CT (NCCT) imaging may reduce image contrast and anatomical visibility, potentially increasing diagnostic uncertainty. We propose a novel Syncretic generative model based on the latent diffusion model for medical image translation (S$2$LDM) S$2$LDM enhances the similarity in distinct modal images via syncretic encoding and diffusing, promoting amalgamated information in the latent space and generating medical images with more details in contrast-enhanced regions.
arXiv Detail & Related papers (2024-06-20T03:54:41Z)
Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training [99.2891802841936]
We introduce the Med-ST framework for fine-grained spatial and temporal modeling. For spatial modeling, Med-ST employs the Mixture of View Expert (MoVE) architecture to integrate different visual features from both frontal and lateral views. For temporal modeling, we propose a novel cross-modal bidirectional cycle consistency objective by forward mapping classification (FMC) and reverse mapping regression (RMR)
arXiv Detail & Related papers (2024-05-30T03:15:09Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
Diff-UNet: A Diffusion Embedded Network for Volumetric Segmentation [41.608617301275935]
We propose a novel end-to-end framework, called Diff-UNet, for medical volumetric segmentation. Our approach integrates the diffusion model into a standard U-shaped architecture to extract semantic information from the input volume effectively. We evaluate our method on three datasets, including multimodal brain tumors in MRI, liver tumors, and multi-organ CT volumes.
arXiv Detail & Related papers (2023-03-18T04:06:18Z)
Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation. We construct our few-shot image segmentor using a deep convolutional network trained episodically. We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.