Bidirectional Mammogram View Translation with Column-Aware and Implicit 3D Conditional Diffusion
- URL: http://arxiv.org/abs/2510.04947v1
- Date: Mon, 06 Oct 2025 15:48:27 GMT
- Title: Bidirectional Mammogram View Translation with Column-Aware and Implicit 3D Conditional Diffusion
- Authors: Xin Li, Kaixiang Yang, Qiang Li, Zhiwei Wang
- Abstract summary: View-to-view translation can help recover missing views and improve lesion alignment. Unlike natural images, this task in mammography is highly challenging due to large non-rigid deformations and severe tissue overlap in X-ray projections. We propose Column-Aware and Implicit 3D Diffusion (CA3D-Diff), a novel bidirectional mammogram view translation framework.
- Score: 17.309030641962
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dual-view mammography, including craniocaudal (CC) and mediolateral oblique (MLO) projections, offers complementary anatomical views crucial for breast cancer diagnosis. However, in real-world clinical workflows, one view may be missing, corrupted, or degraded due to acquisition errors or compression artifacts, limiting the effectiveness of downstream analysis. View-to-view translation can help recover missing views and improve lesion alignment. Unlike natural images, this task in mammography is highly challenging due to large non-rigid deformations and severe tissue overlap in X-ray projections, which obscure pixel-level correspondences. In this paper, we propose Column-Aware and Implicit 3D Diffusion (CA3D-Diff), a novel bidirectional mammogram view translation framework based on a conditional diffusion model. To address cross-view structural misalignment, we first design a column-aware cross-attention mechanism that leverages the geometric property that anatomically corresponding regions tend to lie in similar column positions across views. A Gaussian-decayed bias is applied to emphasize local column-wise correlations while suppressing distant mismatches. Furthermore, we introduce an implicit 3D structure reconstruction module that back-projects noisy 2D latents into a coarse 3D feature volume based on breast-view projection geometry. The reconstructed 3D structure is refined and injected into the denoising UNet to guide cross-view generation with enhanced anatomical awareness. Extensive experiments demonstrate that CA3D-Diff achieves superior performance in bidirectional tasks, outperforming state-of-the-art methods in visual fidelity and structural consistency. Furthermore, the synthesized views effectively improve single-view malignancy classification in screening settings, demonstrating the practical value of our method in real-world diagnostics.
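The column-aware cross-attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact form of the Gaussian-decayed bias, so the sketch assumes it is the log of a Gaussian in normalized column distance, added to the attention logits before the softmax. The function names, the `sigma` parameter, and the row-major token layout are all hypothetical choices made for illustration.

```python
import numpy as np


def column_bias(width, sigma):
    """Assumed bias form: log of a Gaussian in normalized column distance.

    Returns a (width, width) matrix that is 0 where columns match and
    increasingly negative as the column distance grows.
    """
    cols = np.arange(width) / max(width - 1, 1)  # normalize columns to [0, 1]
    dist = cols[:, None] - cols[None, :]
    return -(dist ** 2) / (2.0 * sigma ** 2)


def column_aware_cross_attention(q, k, v, h, w, sigma=0.2):
    """Single-head cross-attention with an additive column-distance bias.

    q:    (h*w, d) tokens of the view being generated (row-major order).
    k, v: (h*w, d) tokens of the conditioning view (row-major order).
    Returns the attended features and the attention weights.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)  # standard scaled dot-product logits

    # Column index of every token in a row-major h x w grid.
    token_cols = np.tile(np.arange(w), h)
    bias = column_bias(w, sigma)[token_cols][:, token_cols]  # (h*w, h*w)

    logits = logits + bias  # distant columns are suppressed
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Adding the log-Gaussian to the logits is equivalent to multiplying the unnormalized attention weights by a Gaussian in column distance, so a smaller `sigma` concentrates attention on nearby columns and a very large `sigma` recovers ordinary cross-attention.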
Related papers
- Preoperative-to-intraoperative Liver Registration for Laparoscopic Surgery via Latent-Grounded Correspondence Constraints [51.7011449975586]
Land-Reg is a deformable registration framework that learns latent-grounded 2D-3D landmark correspondences. For rigid registration, Land-Reg embraces a Cross-modal Latent Alignment module. An Uncertainty-enhanced Overlap Landmark Detector with similarity matching is proposed to robustly estimate explicit 2D-3D landmark correspondences.
arXiv Detail & Related papers (2026-03-02T10:44:03Z) - Extending 2D foundational DINOv3 representations to 3D segmentation of neonatal brain MR images [3.186130813218338]
The global MRI volume is decomposed into non-overlapping 3D windows or sub-cubes, each processed via a separate decoding arm built upon frozen high-fidelity features. The proposed approach achieves a Dice score of 0.65 for a single 3D window.
arXiv Detail & Related papers (2026-02-27T12:16:21Z) - Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z) - Silhouette-to-Contour Registration: Aligning Intraoral Scan Models with Cephalometric Radiographs [10.70146635420186]
We propose DentalSCR, a pose-stable, contour-guided framework for accurate and interpretable silhouette-to-contour registration. We evaluate DentalSCR on 34 expert-annotated clinical cases.
arXiv Detail & Related papers (2025-11-18T10:50:04Z) - Neural Image Unfolding: Flattening Sparse Anatomical Structures using Neural Fields [6.5082099033254135]
Tomographic imaging reveals internal structures of 3D objects and is crucial for medical diagnoses. Various organ-specific unfolding techniques exist to map their densely sampled 3D surfaces to a distortion-minimized 2D representation. We deploy a neural field to fit the transformation of the anatomy of interest to a 2D overview image.
arXiv Detail & Related papers (2024-11-27T14:58:49Z) - DuoLift-GAN: Reconstructing CT from Single-view and Biplanar X-Rays with Generative Adversarial Networks [1.3812010983144802]
We introduce DuoLift Generative Adversarial Networks (DuoLift-GAN), a novel architecture with dual branches that independently elevate 2D images and their features into 3D representations. These 3D outputs are merged into a unified 3D feature map and decoded into a complete 3D chest volume, enabling richer 3D information capture.
arXiv Detail & Related papers (2024-11-12T17:11:18Z) - C^2RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction [17.54830070112685]
Cone beam computed tomography (CBCT) is an important imaging technology widely used in medical scenarios.
CBCT reconstruction is more challenging due to the increased dimensionality caused by the measurement process based on cone-shaped X-ray beams.
We propose C2RV by leveraging explicit multi-scale volumetric representations to enable cross-regional learning in the 3D space.
arXiv Detail & Related papers (2024-06-06T09:37:56Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest image resolution and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - On the Localization of Ultrasound Image Slices within Point Distribution Models [84.27083443424408]
Thyroid disorders are most commonly diagnosed using high-resolution Ultrasound (US).
Longitudinal tracking is a pivotal diagnostic protocol for monitoring changes in pathological thyroid morphology.
We present a framework for automated US image slice localization within a 3D shape representation.
arXiv Detail & Related papers (2023-09-01T10:10:46Z) - JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery [84.67823511418334]
This paper presents the 3D JOint contrastive learning with TRansformers (JOTR) framework for handling occluded 3D human mesh recovery.
Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D&3D aligned results.
arXiv Detail & Related papers (2023-07-31T02:58:58Z) - 3D Vessel Segmentation with Limited Guidance of 2D Structure-agnostic Vessel Annotations [3.6314292723682784]
Supervised deep learning has demonstrated its superior capacity in automatic 3D vessel segmentation.
The reliance on expensive 3D manual annotations and limited capacity for annotation reuse hinder the clinical applications of supervised models.
This paper proposes a novel 3D shape-guided local discrimination model for 3D vascular segmentation under limited guidance from public 2D vessel annotations.
arXiv Detail & Related papers (2023-02-07T07:26:00Z) - Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.