Efficient and robust 3D blind harmonization for large domain gaps
- URL: http://arxiv.org/abs/2505.00133v1
- Date: Wed, 30 Apr 2025 19:00:58 GMT
- Title: Efficient and robust 3D blind harmonization for large domain gaps
- Authors: Hwihun Jeong, Hayeon Lee, Se Young Chun, Jongho Lee
- Abstract summary: We introduce BlindHarmonyDiff, a novel blind 3D harmonization framework. Our framework employs a 3D rectified flow trained on target domain images to reconstruct the original image from an edge map, then yielding a harmonized image from the edge of a source domain image.
- Score: 16.11365154990601
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Blind harmonization has emerged as a promising technique for MR image harmonization to achieve scale-invariant representations, requiring only target domain data (i.e., no source domain data necessary). However, existing methods face limitations such as inter-slice heterogeneity in 3D, moderate image quality, and limited performance for a large domain gap. To address these challenges, we introduce BlindHarmonyDiff, a novel blind 3D harmonization framework that leverages an edge-to-image model tailored specifically to harmonization. Our framework employs a 3D rectified flow trained on target domain images to reconstruct the original image from an edge map, then yielding a harmonized image from the edge of a source domain image. We propose multi-stride patch training for efficient 3D training and a refinement module for robust inference by suppressing hallucination. Extensive experiments demonstrate that BlindHarmonyDiff outperforms prior arts by harmonizing diverse source domain images to the target domain, achieving higher correspondence to the target domain characteristics. Downstream task-based quality assessments such as tissue segmentation and age prediction on diverse MR scanners further confirm the effectiveness of our approach and demonstrate the capability of our robust and generalizable blind harmonization.
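The pipeline described in the abstract (edge map in, target-domain image out, with strided 3D patches for training) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the gradient-magnitude edge extractor, the reading of "multi-stride" as cropping fixed-size patches at different voxel strides, the plain Euler solver, and the function names `edge_map_3d`, `sample_strided_patch`, and `rectified_flow_sample`. The `velocity` argument stands in for a trained network.

```python
import numpy as np


def edge_map_3d(volume):
    """Gradient-magnitude edge map scaled to [0, 1] (assumed edge extractor)."""
    gz, gy, gx = np.gradient(volume.astype(np.float64))
    mag = np.sqrt(gz ** 2 + gy ** 2 + gx ** 2)
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def sample_strided_patch(volume, patch=8, stride=1, rng=None):
    """Crop a patch of patch^3 voxels, keeping every `stride`-th voxel.

    A larger stride covers a wider field of view at the same memory cost
    (one hypothetical reading of multi-stride patch training).
    """
    rng = np.random.default_rng() if rng is None else rng
    extent = (patch - 1) * stride + 1  # voxels spanned along each axis
    if any(s < extent for s in volume.shape):
        raise ValueError("volume too small for this patch/stride")
    start = [int(rng.integers(0, s - extent + 1)) for s in volume.shape]
    window = tuple(slice(a, a + extent, stride) for a in start)
    return volume[window]


def rectified_flow_sample(velocity, edge, steps=10, rng=None):
    """Euler integration of dx/dt = velocity(x, t, edge) from t=0 to t=1.

    Starts from Gaussian noise; `velocity` is a placeholder for a trained,
    edge-conditioned model.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(edge.shape)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt, edge)
    return x
```

In this sketch, harmonization would amount to computing `edge_map_3d` on a source-domain volume and passing the result to `rectified_flow_sample` with a velocity model trained only on target-domain volumes. The refinement module mentioned in the abstract is not modeled here.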
Related papers
- GCA-3D: Towards Generalized and Consistent Domain Adaptation of 3D Generators [24.67369444661137]
GCA-3D is a generalized and consistent 3D domain adaptation method without the intricate pipeline of data generation. We introduce multi-modal depth-aware score distillation sampling loss to efficiently adapt 3D generative models in a non-adversarial manner. Experiments demonstrate that GCA-3D outperforms previous methods in terms of efficiency, generalization, pose accuracy, and identity consistency.
arXiv Detail & Related papers (2024-12-20T02:13:11Z) - Visual Foundation Models Boost Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation [17.875516787157018]
We study how to harness the knowledge priors learned by 2D visual foundation models to produce more accurate labels for unlabeled target domains.
Our method is evaluated on various autonomous driving datasets, and the results demonstrate a significant improvement on the 3D segmentation task.
arXiv Detail & Related papers (2024-03-15T03:58:17Z) - CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection [14.063365469339812]
LiDAR-based 3D Object Detection methods often do not generalize well to target domains outside the source (or training) data distribution.
We introduce a novel unsupervised domain adaptation (UDA) method, called CMDA, which leverages visual semantic cues from an image modality.
We also introduce a self-training-based learning strategy, wherein a model is adversarially trained to generate domain-invariant features.
arXiv Detail & Related papers (2024-03-06T14:12:38Z) - Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z) - IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z) - BlindHarmony: "Blind" Harmonization for MR Images via Flow model [1.765282368080009]
In MRI, images of the same contrast from the same subject can exhibit noticeable differences when acquired using different hardware, sequences, or scan parameters.
These differences create a domain gap that needs to be bridged by image harmonization.
Deep learning-based approaches have been proposed to achieve image harmonization.
We propose BlindHarmony, which utilizes only target domain data for training but still has the capability to harmonize images from unseen domains.
arXiv Detail & Related papers (2023-05-18T06:04:24Z) - Domain Generalisation via Domain Adaptation: An Adversarial Fourier Amplitude Approach [13.642506915023871]
We adversarially synthesise the worst-case target domain and adapt a model to that worst-case domain.
On the DomainBedNet dataset, the proposed approach yields significantly improved domain generalisation performance.
arXiv Detail & Related papers (2023-02-23T14:19:07Z) - Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks. Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches. We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method to represent the self-occlusions of foreground objects in 3D into a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z) - Unsupervised Domain Adaptation for Monocular 3D Object Detection via Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z) - Unsupervised Domain Adaptation with Contrastive Learning for OCT Segmentation [49.59567529191423]
We propose a novel semi-supervised learning framework for segmentation of volumetric images from new unlabeled domains.
We jointly use supervised and contrastive learning, also introducing a contrastive pairing scheme that leverages similarity between nearby slices in 3D.
arXiv Detail & Related papers (2022-03-07T19:02:26Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global content consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.