DMAligner: Enhancing Image Alignment via Diffusion Model Based View Synthesis
- URL: http://arxiv.org/abs/2602.23022v1
- Date: Thu, 26 Feb 2026 14:00:07 GMT
- Title: DMAligner: Enhancing Image Alignment via Diffusion Model Based View Synthesis
- Authors: Xinglong Luo, Ao Luo, Zhengning Wang, Yueqi Yang, Chaoyu Feng, Lei Lei, Bing Zeng, Shuaicheng Liu,
- Abstract summary: DMAligner is a diffusion-based framework for image alignment through alignment-oriented view synthesis.<n>We propose a Dynamics-aware Diffusion Training approach for learning conditional image generation.<n>We develop the Dynamic Scene Image Alignment (DSIA) dataset using Blender, which includes 1,033 indoor and outdoor scenes with over 30K image pairs tailored for image alignment.
- Score: 63.59932602411222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image alignment is a fundamental task in computer vision with broad applications. Existing methods predominantly employ optical flow-based image warping. However, this technique is susceptible to common challenges such as occlusions and illumination variations, leading to degraded alignment visual quality and compromised accuracy in downstream tasks. In this paper, we present DMAligner, a diffusion-based framework for image alignment through alignment-oriented view synthesis. DMAligner is crafted to tackle the challenges in image alignment from a new perspective, employing a generation-based solution that showcases strong capabilities and avoids the problems associated with flow-based image warping. Specifically, we propose a Dynamics-aware Diffusion Training approach for learning conditional image generation, synthesizing a novel view for image alignment. This incorporates a Dynamics-aware Mask Producing (DMP) module to adaptively distinguish dynamic foreground regions from static backgrounds, enabling the diffusion model to more effectively handle challenges that classical methods struggle to solve. Furthermore, we develop the Dynamic Scene Image Alignment (DSIA) dataset using Blender, which includes 1,033 indoor and outdoor scenes with over 30K image pairs tailored for image alignment. Extensive experimental results demonstrate the superiority of the proposed approach on DSIA benchmarks, as well as on a series of widely-used video datasets for qualitative comparisons. Our code is available at https://github.com/boomluo02/DMAligner.
Related papers
- Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model [92.61216319417208]
We propose a novel diffusion model (DM)-based framework, dubbed ours, for image deblurring.<n>ours performs DM to generate the prior knowledge that aids in recovering the textures of blurry images.<n>To fully exploit the generated texture priors, we present the Texture Transfer Transformer layer (TTformer)
arXiv Detail & Related papers (2025-07-18T01:50:31Z) - CDG-MAE: Learning Correspondences from Diffusion Generated Views [19.24402848656637]
CDG-MAE is a novel MAE-based self-supervised method that uses diverse synthetic views generated from static images.<n>These generated views exhibit substantial changes in pose and perspective, providing a rich training signal.
arXiv Detail & Related papers (2025-06-22T20:40:11Z) - STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation [4.769823364778397]
We propose a diffusion-based model that produces photo-realistic images and provides fine-grained control of stylized objects in scenes.<n>Our approach learns a global condition for each layout, and a self-supervised semantic map for weight modulation.<n>A new Styled-Mask Attention (SM Attention) is also introduced to cross-condition the global condition and image feature for capturing the objects' relationships.
arXiv Detail & Related papers (2025-03-15T17:36:24Z) - DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling [6.7206291284535125]
We present an effective data augmentation framework leveraging the Large Language Model (LLM) and Diffusion Model (DM)
Our approach addresses the issue of increasing the diversity of synthetic images.
Our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution.
arXiv Detail & Related papers (2024-09-25T14:02:43Z) - Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z) - Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS)
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object
Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image
Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z) - Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.<n>Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.<n>We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.