Auto-regressive transformation for image alignment
- URL: http://arxiv.org/abs/2505.04864v1
- Date: Thu, 08 May 2025 00:28:31 GMT
- Title: Auto-regressive transformation for image alignment
- Authors: Kanggeon Lee, Soochahn Lee, Kyoung Mu Lee
- Abstract summary: Existing methods for image alignment struggle in cases involving feature-sparse regions, extreme scale and field-of-view differences, and large deformations. We propose Auto-Regressive Transformation (ART), a novel method that iteratively estimates the coarse-to-fine transformations within an auto-regressive framework. Our network refines the transformations using randomly sampled points at each scale. By incorporating guidance from the cross-attention layer, the model focuses on critical regions, ensuring accurate alignment even in challenging, feature-limited conditions.
- Score: 46.12916700236777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing methods for image alignment struggle in cases involving feature-sparse regions, extreme scale and field-of-view differences, and large deformations, often resulting in suboptimal accuracy. Robustness to these challenges improves through iterative refinement of the transformation field while focusing on critical regions in multi-scale image representations. We thus propose Auto-Regressive Transformation (ART), a novel method that iteratively estimates the coarse-to-fine transformations within an auto-regressive framework. Leveraging hierarchical multi-scale features, our network refines the transformations using randomly sampled points at each scale. By incorporating guidance from the cross-attention layer, the model focuses on critical regions, ensuring accurate alignment even in challenging, feature-limited conditions. Extensive experiments across diverse datasets demonstrate that ART significantly outperforms state-of-the-art methods, establishing it as a powerful new method for precise image alignment with broad applicability.
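
The coarse-to-fine, auto-regressive structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' network: the learned estimator is replaced by a plain least-squares affine fit, the transformation is restricted to affine maps, cross-attention guidance is not modeled, and all names (`fit_affine`, `apply_affine`, `coarse_to_fine_align`, the `cells` schedule) are hypothetical. Each stage draws a fresh random set of points, observes the target positions only at that stage's resolution (simulated by snapping to a grid), estimates a residual transformation, and composes it with the running estimate.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares affine map (as a 3x3 matrix) sending src points to dst points."""
    design = np.hstack([src, np.ones((len(src), 1))])      # (n, 3)
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # (3, 2)
    matrix = np.eye(3)
    matrix[:2, :] = params.T
    return matrix


def apply_affine(matrix, pts):
    """Apply a 3x3 affine matrix to an (n, 2) array of points."""
    return pts @ matrix[:2, :2].T + matrix[:2, 2]


def coarse_to_fine_align(true_matrix, cells=(32.0, 8.0, 2.0, 0.5),
                         n_points=64, size=256.0, seed=0):
    """Refine an affine estimate over successively finer stages.

    Assumption: `cells` is a hypothetical schedule of grid sizes that stands
    in for feature resolution at each scale (coarse first, fine last).
    """
    rng = np.random.default_rng(seed)
    probe = rng.uniform(0.0, size, size=(200, 2))  # held-out points for error reporting
    estimate = np.eye(3)                           # start from the identity
    errors = []
    for cell in cells:
        # Fresh random sample of points at this stage.
        pts = rng.uniform(0.0, size, size=(n_points, 2))
        current = apply_affine(estimate, pts)      # where the running estimate puts them
        target = apply_affine(true_matrix, pts)    # where they should land
        observed = np.round(target / cell) * cell  # target seen at this stage's resolution
        # Residual transformation from the current positions to the observations,
        # composed on top of everything estimated so far.
        residual = fit_affine(current, observed)
        estimate = residual @ estimate
        gap = apply_affine(estimate, probe) - apply_affine(true_matrix, probe)
        errors.append(float(np.linalg.norm(gap, axis=1).mean()))
    return estimate, errors


# Usage: recover a rotation + scale + translation (values chosen for illustration).
theta = np.deg2rad(10.0)
true_matrix = np.array([
    [1.2 * np.cos(theta), -1.2 * np.sin(theta), 15.0],
    [1.2 * np.sin(theta), 1.2 * np.cos(theta), -8.0],
    [0.0, 0.0, 1.0],
])
estimate, errors = coarse_to_fine_align(true_matrix)
print(errors)  # mean alignment error on the probe points after each stage
```

The key line is `estimate = residual @ estimate`: every stage predicts only a correction conditioned on the result of the previous stages, which is the sense in which the sketch is auto-regressive.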
Related papers
- Feature Alignment with Equivariant Convolutions for Burst Image Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment. This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain. Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv Detail & Related papers (2025-03-11T11:13:10Z)
- Spatially-Attentive Patch-Hierarchical Network with Adaptive Sampling for Motion Deblurring [34.751361664891235]
We propose a pixel adaptive and feature attentive design for handling large blur variations across different spatial locations.
We show that our approach performs favorably against the state-of-the-art deblurring algorithms.
arXiv Detail & Related papers (2024-02-09T01:00:09Z)
- Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-diffusion and latent-diffusion models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z)
- Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL).
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z)
- Dual-Flow Transformation Network for Deformable Image Registration with Region Consistency Constraint [95.30864269428808]
Current deep learning (DL)-based image registration approaches learn the spatial transformation from one image to another by leveraging a convolutional neural network.
We present a novel dual-flow transformation network with a region consistency constraint, which maximizes the similarity of ROIs within a pair of images.
Experiments on four public 3D MRI datasets show that the proposed method achieves the best registration performance in accuracy and generalization.
arXiv Detail & Related papers (2021-12-04T05:30:44Z)
- Image Deformation Estimation via Multi-Objective Optimization [13.159751065619544]
The free-form deformation model can represent a wide range of non-rigid deformations by manipulating a control point lattice over the image.
It is challenging to fit the model directly to the deformed image for deformation estimation because of the complexity of the fitness landscape.
arXiv Detail & Related papers (2021-06-08T06:52:12Z)
- LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation [52.63874513999119]
Cross-resolution image alignment is a key problem in multiscale giga photography.
Existing deep homography methods neglect the explicit formulation of correspondences between the input images, which leads to degraded accuracy in cross-resolution challenges.
We propose a local transformer network embedded within a multiscale structure to explicitly learn correspondences between the multimodal inputs.
arXiv Detail & Related papers (2021-06-08T02:51:45Z)
- Improving the generalization of network based relative pose regression: dimension reduction as a regularizer [16.63174637692875]
State-of-the-art visual localization methods perform pose estimation using a geometry-based solver within the RANSAC framework.
End-to-end learning based regression networks provide a solution to circumvent the requirement for precise pixel-level correspondences.
In this paper, we explicitly add a learnable matching layer within the network to isolate the pose regression solver from the absolute image feature values.
We implement this dimension regularization strategy within a two-layer pyramid based framework to regress the localization results from coarse to fine.
arXiv Detail & Related papers (2020-10-24T06:20:46Z)
- Transformation Consistency Regularization- A Semi-Supervised Paradigm for Image-to-Image Translation [18.870983535180457]
We propose Transformation Consistency Regularization, which delves into a more challenging setting of image-to-image translation.
We evaluate the efficacy of our algorithm on three different applications: image colorization, denoising and super-resolution.
Our method is highly data efficient, requiring only around 10-20% of labeled samples to achieve image reconstructions similar to those of its fully supervised counterpart.
arXiv Detail & Related papers (2020-07-15T17:41:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.