Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts
- URL: http://arxiv.org/abs/2512.18718v1
- Date: Sun, 21 Dec 2025 12:33:44 GMT
- Title: Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts
- Authors: Linwei Qiu, Gongzhe Li, Xiaozhe Zhang, Qinlin Sun, Fengying Xie
- Abstract summary: We introduce the Unified Rectification Framework (UniRect), a comprehensive approach that addresses these practical tasks from a consistent distortion rectification perspective. Our approach incorporates various task-specific inverse problems into a general distortion model by simulating different types of lenses. Our models have achieved state-of-the-art performance compared with other up-to-date methods.
- Score: 7.136884388888679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image correction and rectangling are valuable tasks in practical photography systems such as smartphones. Recent advances in deep learning have brought substantial performance improvements to these fields. Nevertheless, existing methods rely mainly on task-specific architectures, which significantly restricts their generalization ability and their application across a wide range of tasks. In this paper, we introduce the Unified Rectification Framework (UniRect), a comprehensive approach that addresses these practical tasks from a consistent distortion-rectification perspective. Our approach incorporates the various task-specific inverse problems into a general distortion model by simulating different types of lenses. To handle diverse distortions, UniRect adopts a task-agnostic rectification framework with a dual-component structure: a Deformation Module, which utilizes a novel Residual Progressive Thin-Plate Spline (RP-TPS) model to address complex geometric deformations, and a subsequent Restoration Module, which employs Residual Mamba Blocks (RMBs) to counteract the degradation caused by the deformation process and enhance the fidelity of the output image. Moreover, a Sparse Mixture-of-Experts (SMoE) structure is designed to mitigate the heavy task competition that varying distortions cause in multi-task learning. Extensive experiments demonstrate that our models achieve state-of-the-art performance compared with other recent methods.
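The RP-TPS model itself is not specified in this listing. As a point of reference, a minimal classical thin-plate-spline warp (the base model that, per the abstract, RP-TPS extends with residual progressive refinement) can be sketched in NumPy. The control-point layout below and the function names are illustrative assumptions, not drawn from the paper:

```python
import numpy as np

def tps_warp_params(src, dst):
    """Solve classical thin-plate-spline coefficients mapping src -> dst.
    src, dst: (N, 2) control points. Returns radial weights w (N, 2) and
    affine part a (3, 2)."""
    n = src.shape[0]
    # Pairwise squared distances between control points.
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    # TPS kernel U(r) = r^2 log(r^2); small epsilon guards log at r = 0.
    K = np.where(d2 > 0, d2 * np.log(d2 + 1e-12), 0.0)
    P = np.hstack([np.ones((n, 1)), src])  # affine basis [1, x, y]
    # Standard TPS linear system: [K P; P^T 0] [w; a] = [dst; 0].
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n:]

def tps_apply(pts, src, w, a):
    """Warp arbitrary points (M, 2) with coefficients from tps_warp_params."""
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    U = np.where(d2 > 0, d2 * np.log(d2 + 1e-12), 0.0)
    return U @ w + np.hstack([np.ones((pts.shape[0], 1)), pts]) @ a

# Five hypothetical control points: four corners plus the center.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
dst = src + np.array([0.1, -0.2])  # a simple target displacement
w, a = tps_warp_params(src, dst)
warped = tps_apply(src, src, w, a)  # interpolates dst exactly at control points
```

A residual progressive variant would, presumably, apply several such warps in sequence, each estimating a small correction on top of the previous one; that composition is the paper's contribution and is not reproduced here.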
Related papers
- Physically Interpretable Multi-Degradation Image Restoration via Deep Unfolding and Explainable Convolution [45.571542528079114]
We propose a novel interpretability-driven approach for multi-degradation image restoration. We employ an improved second-order semi-smooth Newton algorithm to ensure that each module maintains clear physical interpretability. To further enhance interpretability and adaptability, we design an explainable convolution module inspired by the human brain's flexible information processing.
arXiv Detail & Related papers (2025-11-13T10:27:41Z) - MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models [60.110274007388135]
MambaStyle is an efficient single-stage encoder-based approach for GAN inversion and editing. We show that MambaStyle achieves a superior balance among inversion accuracy, editing quality, and computational efficiency.
arXiv Detail & Related papers (2025-05-06T20:03:47Z) - Marmot: Object-Level Self-Correction via Multi-Agent Reasoning [55.74860093731475]
Marmot is a novel and generalizable framework that leverages multi-agent reasoning for multi-object self-correction. Marmot significantly improves accuracy in object counting, attribute assignment, and spatial relationships for image generation tasks.
arXiv Detail & Related papers (2025-04-10T16:54:28Z) - Feature Alignment with Equivariant Convolutions for Burst Image Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment. This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain. Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv Detail & Related papers (2025-03-11T11:13:10Z) - Towards Enhanced Image Generation Via Multi-modal Chain of Thought in Unified Generative Models [52.84391764467939]
Unified generative models have shown remarkable performance in text and image generation. We introduce Chain of Thought (CoT) into unified generative models to address the challenges of complex image generation. Experiments show that FoX consistently outperforms existing unified models on various T2I benchmarks.
arXiv Detail & Related papers (2025-03-03T08:36:16Z) - Adaptive Blind All-in-One Image Restoration [15.726917603679716]
Blind all-in-one image restoration models aim to recover a high-quality image from an input degraded with unknown distortions. We introduce ABAIR, a simple yet effective adaptive blind all-in-one restoration model that handles multiple degradations and generalizes well to unseen distortions. Our model not only surpasses state-of-the-art performance on five- and three-task IR setups but also demonstrates superior generalization to unseen degradations and composite distortions.
arXiv Detail & Related papers (2024-11-27T14:58:08Z) - A Unified Deep Learning Framework for Motion Correction in Medical Imaging [6.727558990042319]
We introduce UniMo, a Unified Motion Correction framework to correct diverse motion in medical imaging. UniMo employs an alternating optimization scheme for a unified loss function to train an integrated model of 1) an equivariant neural network for global motion correction and 2) an encoder-decoder network for local deformations. We trained and tested UniMo to track motion in fetal magnetic resonance imaging, a challenging application due to 1) both large rigid and non-rigid motion, and 2) wide variations in image appearance.
arXiv Detail & Related papers (2024-09-21T17:36:11Z) - Multi-task Image Restoration Guided By Robust DINO Features [88.74005987908443]
We propose DINO-IR, a multi-task image restoration approach leveraging robust features extracted from DINOv2.
We first propose a pixel-semantic fusion (PSF) module to dynamically fuse DINOv2's shallow features.
By formulating these modules into a unified deep model, we propose a DINO perception contrastive loss to constrain the model training.
arXiv Detail & Related papers (2023-12-04T06:59:55Z) - Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration [33.72954116727303]
We present a novel reinforcement learning-based framework that performs step-wise registration of medical images with large deformations.
We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations.
arXiv Detail & Related papers (2021-12-14T14:08:56Z) - Image Deformation Estimation via Multi-Objective Optimization [13.159751065619544]
Free-form deformation model can represent a wide range of non-rigid deformations by manipulating a control point lattice over the image.
It is challenging to fit the model directly to the deformed image for deformation estimation because of the complexity of the fitness landscape.
arXiv Detail & Related papers (2021-06-08T06:52:12Z) - Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.