Related papers: Cascaded Robust Rectification for Arbitrary Document Images

URL: http://arxiv.org/abs/2511.23150v1
Date: Fri, 28 Nov 2025 12:56:16 GMT
Title: Cascaded Robust Rectification for Arbitrary Document Images
Authors: Chaoyun Wang, Quanxin Huang, I-Chao Shen, Takeo Igarashi, Nanning Zheng, Caigui Jiang
Abstract summary: Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. We introduce a novel multi-stage framework that progressively reverses distinct distortion types in a coarse-to-fine manner. Our framework first performs a global affine transformation to correct perspective distortions arising from the camera's viewpoint, then rectifies geometric deformations resulting from physical paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions.
Score: 45.30113042855903
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex transformations can be decomposed and resolved progressively, we introduce a novel multi-stage framework that progressively reverses distinct distortion types in a coarse-to-fine manner. Specifically, our framework first performs a global affine transformation to correct perspective distortions arising from the camera's viewpoint, then rectifies geometric deformations resulting from physical paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions. To address limitations in existing evaluation protocols, we also propose two enhanced metrics: layout-aligned OCR metrics (AED/ACER) for a stable assessment that decouples geometric rectification quality from the layout analysis errors of OCR engines, and masked AD/AAD (AD-M/AAD-M) tailored for accurately evaluating geometric distortions in documents with incomplete boundaries. Extensive experiments show that our method establishes new state-of-the-art performance on multiple challenging benchmarks, yielding a substantial reduction of 14.1% to 34.7% in the AAD metric and demonstrating superior efficacy in real-world applications. The code will be publicly available at https://github.com/chaoyunwang/ArbDR.
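
The coarse-to-fine idea in the abstract, where each stage undoes one distortion type and passes its output to the next, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not say how each stage is realized. The helper names (`affine_grid`, `flow_grid`, `sample_nearest`, `cascade`), the backward-map representation, and the nearest-neighbour sampling are all assumptions made for illustration.

```python
import numpy as np

def affine_grid(h, w, A):
    """Backward map for a 2x3 affine matrix A (hypothetical helper).

    Returns an (h, w, 2) array giving, for every output pixel (x, y),
    the source coordinates (x', y') to read from.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    src_x = A[0, 0] * xs + A[0, 1] * ys + A[0, 2]
    src_y = A[1, 0] * xs + A[1, 1] * ys + A[1, 2]
    return np.stack([src_x, src_y], axis=-1)

def flow_grid(flow):
    """Backward map from a dense (h, w, 2) displacement field (dx, dy)."""
    h, w = flow.shape[:2]
    identity = affine_grid(h, w, np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
    return identity + flow

def sample_nearest(image, grid):
    """Resample image at the grid coordinates (nearest neighbour, clamped)."""
    h, w = image.shape[:2]
    x = np.clip(np.rint(grid[..., 0]).astype(int), 0, w - 1)
    y = np.clip(np.rint(grid[..., 1]).astype(int), 0, h - 1)
    return image[y, x]

def cascade(image, grids):
    """Apply the stages in order; each stage sees the previous stage's output."""
    out = image
    for grid in grids:
        out = sample_nearest(out, grid)
    return out

# Toy stand-ins for the stages: a global (coarse) affine correction
# followed by a dense (fine) displacement field.
image = np.arange(16).reshape(4, 4)
coarse = affine_grid(4, 4, np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]))  # read one pixel to the right
fine_flow = np.zeros((4, 4, 2))
fine_flow[..., 1] = 1.0  # read one pixel further down
fine = flow_grid(fine_flow)
result = cascade(image, [coarse, fine])
```

Resampling after every stage keeps the sketch short. Composing the maps and sampling the input once would avoid accumulating interpolation error across stages.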

Related papers

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment [105.31858867473845]
ImageCritic can be integrated into an agent framework to automatically detect inconsistencies and correct them with multi-round and local editing. In experiments, ImageCritic can effectively resolve detail-related issues in various customized generation scenarios, providing significant improvements over existing methods.
arXiv Detail & Related papers (2025-11-25T18:40:25Z)
Revisiting Reconstruction-based AI-generated Image Detection: A Geometric Perspective [50.83711509908479]
We introduce the Jacobian-Spectral Lower Bound for reconstruction error from a geometric perspective. We show that real images off the reconstruction manifold exhibit a non-trivial error lower bound, while generated images on the manifold have near-zero error. We propose ReGap, a training-free method that computes dynamic reconstruction error by leveraging structured editing operations.
arXiv Detail & Related papers (2025-10-29T03:45:03Z)
VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction [26.668204454537246]
We introduce Visual Gaussian Driving (VGD), a novel feed-forward end-to-end learning framework designed to address this challenge. We show that our approach significantly outperforms state-of-the-art methods in both objective metrics and subjective quality under various settings.
arXiv Detail & Related papers (2025-10-22T13:28:49Z)
Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach [118.75896764188424]
We present a novel perspective to expose the inherent size sensitivity of existing widely used Salient Object Detection metrics. To address this challenge, a generic Size-Invariant Evaluation (SIEva) framework is proposed. We further develop a dedicated optimization framework (SIOpt), which adheres to the size-invariant principle and significantly enhances the detection of salient objects across a broad range of sizes.
arXiv Detail & Related papers (2025-09-19T04:12:14Z)
TADoc: Robust Time-Aware Document Image Dewarping [4.080803969466669]
Document image dewarping is an increasingly important task with the rise of the digital economy and online working. We reformulate this task, modeling it for the first time as a dynamic process that encompasses a series of intermediate states. We design a lightweight framework called TADoc to address the geometric distortion of document images.
arXiv Detail & Related papers (2025-08-09T13:55:55Z)
Axis-Aligned Document Dewarping [39.058312371271825]
We introduce a new metric, Axis-Aligned Distortion (AAD), that incorporates geometric meaning and aligns with human visual perception. Our method achieves SOTA results on multiple existing benchmarks and achieves 18.2% to 34.5% improvements on the AAD metric.
arXiv Detail & Related papers (2025-07-20T15:12:57Z)
Feature Alignment with Equivariant Convolutions for Burst Image Super-Resolution [52.55429225242423]
We propose a novel framework for Burst Image Super-Resolution (BISR), featuring an equivariant convolution-based alignment. This enables the alignment transformation to be learned via explicit supervision in the image domain and easily applied in the feature domain. Experiments on BISR benchmarks show the superior performance of our approach in both quantitative metrics and visual quality.
arXiv Detail & Related papers (2025-03-11T11:13:10Z)
Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection [77.3530907443279]
We propose a novel self-supervised framework to detect objects in degraded low-resolution images. Our method achieves superior performance compared with existing methods under a variety of degradation conditions.
arXiv Detail & Related papers (2022-08-05T09:36:13Z)
Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network [30.18238229156996]
We propose a framework for both rectifying distorted document images and finely removing the background, using a fully convolutional network (FCN). The FCN is trained by regressing displacements of synthesized distorted documents, and to control the smoothness of displacements, we propose a Local Smooth Constraint (LSC) in regularization. Experiments show that our approach can dewarp document images effectively under various geometric distortions, and achieves state-of-the-art performance in terms of local details and overall effect.
arXiv Detail & Related papers (2021-04-14T12:32:36Z)
Can You Read Me Now? Content Aware Rectification using Angle Supervision [14.095728009592763]
We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification. Our method surpasses previous approaches in terms of OCR accuracy, geometric error and visual similarity.
arXiv Detail & Related papers (2020-08-05T16:58:13Z)
