Related papers: Axis-Aligned Document Dewarping

Axis-Aligned Document Dewarping

URL: http://arxiv.org/abs/2507.15000v1
Date: Sun, 20 Jul 2025 15:12:57 GMT
Title: Axis-Aligned Document Dewarping
Authors: Chaoyun Wang, I-Chao Shen, Takeo Igarashi, Nanning Zheng, Caigui Jiang
Abstract summary: We introduce a new metric, Axis-Aligned Distortion (AAD), that incorporates geometric meaning and aligns with human visual perception. Our method achieves SOTA results on multiple existing benchmarks and achieves 18.2%~34.5% improvements on the AAD metric.
Score: 39.058312371271825
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Document dewarping is crucial for many applications. However, existing learning-based methods primarily rely on supervised regression with annotated data without leveraging the inherent geometric properties in physical documents to the dewarping process. Our key insight is that a well-dewarped document is characterized by transforming distorted feature lines into axis-aligned ones. This property aligns with the inherent axis-aligned nature of the discrete grid geometry in planar documents. In the training phase, we propose an axis-aligned geometric constraint to enhance document dewarping. In the inference phase, we propose an axis alignment preprocessing strategy to reduce the dewarping difficulty. In the evaluation phase, we introduce a new metric, Axis-Aligned Distortion (AAD), that not only incorporates geometric meaning and aligns with human visual perception but also demonstrates greater robustness. As a result, our method achieves SOTA results on multiple existing benchmarks and achieves 18.2%~34.5% improvements on the AAD metric.

Related papers

Dual Dimensions Geometric Representation Learning Based Document Dewarping [17.529651556361355]
Document image dewarping remains a challenging task in the deep learning era. We propose a fine-grained deformation perception model that focuses on Dual Dimensions of document horizontal-vertical-lines. Our method achieves better rectification results compared with the state-of-the-art methods.
arXiv Detail & Related papers (2025-07-11T11:16:58Z)
Reading a Ruler in the Wild [1.4785540163232234]
Accurately converting pixel measurements into absolute real-world dimensions remains a fundamental challenge in computer vision. We introduce RulerNet, a deep learning framework that robustly infers scale "in the wild". Experiments show that RulerNet delivers accurate, consistent, and efficient scale estimates under challenging real-world conditions.
arXiv Detail & Related papers (2025-07-09T17:35:58Z)
CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization [51.716834831684004]
We study the problem of conformal prediction (CP) under geometric data shifts. We propose integrating geometric information--such as geometric pose--into the conformal procedure to reinstate its guarantees.
arXiv Detail & Related papers (2025-06-19T10:12:02Z)
PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure. We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation.
arXiv Detail & Related papers (2025-06-03T05:18:15Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation [26.050987382098107]
Document dewarping aims to eliminate geometric deformation in photographed documents to benefit text recognition. In this work, we explore Polar coordinates representation for each point in document dewarping, namely Polar-Doc. We propose a novel multi-scope Polar-Doc-IOU loss to constrain the relationship among control points as a grid-based regularization.
arXiv Detail & Related papers (2023-12-13T06:50:30Z)
Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss [28.529476019629097]
Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks. We propose methods to engineer the geometry of learnt feature embeddings by modifying the contrastive loss.
arXiv Detail & Related papers (2023-10-02T04:23:17Z)
Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism. We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies. We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods. In this work, we demonstrate the benefit of combining the two in a latent variational model. Our proposed metrics measure both the quality of out of distribution samples as well as the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)
Multistage Curvilinear Coordinate Transform Based Document Image Dewarping using a Novel Quality Estimator [11.342730352935913]
The present work demonstrates a fast and improved technique for dewarping nonlinearly warped document images. The images are first dewarped at the page-level by estimating optimum inverse projections using curvilinear homography. The quality of the process is then estimated by evaluating a set of metrics related to the characteristics of the text lines and rectilinear objects. If the quality is estimated to be unsatisfactory, the page-level dewarping process is repeated with finer approximations. This is followed by a line-level dewarping process that makes granular corrections to the warps in individual text-lines.
arXiv Detail & Related papers (2020-03-15T17:17:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.