Axis-Aligned Document Dewarping
- URL: http://arxiv.org/abs/2507.15000v1
- Date: Sun, 20 Jul 2025 15:12:57 GMT
- Title: Axis-Aligned Document Dewarping
- Authors: Chaoyun Wang, I-Chao Shen, Takeo Igarashi, Nanning Zheng, Caigui Jiang
- Abstract summary: We introduce a new metric, Axis-Aligned Distortion (AAD), that incorporates geometric meaning and aligns with human visual perception. Our method achieves SOTA results on multiple existing benchmarks and achieves 18.2%~34.5% improvements on the AAD metric.
- Score: 39.058312371271825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document dewarping is crucial for many applications. However, existing learning-based methods primarily rely on supervised regression with annotated data without leveraging the inherent geometric properties of physical documents in the dewarping process. Our key insight is that a well-dewarped document is characterized by transforming distorted feature lines into axis-aligned ones. This property aligns with the inherent axis-aligned nature of the discrete grid geometry in planar documents. In the training phase, we propose an axis-aligned geometric constraint to enhance document dewarping. In the inference phase, we propose an axis alignment preprocessing strategy to reduce the dewarping difficulty. In the evaluation phase, we introduce a new metric, Axis-Aligned Distortion (AAD), that not only incorporates geometric meaning and aligns with human visual perception but also demonstrates greater robustness. As a result, our method achieves SOTA results on multiple existing benchmarks and achieves 18.2%~34.5% improvements on the AAD metric.
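The abstract's key insight, that good dewarping turns distorted feature lines into axis-aligned ones, can be illustrated with a small sketch. This is not the paper's AAD metric, whose definition is not given on this page. The function name `axis_alignment_error` and the grid input are hypothetical, chosen only for illustration. The sketch assumes the dewarped result is sampled as a grid of (x, y) points ordered left to right and top to bottom, and it reports the mean angle by which grid segments deviate from the horizontal and vertical axes.

```python
import math


def axis_alignment_error(grid):
    """Mean angular deviation, in degrees, of grid segments from the axes.

    `grid` is a list of rows, and each row is a list of (x, y) points.
    Rows are expected to be horizontal and columns vertical after dewarping.
    This is an illustrative measure, not the AAD metric from the paper.
    """
    rows, cols = len(grid), len(grid[0])
    angles = []

    # Segments along each row: angle measured against the x-axis.
    for r in range(rows):
        for c in range(cols - 1):
            (x0, y0), (x1, y1) = grid[r][c], grid[r][c + 1]
            angles.append(abs(math.atan2(y1 - y0, x1 - x0)))

    # Segments along each column: angle measured against the y-axis.
    for c in range(cols):
        for r in range(rows - 1):
            (x0, y0), (x1, y1) = grid[r][c], grid[r + 1][c]
            angles.append(abs(math.atan2(x1 - x0, y1 - y0)))

    if not angles:  # a single point has no segments to measure
        return 0.0
    return math.degrees(sum(angles) / len(angles))
```

A perfectly axis-aligned grid scores 0, and the score grows as rows or columns tilt or bend, so a lower value indicates a result closer to the axis-aligned ideal described in the abstract.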
Related papers
- Dual Dimensions Geometric Representation Learning Based Document Dewarping [17.529651556361355]
Document image dewarping remains a challenging task in the deep learning era.
We propose a fine-grained deformation perception model that focuses on Dual Dimensions of document horizontal-vertical-lines.
Our method achieves better rectification results compared with the state-of-the-art methods.
arXiv Detail & Related papers (2025-07-11T11:16:58Z)
- Reading a Ruler in the Wild [1.4785540163232234]
Accurately converting pixel measurements into absolute real-world dimensions remains a fundamental challenge in computer vision.
We introduce RulerNet, a deep learning framework that robustly infers scale "in the wild".
Experiments show that RulerNet delivers accurate, consistent, and efficient scale estimates under challenging real-world conditions.
arXiv Detail & Related papers (2025-07-09T17:35:58Z)
- CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization [51.716834831684004]
We study the problem of conformal prediction (CP) under geometric data shifts.
We propose integrating geometric information, such as geometric pose, into the conformal procedure to reinstate its guarantees.
arXiv Detail & Related papers (2025-06-19T10:12:02Z)
- PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure.
We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation.
arXiv Detail & Related papers (2025-06-03T05:18:15Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation [26.050987382098107]
Document dewarping aims to eliminate geometric deformation in photographed documents to benefit text recognition.
In this work, we explore Polar coordinates representation for each point in document dewarping, namely Polar-Doc.
We propose a novel multi-scope Polar-Doc-IOU loss to constrain the relationship among control points as a grid-based regularization.
arXiv Detail & Related papers (2023-12-13T06:50:30Z)
- Engineering the Neural Collapse Geometry of Supervised-Contrastive Loss [28.529476019629097]
Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks.
We propose methods to engineer the geometry of learnt feature embeddings by modifying the contrastive loss.
arXiv Detail & Related papers (2023-10-02T04:23:17Z)
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)
- Multistage Curvilinear Coordinate Transform Based Document Image Dewarping using a Novel Quality Estimator [11.342730352935913]
The present work demonstrates a fast and improved technique for dewarping nonlinearly warped document images.
The images are first dewarped at the page-level by estimating optimum inverse projections using curvilinear homography.
The quality of the process is then estimated by evaluating a set of metrics related to the characteristics of the text lines and rectilinear objects.
If the quality is estimated to be unsatisfactory, the page-level dewarping process is repeated with finer approximations.
This is followed by a line-level dewarping process that makes granular corrections to the warps in individual text-lines.
arXiv Detail & Related papers (2020-03-15T17:17:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.