Related papers: Dual Dimensions Geometric Representation Learning Based Document Dewarping

URL: http://arxiv.org/abs/2507.08492v2
Date: Wed, 16 Jul 2025 15:59:35 GMT
Title: Dual Dimensions Geometric Representation Learning Based Document Dewarping
Authors: Heng Li, Qingcai Chen, Xiangping Wu
Abstract summary: Document image dewarping remains a challenging task in the deep learning era. We propose a fine-grained deformation perception model that focuses on Dual Dimensions of document horizontal-vertical-lines. Our method achieves better rectification results compared with the state-of-the-art methods.
Score: 17.529651556361355
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Document image dewarping remains a challenging task in the deep learning era. While existing methods have improved by leveraging text line awareness, they typically focus only on a single horizontal dimension. In this paper, we propose D2Dewarp, a fine-grained deformation perception model that focuses on the dual dimensions of a document's horizontal and vertical lines to improve document dewarping. It can perceive distortion trends in different directions across document details. To combine the horizontal and vertical granularity features, an effective fusion module based on X and Y coordinates is designed to facilitate interaction and constraint between the two dimensions for feature complementarity. Due to the lack of annotated line features in current public dewarping datasets, we also propose an automatic fine-grained annotation method using public document texture images and an automatic rendering engine to build a new large-scale distortion training dataset. The code and dataset will be publicly released. On public Chinese and English benchmarks, both quantitative and qualitative results show that our method achieves better rectification results compared with the state-of-the-art methods. The dataset will be publicly available at https://github.com/xiaomore/DocDewarpHV

Related papers

Leveraging Contrastive Learning for a Similarity-Guided Tampered Document Data Generation Pipeline [6.066442015301665]
We propose a novel method for generating high-quality tampered document images. We first train an auxiliary network to compare text crops, leveraging contrastive learning with a novel strategy for defining positive pairs and their corresponding negatives. Using a carefully designed generation pipeline, we introduce a framework capable of producing diverse, high-quality tampered document images.
arXiv Detail & Related papers (2026-02-19T12:39:38Z)
BookNet: Book Image Rectification via Cross-Page Attention Network [61.60737484928661]
We introduce BookNet, the first end-to-end deep learning framework specifically designed for dual-page book image rectification. BookNet adopts a dual-branch architecture with cross-page attention mechanisms, enabling it to estimate warping flows for both individual pages and the complete book spread. To address the absence of specialized datasets, we present Book3D, a large-scale synthetic dataset for training, and Book100, a comprehensive real-world benchmark for evaluation.
arXiv Detail & Related papers (2026-01-29T16:26:25Z)
Axis-Aligned Document Dewarping [39.058312371271825]
We introduce a new metric, Axis-Aligned Distortion (AAD), that incorporates geometric meaning and aligns with human visual perception. Our method achieves SOTA results on multiple existing benchmarks and achieves 18.2% to 34.5% improvements on the AAD metric.
arXiv Detail & Related papers (2025-07-20T15:12:57Z)
D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding [36.321156992727055]
D2AF is a robust annotation framework for visual grounding using only input images. By implementing dual-driven annotation strategies, we effectively generate detailed region-text pairs. Our findings demonstrate that increasing data volume enhances model performance.
arXiv Detail & Related papers (2025-05-30T09:04:47Z)
Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings. First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss. Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation [14.511401955827875]
Object detection in documents is a key step to automate the structural elements identification process. We present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image.
arXiv Detail & Related papers (2024-02-17T23:08:32Z)
EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images [27.36816896426097]
Information Extraction from document images is challenging due to the high variability of layout formats. We propose a novel approach, EIGEN, which combines rule-based methods with deep learning models using data programming approaches. We empirically show that our EIGEN framework can significantly improve the performance of state-of-the-art deep models with the availability of very few labeled data instances.
arXiv Detail & Related papers (2023-11-23T13:20:42Z)
Neural Semantic Surface Maps [52.61017226479506]
We present an automated technique for computing a map between two genus-zero shapes, which matches semantically corresponding regions to one another. Our approach can generate semantic surface-to-surface maps, eliminating manual annotations or any 3D training data requirement.
arXiv Detail & Related papers (2023-09-09T16:21:56Z)
Explicit Correspondence Matching for Generalizable Neural Radiance Fields [66.99907718824782]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views. The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views. Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
arXiv Detail & Related papers (2023-04-24T17:46:01Z)
UVDoc: Neural Grid-based Document Unwarping [20.51368640747448]
Restoring the original, flat appearance of a printed document from casual photographs is a common everyday problem. We propose a novel method for grid-based single-image document unwarping. Our method performs geometric distortion correction via a fully convolutional deep neural network.
arXiv Detail & Related papers (2023-02-06T15:53:34Z)
Learning Object-Language Alignments for Open-Vocabulary Object Detection [83.09560814244524]
We propose a novel open-vocabulary object detection framework directly learning from image-text pair data. It enables us to train an open-vocabulary object detector on image-text pairs in a much simpler and more effective way.
arXiv Detail & Related papers (2022-11-27T14:47:31Z)
Geometric Representation Learning for Document Image Rectification [137.75133384124976]
We present DocGeoNet for document image rectification by introducing explicit geometric representation. Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image. Experiments show the effectiveness of our framework and demonstrate the superiority of our framework over state-of-the-art methods.
arXiv Detail & Related papers (2022-10-15T01:57:40Z)
Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions [52.250269529057014]
Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task. We propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text.
arXiv Detail & Related papers (2022-08-17T06:55:54Z)
Multistage Curvilinear Coordinate Transform Based Document Image Dewarping using a Novel Quality Estimator [11.342730352935913]
The present work demonstrates a fast and improved technique for dewarping nonlinearly warped document images. The images are first dewarped at the page-level by estimating optimum inverse projections using curvilinear homography. The quality of the process is then estimated by evaluating a set of metrics related to the characteristics of the text lines and rectilinear objects. If the quality is estimated to be unsatisfactory, the page-level dewarping process is repeated with finer approximations. This is followed by a line-level dewarping process that makes granular corrections to the warps in individual text-lines.
arXiv Detail & Related papers (2020-03-15T17:17:53Z)
Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer. In detail, the input is a set of structured records and a reference text for describing another recordset. The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.