Related papers: DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinates-based Diffusion Model

URL: http://arxiv.org/abs/2505.21975v1
Date: Wed, 28 May 2025 05:05:51 GMT
Title: DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinates-based Diffusion Model
Authors: Weiguang Zhang, Huangcheng Lu, Maizhen Ning, Xiaowei Huang, Wei Wang, Kaizhu Huang, Qiufeng Wang
Abstract summary: Document dewarping aims to rectify deformations in photographic document images, thus improving text readability. We propose DvD, the first generative model to tackle document Dewarping via a Diffusion framework.
Score: 25.504170988714783
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Document dewarping aims to rectify deformations in photographic document images, thus improving text readability, which has attracted much attention and made great progress, but it is still challenging to preserve document structures. Given recent advances in diffusion models, it is natural for us to consider their potential applicability to document dewarping. However, it is far from straightforward to adopt diffusion models in document dewarping due to their unfaithful control on highly complex document images (e.g., 2000×3000 resolution). In this paper, we propose DvD, the first generative model to tackle document Dewarping via a Diffusion framework. To be specific, DvD introduces a coordinate-level denoising instead of typical pixel-level denoising, generating a mapping for deformation rectification. In addition, we further propose a time-variant condition refinement mechanism to enhance the preservation of document structures. In experiments, we find that current document dewarping benchmarks can not evaluate dewarping models comprehensively. To this end, we present AnyPhotoDoc6300, a rigorously designed large-scale document dewarping benchmark comprising 6,300 real image pairs across three distinct domains, enabling fine-grained evaluation of dewarping models. Comprehensive experiments demonstrate that our proposed DvD can achieve state-of-the-art performance with acceptable computational efficiency on multiple metrics across various benchmarks including DocUNet, DIR300, and AnyPhotoDoc6300. The new benchmark and code will be publicly available.
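
Illustration (not from the paper): the abstract says DvD generates a coordinate mapping for rectification rather than pixels. The sketch below only shows how such a mapping, once available, could be used to resample a warped image. It is not DvD's diffusion model. The function name `dewarp_with_backward_map`, the map layout (output pixel to source `(x, y)`), and the nearest-neighbor sampling are all assumptions made for this example.

```python
import numpy as np


def dewarp_with_backward_map(warped, backward_map):
    """Resample `warped` with a per-pixel coordinate map (hypothetical helper).

    backward_map[y, x] holds the (src_x, src_y) location in `warped` whose
    value should appear at output pixel (x, y). Nearest-neighbor sampling is
    used here for simplicity; this is an assumption, not the paper's method.
    """
    height, width = warped.shape[:2]
    # Round to the nearest source pixel and clamp to the image bounds.
    src_x = np.clip(np.rint(backward_map[..., 0]).astype(int), 0, width - 1)
    src_y = np.clip(np.rint(backward_map[..., 1]).astype(int), 0, height - 1)
    return warped[src_y, src_x]


# Toy example: the "deformation" is a cyclic one-pixel shift to the right.
flat = np.arange(12).reshape(3, 4)
warped = np.roll(flat, 1, axis=1)  # warped[y, x] == flat[y, (x - 1) % 4]

# The matching backward map points each output pixel one column to the right.
ys, xs = np.mgrid[0:3, 0:4]
backward_map = np.stack([(xs + 1) % 4, ys], axis=-1).astype(float)

restored = dewarp_with_backward_map(warped, backward_map)
```

In this toy case `restored` equals `flat`, because the map exactly inverts the shift. A learned mapping would be dense and non-integer, so bilinear interpolation would typically replace the rounding step.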

Related papers

DocVCE: Diffusion-based Visual Counterfactual Explanations for Document Image Classification [5.247930659596986]
We introduce generative document counterfactuals that provide meaningful insights into the model's decision-making through actionable explanations. To the best of the authors' knowledge, this is the first work to explore generative counterfactual explanations in document image analysis.
arXiv Detail & Related papers (2025-08-06T09:15:32Z)
ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement [18.283496080974924]
ADCD-Net is a robust document forgery localization model that adaptively leverages the RGB/DCT forensic traces. Our proposed ADCD-Net demonstrates superior forgery localization performance, consistently outperforming state-of-the-art methods by 20.79% averaged over 5 types of distortions.
arXiv Detail & Related papers (2025-07-22T09:48:23Z)
DREAM: Document Reconstruction via End-to-end Autoregressive Model [53.51754520966657]
We present an innovative autoregressive model specifically designed for document reconstruction, referred to as Document Reconstruction via End-to-end Autoregressive Model (DREAM). We establish a standardized definition of the document reconstruction task, and introduce a novel Document Similarity Metric (DSM) and DocRec1K dataset for assessing the performance of the task.
arXiv Detail & Related papers (2025-07-08T09:24:07Z)
DECDM: Document Enhancement using Cycle-Consistent Diffusion Models [3.3813766129849845]
We propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models. Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models. We also introduce simple data augmentation strategies to improve character-glyph conservation during translation.
arXiv Detail & Related papers (2023-11-16T07:16:02Z)
DocDiff: Document Enhancement via Residual Diffusion Models [7.972081359533047]
We propose DocDiff, a diffusion-based framework specifically designed for document enhancement problems. DocDiff consists of two modules: the Coarse Predictor (CP) and the High-Frequency Residual Refinement (HRR) module. Our proposed HRR module in pre-trained DocDiff is plug-and-play and ready-to-use, with only 4.17M parameters.
arXiv Detail & Related papers (2023-05-06T01:41:10Z)
Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification. We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing. We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
Geometric Rectification of Creased Document Images based on Isometric Mapping [0.0]
Geometric rectification of images of distorted documents finds wide applications in document digitization and Optical Character Recognition (OCR). We propose a general framework of document image rectification in which a computational isometric mapping model is utilized for expressing a 3D document model and its flattening in the plane. Experiments and comparisons to the state-of-the-art approaches demonstrated the effectiveness and outstanding performance of the proposed method.
arXiv Detail & Related papers (2022-12-16T09:33:31Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
Fourier Document Restoration for Robust Document Dewarping and Recognition [73.44057202891011]
This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions. It dewarps documents by a flexible Thin-Plate Spline transformation which can handle various deformations effectively without requiring deformation annotations in training. It outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks.
arXiv Detail & Related papers (2022-03-18T12:39:31Z)
Cross-Domain Document Layout Analysis Using Document Style Guide [15.799572801059716]
Document layout analysis (DLA) aims to decompose document images into high-level semantic areas. Many researchers have addressed this challenge by synthesizing data to build large training sets. In this paper, we propose an unsupervised cross-domain DLA framework based on document style guidance.
arXiv Detail & Related papers (2022-01-24T00:49:19Z)
DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification. DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture. The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
Self-supervised Deep Reconstruction of Mixed Strip-shredded Text Documents [63.41717168981103]
This work extends our previous deep learning method for single-page reconstruction to a more realistic/complex scenario. In our approach, the compatibility evaluation is modeled as a two-class (valid or invalid) pattern recognition problem. The proposed method outperforms the competing ones on complex scenarios, achieving accuracy superior to 90%.
arXiv Detail & Related papers (2020-07-01T21:48:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.