Revisiting Document Image Dewarping by Grid Regularization
- URL: http://arxiv.org/abs/2203.16850v1
- Date: Thu, 31 Mar 2022 07:18:30 GMT
- Title: Revisiting Document Image Dewarping by Grid Regularization
- Authors: Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia
- Abstract summary: This paper addresses the problem of document image dewarping.
We take the text lines and the document boundaries into account from a constrained optimization perspective.
Our proposed method first learns the boundary points and the pixels in the text lines.
- Score: 41.87305384805975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of document image dewarping, which aims at
eliminating the geometric distortion in document images for document
digitization. Instead of designing a better neural network to approximate the
optical flow fields between the inputs and outputs, we pursue the best
readability by taking the text lines and the document boundaries into account
from a constrained optimization perspective. Specifically, our proposed method
first learns the boundary points and the pixels in the text lines, and then
introduces a novel grid regularization scheme based on the simple observation
that the boundaries and text lines, in both the horizontal and vertical
directions, should be preserved after dewarping. To obtain the final forward
mapping for dewarping, we solve an optimization problem with the proposed grid
regularization. Comprehensive experiments on the publicly available DocUNet
benchmark demonstrate that our approach outperforms prior art by large margins
in terms of readability (measured by Character Error Rate and Edit Distance)
while maintaining the best image quality.
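The abstract does not spell out the optimization problem behind the forward mapping. As a rough illustration only, the sketch below sets up a generic linear least-squares problem over a coarse grid of control nodes covering the warped image; it is an assumed stand-in, not the paper's formulation. Each node stores one output coordinate (here y), and a point's output is the bilinear interpolation of its four surrounding nodes. Three hypothetical terms are weighted together: consecutive points on the same text line should map to the same y (the horizontal-line observation), boundary points are pinned to the output edges, and second differences across the grid should vanish for smoothness. The helper names (`bilinear_weights`, `solve_coordinate`, `map_point`) and the weights `w_line`, `w_anchor`, `w_smooth` are illustrative assumptions; the x coordinate would be solved the same way from left/right boundaries and vertical lines.

```python
import numpy as np

# Hypothetical grid-regularized least squares (assumed formulation, not the paper's).


def bilinear_weights(pt, grid_shape, img_shape):
    """Indices and bilinear weights of the 4 grid nodes around a source point.

    The source image (height h, width w) is covered by a regular grid of
    rows x cols nodes; pt is (x, y) in source pixel coordinates.
    """
    rows, cols = grid_shape
    h, w = img_shape
    gx = pt[0] / w * (cols - 1)  # continuous column coordinate
    gy = pt[1] / h * (rows - 1)  # continuous row coordinate
    c0 = min(int(np.floor(gx)), cols - 2)
    r0 = min(int(np.floor(gy)), rows - 2)
    fx, fy = gx - c0, gy - r0
    idx = [r0 * cols + c0, r0 * cols + c0 + 1,
           (r0 + 1) * cols + c0, (r0 + 1) * cols + c0 + 1]
    wts = [(1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy]
    return idx, wts


def solve_coordinate(grid_shape, img_shape, lines, anchors,
                     w_line=10.0, w_anchor=10.0, w_smooth=1.0):
    """Least-squares output value of ONE coordinate at every grid node.

    lines:   list of point lists; consecutive points on a line should map to
             the same output value (e.g. the same y for a horizontal text line).
    anchors: list of ((x, y), value) pairs fixing the output value of a point
             (e.g. top boundary -> y = 0).
    """
    rows, cols = grid_shape
    n = rows * cols
    A, b = [], []

    def point_row(pt):
        row = np.zeros(n)
        for i, wt in zip(*bilinear_weights(pt, grid_shape, img_shape)):
            row[i] += wt
        return row

    # Text-line terms: neighbouring points on a line share one output value.
    for pts in lines:
        for p, q in zip(pts[:-1], pts[1:]):
            A.append(w_line * (point_row(p) - point_row(q)))
            b.append(0.0)
    # Boundary anchor terms.
    for pt, value in anchors:
        A.append(w_anchor * point_row(pt))
        b.append(w_anchor * value)
    # Smoothness terms: second differences along grid rows and columns vanish.
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):
                rp, cp, rn, cn = r - dr, c - dc, r + dr, c + dc
                if rp >= 0 and cp >= 0 and rn < rows and cn < cols:
                    row = np.zeros(n)
                    row[rp * cols + cp] += 1.0
                    row[r * cols + c] -= 2.0
                    row[rn * cols + cn] += 1.0
                    A.append(w_smooth * row)
                    b.append(0.0)
    sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return sol


def map_point(pt, values, grid_shape, img_shape):
    """Interpolate the solved node values at a source point."""
    idx, wts = bilinear_weights(pt, grid_shape, img_shape)
    return float(sum(values[i] * wt for i, wt in zip(idx, wts)))


if __name__ == "__main__":
    grid, img = (5, 5), (100, 100)
    xs = [10, 30, 50, 70, 90]
    # Two already-straight text lines; top/bottom boundaries pinned to 0 / 100.
    lines = [[(x, 30) for x in xs], [(x, 70) for x in xs]]
    anchors = [((x, 0), 0.0) for x in xs] + [((x, 100), 100.0) for x in xs]
    ty = solve_coordinate(grid, img, lines, anchors)
    print(round(map_point((37, 50), ty, grid, img), 3))  # 50.0
```

With undistorted input every constraint is satisfied by the identity map, so the solver recovers it exactly. A dense `numpy.linalg.lstsq` call keeps the sketch short; a realistic grid would likely call for a sparse solver, and producing the rectified image from a forward map additionally requires a resampling step, which is omitted.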
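The abstract measures readability with Character Error Rate (CER) and Edit Distance (ED). A minimal sketch of both, assuming the common definitions: ED is the Levenshtein distance between the OCR text of a rectified image and the ground-truth text, and CER is ED divided by the length of the reference. The OCR engine, text normalization and per-image averaging used on the DocUNet benchmark are not described here and are left out; the function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalised by reference length (assumed common CER definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(hypothesis, reference) / len(reference)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))                           # 3
    print(round(character_error_rate("dewarpnig", "dewarping"), 4))   # 0.2222
```

The two-row dynamic program keeps memory linear in the reference length, which matters little for single lines but helps when whole pages of OCR output are compared.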
Related papers
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
(2023-11-30T10:36:19Z)
- LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis [24.925757148750684]
We propose a training-free approach for layout-to-image synthesis that excels in producing high-quality images aligned with both textual prompts and layout instructions.
LoCo seamlessly integrates into existing text-to-image and layout-to-image models, enhancing their performance in spatial control and addressing semantic failures observed in prior methods.
(2023-11-21T04:28:12Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
(2023-04-18T08:00:54Z)
- Variational Distribution Learning for Unsupervised Text-to-Image Generation [42.3246826401366]
We propose a text-to-image generation algorithm based on deep neural networks for the setting in which text captions for images are unavailable during training.
We employ a pretrained CLIP model, which is capable of properly aligning embeddings of images and corresponding texts in a joint space.
We optimize a text-to-image generation model by maximizing the data log-likelihood conditioned on pairs of image-text CLIP embeddings.
(2023-03-28T16:18:56Z)
- Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network [30.18238229156996]
We propose a framework for both rectifying distorted document images and finely removing the background, using a fully convolutional network (FCN).
The FCN is trained by regressing the displacements of synthesized distorted documents, and, to control the smoothness of the displacements, we propose a Local Smooth Constraint (LSC) for regularization.
Experiments show that our approach dewarps document images effectively under various geometric distortions and achieves state-of-the-art performance in terms of local details and overall effect.
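The summary above does not define the Local Smooth Constraint. For orientation only, the sketch below shows a generic first-order smoothness penalty on a dense displacement field, the kind of term such a regularizer typically adds to a displacement-regression loss; it is an assumption, not the LSC as defined in that paper. `smoothness_penalty` is a hypothetical name, and NumPy is used only to show the arithmetic (a training loss would be written in an autodiff framework).

```python
import numpy as np


def smoothness_penalty(disp: np.ndarray) -> float:
    """Generic first-order smoothness term (assumed), not the paper's exact LSC.

    disp has shape (H, W, 2): the (dx, dy) displacement at every pixel.
    Returns the mean squared difference between horizontally adjacent
    vectors plus the same quantity for vertically adjacent vectors.
    """
    if disp.ndim != 3 or disp.shape[2] != 2:
        raise ValueError("expected an array of shape (H, W, 2)")
    diff_x = disp[:, 1:, :] - disp[:, :-1, :]  # neighbours along each row
    diff_y = disp[1:, :, :] - disp[:-1, :, :]  # neighbours along each column
    return float(np.mean(diff_x ** 2) + np.mean(diff_y ** 2))


if __name__ == "__main__":
    ramp = np.zeros((4, 5, 2))
    ramp[..., 0] = np.arange(5)  # x-displacement grows by 1 per column
    print(smoothness_penalty(np.full((4, 5, 2), 3.0)))  # 0.0
    print(smoothness_penalty(ramp))                     # 0.5
```

A constant field (a pure translation) costs nothing, while the ramp scores 0.5: every horizontal difference in the x channel is 1, and half of the averaged entries belong to the unchanged y channel. In training, such a term would be added to the regression loss with a tunable weight.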
(2021-04-14T12:32:36Z)
- Semantic Layout Manipulation with High-Resolution Sparse Attention [106.59650698907953]
We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
We propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.
(2020-12-14T06:50:43Z)
- Weakly supervised cross-domain alignment with optimal transport [102.8572398001639]
Cross-domain alignment between image objects and text sequences is key to many visual-language tasks.
This paper investigates a novel approach for the identification and optimization of fine-grained semantic similarities between image and text entities.
(2020-08-14T22:48:36Z)
- Can You Read Me Now? Content Aware Rectification using Angle Supervision [14.095728009592763]
We present CREASE: Content Aware Rectification using Angle Supervision, the first learned method for document rectification.
Our method surpasses previous approaches in terms of OCR accuracy, geometric error and visual similarity.
(2020-08-05T16:58:13Z)
- Coarse-to-Fine Gaze Redirection with Numerical and Pictorial Guidance [74.27389895574422]
We propose a novel gaze redirection framework which exploits both numerical and pictorial direction guidance.
The proposed method outperforms the state-of-the-art approaches in terms of both image quality and redirection precision.
(2020-04-07T01:17:27Z)
- Multistage Curvilinear Coordinate Transform Based Document Image Dewarping using a Novel Quality Estimator [11.342730352935913]
The present work demonstrates a fast and improved technique for dewarping nonlinearly warped document images.
The images are first dewarped at the page level by estimating optimum inverse projections using curvilinear homography.
The quality of the process is then estimated by evaluating a set of metrics related to the characteristics of the text lines and rectilinear objects.
If the quality is estimated to be unsatisfactory, the page-level dewarping process is repeated with finer approximations.
This is followed by a line-level dewarping process that makes granular corrections to the warps in individual text lines.
(2020-03-15T17:17:53Z)