UVDoc: Neural Grid-based Document Unwarping
- URL: http://arxiv.org/abs/2302.02887v2
- Date: Tue, 27 Feb 2024 15:59:39 GMT
- Title: UVDoc: Neural Grid-based Document Unwarping
- Authors: Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung
- Abstract summary: Restoring the original, flat appearance of a printed document from casual photographs is a common everyday problem.
We propose a novel method for grid-based single-image document unwarping.
Our method performs geometric distortion correction via a fully convolutional deep neural network.
- Score: 20.51368640747448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Restoring the original, flat appearance of a printed document from casual
photographs of bent and wrinkled pages is a common everyday problem. In this
paper we propose a novel method for grid-based single-image document unwarping.
Our method performs geometric distortion correction via a fully convolutional
deep neural network that learns to predict the 3D grid mesh of the document and
the corresponding 2D unwarping grid in a dual-task fashion, implicitly encoding
the coupling between the shape of a 3D piece of paper and its 2D image. In
order to allow unwarping models to train on data that is more realistic in
appearance than the commonly used synthetic Doc3D dataset, we create and
publish our own dataset, called UVDoc, which combines pseudo-photorealistic
document images with physically accurate 3D shape and unwarping function
annotations. Our dataset is labeled with all the information necessary to train
our unwarping network, without having to engineer separate loss functions that
can deal with the lack of ground truth typically found in document-in-the-wild
datasets. We perform an in-depth evaluation that demonstrates that with the
inclusion of our novel pseudo-photorealistic dataset, our relatively small
network architecture achieves state-of-the-art results on the DocUNet
benchmark. We show that the pseudo-photorealistic nature of our UVDoc dataset
allows for new and better evaluation methods, such as lighting-corrected
MS-SSIM. We provide a novel benchmark dataset that facilitates such
evaluations, and propose a metric that quantifies line straightness after
unwarping. Our code, results and UVDoc dataset are available at
https://github.com/tanguymagne/UVDoc.
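The core output of a grid-based unwarping method like the one described above is a 2D unwarping grid: a field of sampling coordinates that maps each location in the flattened output back into the warped input photograph. As a hedged illustration only (not the paper's actual network or code, which lives in the linked repository), the resampling step can be sketched in NumPy with plain bilinear interpolation; the function and variable names here are hypothetical:

```python
import numpy as np

def unwarp_with_grid(image, grid):
    """Resample a grayscale `image` (H, W) at the locations given by `grid`
    (Gh, Gw, 2), whose entries are (x, y) coordinates normalized to [0, 1].
    Uses bilinear interpolation; this is a simplified stand-in for the dense
    backward mapping that a grid-based unwarping network would predict."""
    h, w = image.shape
    # Convert normalized grid coordinates to pixel coordinates.
    xs = grid[..., 0] * (w - 1)
    ys = grid[..., 1] * (h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    fx = xs - x0
    fy = ys - y0
    # Bilinear blend of the four neighbouring pixels.
    top = image[y0, x0] * (1 - fx) + image[y0, x0 + 1] * fx
    bot = image[y0 + 1, x0] * (1 - fx) + image[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

# An identity grid leaves the image unchanged; a trained network would
# instead predict a grid that straightens the warped page.
h, w = 4, 5
img = np.arange(h * w, dtype=float).reshape(h, w)
identity = np.stack(
    np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h)), axis=-1
)
flat = unwarp_with_grid(img, identity)
```

In the actual method, the network predicts this 2D grid jointly with the document's 3D grid mesh, so the coupling between the paper's 3D shape and the image-space unwarping is learned rather than hand-coded.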
Related papers
- FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis [51.193297565630886]
The challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images.
This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets.
We propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization.
arXiv Detail & Related papers (2024-10-13T01:25:05Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - DocGraphLM: Documental Graph Language Model for Information Extraction [15.649726614383388]
We introduce DocGraphLM, a framework that combines pre-trained language models with graph semantics.
To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs.
Our experiments on three SotA datasets show consistent improvement on IE and QA tasks with the adoption of graph features.
arXiv Detail & Related papers (2024-01-05T14:15:36Z) - Geometric Representation Learning for Document Image Rectification [137.75133384124976]
We present DocGeoNet for document image rectification by introducing explicit geometric representation.
Our motivation arises from the insight that 3D shape provides global unwarping cues for rectifying a distorted document image.
Experiments show the effectiveness of our framework and demonstrate the superiority of our framework over state-of-the-art methods.
arXiv Detail & Related papers (2022-10-15T01:57:40Z) - NeuralReshaper: Single-image Human-body Retouching with Deep Neural
Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z) - Sim2Real Docs: Domain Randomization for Documents in Natural Scenes
using Ray-traced Rendering [2.8034191857296933]
Sim2Real Docs is a framework for synthesizing datasets and performing domain randomization of documents in natural scenes.
By using rendering that simulates physical interactions of light, geometry, camera, and background, we synthesize datasets of documents in a natural scene context.
The role of machine learning models is then to solve the inverse problem posed by the rendering pipeline.
arXiv Detail & Related papers (2021-12-16T22:07:48Z) - From Synthetic to Real: Image Dehazing Collaborating with Unlabeled Real
Data [58.50411487497146]
We propose a novel image dehazing framework collaborating with unlabeled real data.
First, we develop a disentangled image dehazing network (DID-Net), which disentangles the feature representations into three component maps.
Then a disentangled-consistency mean-teacher network (DMT-Net) is employed to leverage unlabeled real data for boosting single-image dehazing.
arXiv Detail & Related papers (2021-08-06T04:00:28Z) - RectiNet-v2: A stacked network architecture for document image dewarping [16.249023269158734]
We propose an end-to-end CNN architecture that produces distortion-free document images from warped documents taken as input.
We train this model on synthetically simulated warped document images to compensate for the lack of natural training data.
We evaluate our method on the DocUNet dataset, a benchmark in this domain, and obtain results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-02-01T19:26:17Z) - Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image
Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.