Intrinsic Decomposition of Document Images In-the-Wild
- URL: http://arxiv.org/abs/2011.14447v1
- Date: Sun, 29 Nov 2020 21:39:58 GMT
- Title: Intrinsic Decomposition of Document Images In-the-Wild
- Authors: Sagnik Das, Hassan Ahmed Sial, Ke Ma, Ramon Baldrich, Maria Vanrell,
Dimitris Samaras
- Abstract summary: We present a learning-based method that directly estimates document reflectance based on intrinsic image formation.
The proposed architecture works in a self-supervised manner where only the synthetic texture is used as a weak training signal.
Our reflectance estimation scheme, when used as a pre-processing step of an OCR pipeline, yields a 26% improvement in character error rate.
- Score: 28.677728405031782
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic document content processing is hampered by artifacts caused
by the shape of the paper and by non-uniform, diversely colored lighting
conditions. Fully supervised training on real data is impractical because of the
large amount of annotated data it would require. Hence, the current
state-of-the-art deep learning models are trained on fully or partially
synthetic images. However, document shadow or shading removal results still
suffer because: (a) prior methods rely on the uniformity of local color
statistics, which limits their application to real scenarios with complex
document shapes and textures; and (b) the models are trained on synthetic or
hybrid datasets with unrealistic, simulated lighting conditions. In this paper
we tackle these problems with our two main contributions.
contributions. First, a physically constrained learning-based method that
directly estimates document reflectance based on intrinsic image formation
which generalizes to challenging illumination conditions. Second, a new dataset
that clearly improves on previous synthetic ones by adding a large range of
realistic shading and diverse multi-illuminant conditions, uniquely customized
to deal with documents in-the-wild. The proposed architecture works in a
self-supervised manner where only the synthetic texture is used as a weak
training signal (obviating the need for very costly ground truth with
disentangled versions of shading and reflectance). The proposed approach leads
to a significant generalization of document reflectance estimation in real
scenes with challenging illumination. We extensively evaluate on the real
benchmark datasets available for intrinsic image decomposition and document
shadow removal tasks. Our reflectance estimation scheme, when used as a
pre-processing step of an OCR pipeline, yields a 26% improvement in character
error rate (CER), demonstrating its practical applicability.
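The intrinsic image formation model underlying the reflectance estimation can be illustrated with a minimal toy sketch (an illustrative assumption, not the paper's actual network): an observed image I is modeled as the per-pixel product of reflectance R (the clean document texture) and shading S (shadows and shape-induced illumination), so given a shading estimate, reflectance follows by element-wise division.

```python
import numpy as np

# Intrinsic formation model: I = R * S (per pixel).
# The paper's network predicts R directly from I; here we only
# illustrate the formation model and its trivial inversion when S is known.

def compose(reflectance, shading):
    """Render an observed image from reflectance and shading."""
    return reflectance * shading

def recover_reflectance(image, shading, eps=1e-6):
    """Invert the formation model given a shading estimate."""
    return np.clip(image / (shading + eps), 0.0, 1.0)

# Toy example: a white page with dark "text" pixels under a
# left-to-right shadow gradient.
reflectance = np.ones((4, 4))
reflectance[1:3, 1:3] = 0.1
shading = np.linspace(1.0, 0.4, 4)[None, :].repeat(4, axis=0)

observed = compose(reflectance, shading)
recovered = recover_reflectance(observed, shading)
```

In practice the shading map is unknown, which is exactly why the paper trains a network to estimate reflectance directly, using only synthetic texture as a weak supervision signal.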
Related papers
- IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination [37.96484120807323]
This paper aims to recover object materials from posed images captured under an unknown static lighting condition.
We learn the material prior with a generative model for regularizing the optimization process.
Experiments on real-world and synthetic datasets demonstrate that our approach achieves state-of-the-art performance on material recovery.
arXiv Detail & Related papers (2024-04-17T17:45:08Z)
- Face Inverse Rendering via Hierarchical Decoupling [19.530753479268384]
Previous face inverse rendering methods often require synthetic data with ground truth and/or professional equipment like a lighting stage.
We propose a deep learning framework to disentangle face images in the wild into their corresponding albedo, normal, and lighting components.
arXiv Detail & Related papers (2023-01-17T07:24:47Z)
- TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation [55.94900327396771]
We introduce neural texture learning for 6D object pose estimation from synthetic data.
We learn to predict realistic texture of objects from real image collections.
We learn pose estimation from pixel-perfect synthetic data.
arXiv Detail & Related papers (2022-12-25T13:36:32Z)
- UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior [128.19212716007794]
UDoc-GAN is the first framework to address document illumination correction under the unpaired setting.
We first predict the ambient light features of the document.
Then, according to the characteristics of different levels of ambient light, we re-formulate the cycle consistency constraint.
Compared with state-of-the-art approaches, our method demonstrates promising performance in terms of character error rate (CER) and edit distance (ED).
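The CER metric cited here and in the main paper is conventionally defined as character-level edit distance normalized by reference length; the helper below is an illustrative sketch of that standard definition, not the evaluation code used by either paper.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)
```

For example, an OCR output that misreads one character of an 8-character reference has a CER of 1/8 = 0.125; ED is the unnormalized distance itself.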
arXiv Detail & Related papers (2022-10-15T07:19:23Z)
- Neural Radiance Transfer Fields for Relightable Novel-view Synthesis with Global Illumination [63.992213016011235]
We propose a method for scene relighting under novel views by learning a neural precomputed radiance transfer function.
Our method can be solely supervised on a set of real images of the scene under a single unknown lighting condition.
Results show that the recovered disentanglement of scene parameters improves significantly over the current state of the art.
arXiv Detail & Related papers (2022-07-27T16:07:48Z)
- Designing An Illumination-Aware Network for Deep Image Relighting [69.750906769976]
We present an Illumination-Aware Network (IAN) which follows the guidance from hierarchical sampling to progressively relight a scene from a single image.
In addition, an Illumination-Aware Residual Block (IARB) is designed to approximate the physical rendering process.
Experimental results show that our proposed method produces better quantitative and qualitative relighting results than previous state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T16:21:24Z)
- DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting a differentiable renderer.
In this work, we propose DIB-R++, a hybrid differentiable renderer which supports these effects by combining rasterization and ray-tracing.
Compared to more advanced physics-based differentiable renderers, DIB-R++ is highly performant due to its compact and expressive model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z)
- Optical Flow Dataset Synthesis from Unpaired Images [36.158607790844705]
We introduce a novel method to build a training set of pseudo-real images that can be used to train optical flow in a supervised manner.
Our dataset uses two unpaired frames from real data and creates pairs of frames by simulating random warps.
We thus obtain the benefit of directly training on real data while having access to an exact ground truth.
arXiv Detail & Related papers (2021-04-02T22:19:47Z)
- Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement [78.58603635621591]
Training an unpaired synthetic-to-real translation network in image space is severely under-constrained.
We propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image.
Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets.
arXiv Detail & Related papers (2020-03-27T21:45:41Z)
- Adversarial Texture Optimization from RGB-D Scans [37.78810126921875]
We present a novel approach for color texture generation using a conditional adversarial loss obtained from weakly-supervised views.
The key idea of our approach is to learn a patch-based conditional discriminator which guides the texture optimization to be tolerant to misalignments.
arXiv Detail & Related papers (2020-03-18T18:00:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.