Light-weight Document Image Cleanup using Perceptual Loss
- URL: http://arxiv.org/abs/2105.09076v1
- Date: Wed, 19 May 2021 11:54:28 GMT
- Title: Light-weight Document Image Cleanup using Perceptual Loss
- Authors: Soumyadeep Dey, Pratik Jawanpuria
- Abstract summary: We propose a light-weight encoder-decoder based convolutional neural network architecture for removing the noisy elements from document images.
In terms of the number of parameters and product-sum operations, our models are 65-1030 and 3-27 times, respectively, smaller than existing document enhancement models.
- Score: 7.106986689736828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Smartphones have enabled effortless capturing and sharing of documents in
digital form. The documents, however, often undergo various types of
degradation due to aging, stains, or shortcomings of the capturing environment such
as shadow, non-uniform lighting, etc., which reduces the comprehensibility of
the document images. In this work, we consider the problem of document image
cleanup on embedded applications such as smartphone apps, which usually have
memory, energy, and latency limitations due to the device and/or for best human
user experience. We propose a light-weight encoder-decoder based convolutional
neural network architecture for removing the noisy elements from document
images. To compensate for generalization performance with a low network
capacity, we incorporate the perceptual loss for knowledge transfer from
a pre-trained deep CNN in our loss function. In terms of the number of
parameters and product-sum operations, our models are 65-1030 and 3-27 times,
respectively, smaller than existing state-of-the-art document enhancement
models. Overall, the proposed models offer a favorable resource versus accuracy
trade-off and we empirically illustrate the efficacy of our approach on several
real-world benchmark datasets.
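
The abstract describes adding a perceptual loss to the training objective so that a small network can borrow knowledge from a pre-trained CNN. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. Everything named here is an assumption made for illustration: the two fixed Sobel filters stand in for the frozen pre-trained network, and the pixel-wise MSE term and the `weight` value are hypothetical choices, since the abstract does not specify how the loss terms are combined.

```python
# Toy perceptual loss. The "feature extractor" is a hypothetical stand-in
# (two fixed 3x3 Sobel filters followed by ReLU) for a pre-trained CNN.

def conv2d_valid(image, kernel):
    """Valid 2-D cross-correlation of a 2-D list with a square kernel."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]


def mse(a, b):
    """Mean squared error between two equally sized 2-D lists."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Frozen filters: assumed stand-ins for pre-trained weights (never updated).
FROZEN_FILTERS = [
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],   # horizontal gradient
    [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],   # vertical gradient
]


def features(image):
    """Feature maps of the frozen stand-in network."""
    return [relu(conv2d_valid(image, f)) for f in FROZEN_FILTERS]


def perceptual_loss(pred, target):
    """Average MSE between feature maps of the prediction and the target."""
    fp, ft = features(pred), features(target)
    return sum(mse(a, b) for a, b in zip(fp, ft)) / len(fp)


def total_loss(pred, target, weight=0.1):
    """Pixel loss plus weighted perceptual loss (the weight is an assumption)."""
    return mse(pred, target) + weight * perceptual_loss(pred, target)


# 5x5 "document": white paper (1.0) with a vertical ink stroke (0.0).
clean = [[1.0, 1.0, 0.0, 1.0, 1.0] for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][4] = 0.5                      # a grey stain on the right margin

print("pixel MSE:      ", mse(noisy, clean))
print("perceptual loss:", perceptual_loss(noisy, clean))
print("total loss:     ", total_loss(noisy, clean))
```

In a real training setup the stand-in filters would be replaced by intermediate activations of a frozen pre-trained network, and only the light-weight cleanup model would receive gradient updates.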
Related papers
- Prompt-based Ingredient-Oriented All-in-One Image Restoration [0.0]
We propose a novel data ingredient-oriented approach to tackle multiple image degradation tasks.
Specifically, we utilize an encoder to capture features and introduce prompts with degradation-specific information to guide the decoder.
Our method performs competitively with the state-of-the-art.
arXiv Detail & Related papers (2023-09-06T15:05:04Z)
- High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net [42.32958776152137]
Shadows often occur when we capture the documents with casual equipment.
Different from the algorithms for natural shadow removal, the algorithms in document shadow removal need to preserve the details of fonts and figures in high-resolution input.
We handle high-resolution document shadow removal directly via a larger-scale real-world dataset and a carefully designed frequency-aware network.
arXiv Detail & Related papers (2023-08-27T22:45:24Z)
- DocDiff: Document Enhancement via Residual Diffusion Models [7.972081359533047]
We propose DocDiff, a diffusion-based framework specifically designed for document enhancement problems.
DocDiff consists of two modules: the Coarse Predictor (CP) and the High-Frequency Residual Refinement (HRR) module.
Our proposed HRR module in pre-trained DocDiff is plug-and-play and ready-to-use, with only 4.17M parameters.
arXiv Detail & Related papers (2023-05-06T01:41:10Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal [53.01990632289937]
We propose a Transformer-based model for document shadow removal.
It uses shadow context encoding and decoding in both shadow and shadow-free regions.
arXiv Detail & Related papers (2022-11-30T01:46:29Z)
- Perceptual Image Enhancement for Smartphone Real-Time Applications [60.45737626529091]
We propose LPIENet, a lightweight network for perceptual image enhancement.
Our model can deal with noise artifacts, diffraction artifacts, blur, and HDR overexposure.
Our model can process 2K resolution images under 1 second in mid-level commercial smartphones.
arXiv Detail & Related papers (2022-10-24T19:16:33Z)
- EraseNet: A Recurrent Residual Network for Supervised Document Cleaning [0.0]
This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture.
The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.
arXiv Detail & Related papers (2022-10-03T04:23:25Z)
- Fourier Document Restoration for Robust Document Dewarping and Recognition [73.44057202891011]
This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions.
It dewarps documents by a flexible Thin-Plate Spline transformation which can handle various deformations effectively without requiring deformation annotations in training.
It outperforms the state-of-the-art by large margins on both dewarping and text recognition tasks.
arXiv Detail & Related papers (2022-03-18T12:39:31Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to robust and superior performance, while the lightweight recurrent architecture ensures running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- Semantic-Guided Zero-Shot Learning for Low-Light Image/Video Enhancement [3.4722706398428493]
Low-light images challenge both human perceptions and computer vision algorithms.
It is crucial to make algorithms robust to enlighten low-light images for computational photography and computer vision applications.
This paper proposes a semantic-guided zero-shot low-light enhancement network which is trained in the absence of paired images.
arXiv Detail & Related papers (2021-10-03T10:07:36Z)
- Attention Based Real Image Restoration [48.933507352496726]
Deep convolutional neural networks perform better on images containing synthetic degradations.
This paper proposes a novel single-stage blind real image restoration network (R$^2$Net).
arXiv Detail & Related papers (2020-04-26T04:21:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.