DocDiff: Document Enhancement via Residual Diffusion Models
- URL: http://arxiv.org/abs/2305.03892v2
- Date: Wed, 9 Aug 2023 07:37:25 GMT
- Title: DocDiff: Document Enhancement via Residual Diffusion Models
- Authors: Zongyuan Yang, Baolin Liu, Yongping Xiong, Lan Yi, Guibin Wu, Xiaojun
Tang, Ziqi Liu, Junjie Zhou, Xing Zhang
- Abstract summary: We propose DocDiff, a diffusion-based framework specifically designed for document enhancement problems.
DocDiff consists of two modules: the Coarse Predictor (CP) and the High-Frequency Residual Refinement (HRR) module.
Our proposed HRR module in pre-trained DocDiff is plug-and-play and ready-to-use, with only 4.17M parameters.
- Score: 7.972081359533047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Removing degradation from document images not only improves their visual
quality and readability, but also enhances the performance of numerous
automated document analysis and recognition tasks. However, existing
regression-based methods optimized for pixel-level distortion reduction tend to
suffer from significant loss of high-frequency information, leading to
distorted and blurred text edges. To compensate for this major deficiency, we
propose DocDiff, the first diffusion-based framework specifically designed for
diverse challenging document enhancement problems, including document
deblurring, denoising, and removal of watermarks and seals. DocDiff consists of
two modules: the Coarse Predictor (CP), which is responsible for recovering the
primary low-frequency content, and the High-Frequency Residual Refinement (HRR)
module, which adopts the diffusion models to predict the residual
(high-frequency information, including text edges), between the ground-truth
and the CP-predicted image. DocDiff is a compact and computationally efficient
model that benefits from a well-designed network architecture, an optimized
training loss objective, and a deterministic sampling process with short time
steps. Extensive experiments demonstrate that DocDiff achieves state-of-the-art
(SOTA) performance on multiple benchmark datasets, and can significantly
enhance the readability and recognizability of degraded document images.
Furthermore, our proposed HRR module in pre-trained DocDiff is plug-and-play
and ready-to-use, with only 4.17M parameters. It greatly sharpens the text
edges generated by SOTA deblurring methods without additional joint training.
Code is available at: https://github.com/Royalvice/DocDiff
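The coarse-plus-residual split described in the abstract can be illustrated with a toy sketch. This is not the authors' code: in DocDiff both the CP and the HRR module are learned networks, and the HRR module is a diffusion model. Here a plain box blur stands in for the CP output, and the "predicted" residual is simply the exact residual, so the example only shows what the residual target contains and how the two parts recombine. The names `box_blur`, `residual_target`, and `compose` are hypothetical.

```python
import numpy as np


def box_blur(img, k=5):
    """Low-pass filter used as a stand-in for the Coarse Predictor (assumption)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    # Sum the k*k shifted copies of the image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def residual_target(ground_truth, coarse):
    """Residual between the ground truth and the coarse prediction."""
    return ground_truth - coarse


def compose(coarse, predicted_residual):
    """Final image: coarse content plus the (predicted) high-frequency residual."""
    return np.clip(coarse + predicted_residual, 0.0, 1.0)


# Toy "document": a white page with a dark vertical stroke two pixels wide.
gt = np.ones((16, 16))
gt[:, 7:9] = 0.0

coarse = box_blur(gt)              # blurred edges, low-frequency content only
res = residual_target(gt, coarse)  # what a residual model would have to predict
restored = compose(coarse, res)    # a perfect residual recovers the sharp stroke
```

In this toy case the residual is zero in flat regions of the page and non-zero only around the stroke, which matches the abstract's description of the residual as high-frequency information such as text edges.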
Related papers
- Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images.
Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z)
- NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement [4.841365627573421]
A crucial preprocessing step is essential to eliminate noise while preserving text and key features of documents.
We propose NAF-DPM, a novel generative framework based on a diffusion probabilistic model (DPM) designed to restore the original quality of degraded documents.
arXiv Detail & Related papers (2024-04-08T16:52:21Z)
- BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution [52.47005445345593]
BlindDiff is a DM-based blind SR method to tackle the blind degradation settings in SISR.
BlindDiff seamlessly integrates the MAP-based optimization into DMs.
Experiments on both synthetic and real-world datasets show that BlindDiff achieves the state-of-the-art performance.
arXiv Detail & Related papers (2024-03-15T11:21:34Z)
- Lightweight Adaptive Feature De-drifting for Compressed Image Classification [10.265991649449507]
High-level vision models trained on high-quality images will suffer performance degradation when dealing with compressed images.
Various learning-based JPEG artifact removal methods have been proposed to handle visual artifacts.
This paper proposes a novel lightweight AFD module to boost the performance of pre-trained image classification models when facing compressed images.
arXiv Detail & Related papers (2024-01-03T13:03:44Z)
- DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z)
- DECDM: Document Enhancement using Cycle-Consistent Diffusion Models [3.3813766129849845]
We propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models.
Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models.
We also introduce simple data augmentation strategies to improve character-glyph conservation during translation.
arXiv Detail & Related papers (2023-11-16T07:16:02Z)
- Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation unfolding network (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z)
- Transformer-Based UNet with Multi-Headed Cross-Attention Skip Connections to Eliminate Artifacts in Scanned Documents [0.0]
A modified UNet structure using a Swin Transformer backbone is presented to remove typical artifacts in scanned documents.
An improvement in text extraction quality, with the error rate reduced by up to 53.9% on the synthetic data, is achieved.
arXiv Detail & Related papers (2023-06-05T12:12:23Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Multiscale Structure Guided Diffusion for Image Deblurring [24.09642909404091]
Diffusion Probabilistic Models (DPMs) have been employed for image deblurring.
We introduce a simple yet effective multiscale structure guidance as an implicit bias.
We demonstrate more robust deblurring results with fewer artifacts on unseen data.
arXiv Detail & Related papers (2022-12-04T10:40:35Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.