ShaDocFormer: A Shadow-Attentive Threshold Detector With Cascaded Fusion Refiner for Document Shadow Removal
- URL: http://arxiv.org/abs/2309.06670v4
- Date: Thu, 21 Mar 2024 08:30:54 GMT
- Title: ShaDocFormer: A Shadow-Attentive Threshold Detector With Cascaded Fusion Refiner for Document Shadow Removal
- Authors: Weiwen Chen, Yingtie Lei, Shenghong Luo, Ziyang Zhou, Mingxian Li, Chi-Man Pun
- Abstract summary: We propose a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle the problem of document shadow removal.
The ShaDocFormer architecture comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR).
- Score: 26.15238399758745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document shadow is a common issue that arises when capturing documents using mobile devices, which significantly impacts readability. Current methods encounter various challenges, including inaccurate detection of shadow masks and estimation of illumination. In this paper, we propose ShaDocFormer, a Transformer-based architecture that integrates traditional methodologies and deep learning techniques to tackle the problem of document shadow removal. The ShaDocFormer architecture comprises two components: the Shadow-attentive Threshold Detector (STD) and the Cascaded Fusion Refiner (CFR). The STD module employs a traditional thresholding technique and leverages the attention mechanism of the Transformer to gather global information, thereby enabling precise detection of shadow masks. The cascaded and aggregative structure of the CFR module facilitates a coarse-to-fine restoration process for the entire image. As a result, ShaDocFormer excels in accurately detecting and capturing variations in both shadow and illumination, thereby enabling effective removal of shadows. Extensive experiments demonstrate that ShaDocFormer outperforms current state-of-the-art methods in both qualitative and quantitative measurements.
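The abstract says the STD module starts from a traditional thresholding technique before attention is applied. As a rough illustration of that first step only, here is a minimal sketch that derives a coarse shadow mask by global thresholding. It assumes Otsu's method as the thresholding technique (the abstract does not name one), and the names `otsu_threshold` and `coarse_shadow_mask` are hypothetical, not taken from the paper's code. It does not reproduce the attention stage or the CFR.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale image as rows of pixel values (0-255)


def otsu_threshold(gray: Image) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    Otsu's method is an assumption here; the abstract only says "traditional
    thresholding technique".
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break  # bright class empty, no further split possible
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def coarse_shadow_mask(gray: Image) -> Image:
    """Mark pixels at or below the Otsu threshold as shadow candidates (1)."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


# Toy example: left half of the page is shadowed (60), right half is lit (220).
demo = [[60, 60, 220, 220] for _ in range(4)]
demo_mask = coarse_shadow_mask(demo)
```

A threshold of this kind also flags dark text strokes as shadow candidates, which is one plausible reason a purely traditional mask needs refinement with global context; the abstract attributes that refinement to the Transformer attention in STD.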
Related papers
- ShadowMaskFormer: Mask Augmented Patch Embeddings for Shadow Removal [13.983288991595614]
We propose a transformer-based framework with a novel patch embedding that is tailored for shadow removal, dubbed ShadowMaskFormer.
Specifically, we present a simple and effective mask-augmented patch embedding to integrate shadow information and promote the model's emphasis on acquiring knowledge for shadow regions.
arXiv Detail & Related papers (2024-04-29T05:17:33Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- DocDeshadower: Frequency-aware Transformer for Document Shadow Removal [49.107557554811144]
DocDeshadower is a multi-frequency Transformer-based model built on Laplacian Pyramid.
We decompose the shadow image into different frequency bands using Laplacian Pyramid.
Attention-Aggregation Network is designed to remove shadows in the low-frequency part of the image.
Gated Multi-scale Fusion Transformer refines the entire image at a global scale with its large perceptive field.
arXiv Detail & Related papers (2023-07-28T05:35:37Z)
- Structure-Informed Shadow Removal Networks [67.57092870994029]
Existing deep learning-based shadow removal methods still produce images with shadow remnants.
We propose a novel structure-informed shadow removal network (StructNet) to leverage the image-structure information to address the shadow remnant problem.
Our method outperforms existing shadow removal methods, and our StructNet can be integrated with existing methods to improve them further.
arXiv Detail & Related papers (2023-01-09T06:31:52Z)
- ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal [53.01990632289937]
We propose a Transformer-based model for document shadow removal.
It uses shadow context encoding and decoding in both shadow and shadow-free regions.
arXiv Detail & Related papers (2022-11-30T01:46:29Z)
- SpA-Former: Transformer image shadow detection and removal via spatial attention [8.643096072885909]
We propose an end-to-end SpA-Former to recover a shadow-free image from a single shaded image.
Unlike traditional methods that require two steps for shadow detection and then shadow removal, the SpA-Former unifies these steps into one.
arXiv Detail & Related papers (2022-06-22T08:30:22Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- R2D: Learning Shadow Removal to Enhance Fine-Context Shadow Detection [64.10636296274168]
Current shadow detection methods perform poorly when detecting shadow regions that are small, unclear or have blurry edges.
We propose a new method called Restore to Detect (R2D), where a deep neural network is trained for restoration (shadow removal).
We show that our proposed method R2D improves the shadow detection performance while being able to detect fine context better compared to the other recent methods.
arXiv Detail & Related papers (2021-09-20T15:09:22Z)
- Temporal Feature Warping for Video Shadow Detection [30.82493923485278]
We propose a simple but powerful method to better aggregate information temporally.
We use an optical flow based warping module to align and then combine features between frames.
We apply this warping module across multiple deep-network layers to retrieve information from neighboring frames including both local details and high-level semantic information.
arXiv Detail & Related papers (2021-07-29T19:12:50Z)