CharFormer: A Glyph Fusion based Attentive Framework for High-precision
Character Image Denoising
- URL: http://arxiv.org/abs/2207.07798v2
- Date: Tue, 19 Jul 2022 17:46:58 GMT
- Title: CharFormer: A Glyph Fusion based Attentive Framework for High-precision
Character Image Denoising
- Authors: Daqian Shi, Xiaolei Diao, Lida Shi, Hao Tang, Yang Chi, Chuntao Li,
Hao Xu
- Abstract summary: We introduce a novel framework based on glyph fusion and attention mechanisms, i.e., CharFormer, for precisely recovering character images.
Unlike existing frameworks, CharFormer introduces a parallel target task for capturing additional information and injecting it into the image denoising backbone.
We utilize attention-based networks for global-local feature interaction, which helps handle blind denoising and enhances denoising performance.
- Score: 10.53596428004378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Degraded images commonly exist in the general sources of character images,
leading to unsatisfactory character recognition results. Existing methods have
made dedicated efforts to restore degraded character images. However, the
denoising results obtained by these methods do not appear to improve character
recognition performance. This is mainly because current methods only focus on
pixel-level information and ignore critical features of a character, such as
its glyph, resulting in character-glyph damage during the denoising process. In
this paper, we introduce a novel generic framework based on glyph fusion and
attention mechanisms, i.e., CharFormer, for precisely recovering character
images without changing their inherent glyphs. Unlike existing frameworks,
CharFormer introduces a parallel target task for capturing additional
information and injecting it into the image denoising backbone, which will
maintain the consistency of character glyphs during character image denoising.
Moreover, we utilize attention-based networks for global-local feature
interaction, which helps handle blind denoising and enhances denoising
performance. We compare CharFormer with state-of-the-art methods on multiple
datasets. The experimental results show the superiority of CharFormer
quantitatively and qualitatively.
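The abstract describes a denoising backbone paired with a parallel glyph branch whose output constrains the result to keep the character's glyph intact. A minimal sketch of this joint-objective idea, in plain NumPy with a hypothetical hand-crafted glyph feature (the actual CharFormer uses attention-based transformer blocks and learned glyph features, not this toy):

```python
import numpy as np

rng = np.random.default_rng(0)

def glyph_features(img):
    # Hypothetical stand-in for the parallel glyph branch:
    # row/column ink profiles coarsely capture stroke layout.
    return np.concatenate([img.mean(axis=0), img.mean(axis=1)])

def joint_loss(denoised, clean, glyph_weight=0.5):
    # Pixel-level denoising term ...
    pixel_loss = np.mean((denoised - clean) ** 2)
    # ... plus a glyph-consistency term that penalizes changes to the
    # character's structure, as the abstract motivates.
    glyph_loss = np.mean((glyph_features(denoised) - glyph_features(clean)) ** 2)
    return pixel_loss + glyph_weight * glyph_loss

clean = (rng.random((32, 32)) > 0.7).astype(float)  # toy binary glyph
noisy = np.clip(clean + 0.2 * rng.standard_normal((32, 32)), 0.0, 1.0)

# A perfect restoration drives both terms to zero; any residual noise
# or glyph damage keeps the loss positive.
assert joint_loss(clean, clean) == 0.0
assert joint_loss(noisy, clean) > 0.0
```

The glyph term is what distinguishes this objective from a purely pixel-level one: a restoration that smooths away strokes can score well on pixels yet is penalized on glyph features.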
Related papers
- Efficient and Robust Remote Sensing Image Denoising Using Randomized Approximation of Geodesics' Gramian on the Manifold Underlying the Patch Space [2.56711111236449]
We present a robust remote sensing image denoising method that does not require additional training samples.
The method places a distinct emphasis on each color channel during denoising, and the three denoised channels are then merged to produce the final image.
arXiv Detail & Related papers (2025-04-15T02:46:05Z)
- AMNS: Attention-Weighted Selective Mask and Noise Label Suppression for Text-to-Image Person Retrieval [3.591122855617648]
Under-correlated and false-correlated problems arise for image-text pairs due to poor image quality and mislabeling.
We propose a new noise label suppression method and alleviate the problems introduced by random masking.
arXiv Detail & Related papers (2024-09-10T10:08:01Z) - Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z) - ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations [43.323791505213634]
ASPIRE (Language-guided Data Augmentation for SPurIous correlation REmoval) is a solution for supplementing the training dataset with images without spurious features.
It can generate non-spurious images without requiring any group labeling or existing non-spurious images in the training set.
It improves the worst-group classification accuracy of prior methods by 1%-38%.
arXiv Detail & Related papers (2023-08-19T20:18:15Z) - ALIP: Adaptive Language-Image Pre-training with Synthetic Caption [78.93535202851278]
Contrastive Language-Image Pre-training (CLIP) has significantly boosted the performance of various vision-language tasks.
The presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
We propose Adaptive Language-Image Pre-training (ALIP), a bi-path model that integrates supervision from both raw text and synthetic captions.
arXiv Detail & Related papers (2023-08-16T15:19:52Z) - Discriminative Class Tokens for Text-to-Image Diffusion Models [107.98436819341592]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text.
Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images.
We evaluate our method extensively, showing that the generated images are: (i) more accurate and of higher quality than standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z) - Masked Image Training for Generalizable Deep Image Denoising [53.03126421917465]
We present a novel approach to enhance the generalization performance of denoising networks.
Our method involves masking random pixels of the input image and reconstructing the missing information during training.
Our approach exhibits better generalization ability than other deep learning models and is directly applicable to real-world scenarios.
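The masking operation this summary describes can be sketched in a few lines; the mask ratio and fill value below are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_random_pixels(img, ratio=0.3, fill=0.0):
    # Drop a random subset of pixels; during training the network is
    # asked to reconstruct the missing values, which the paper argues
    # improves generalization to unseen noise.
    mask = rng.random(img.shape) < ratio
    masked = img.copy()
    masked[mask] = fill
    return masked, mask

img = rng.random((64, 64))
masked, mask = mask_random_pixels(img)

assert masked.shape == img.shape
assert np.all(masked[mask] == 0.0)          # masked pixels filled
assert np.all(masked[~mask] == img[~mask])  # other pixels untouched
```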
arXiv Detail & Related papers (2023-03-23T09:33:44Z)
- Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network.
It exploits semantic features of pretrained classification networks, then implicitly matches the probabilistic distribution of clear images in the semantic feature space.
By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
- Dynamic Attentive Graph Learning for Image Restoration [6.289143409131908]
We propose a dynamic attentive graph learning model (DAGL) to explore the dynamic non-local property on patch level for image restoration.
Our DAGL can produce state-of-the-art results with superior accuracy and visual quality.
arXiv Detail & Related papers (2021-09-14T12:19:15Z)
- Synergy Between Semantic Segmentation and Image Denoising via Alternate Boosting [102.19116213923614]
We propose a boosting network to perform denoising and segmentation alternately.
We observe that not only does denoising help combat the drop in segmentation accuracy caused by noise, but pixel-wise semantic information also boosts denoising capability.
Experimental results show that the denoised image quality is improved substantially and the segmentation accuracy is improved to close to that of clean images.
arXiv Detail & Related papers (2021-02-24T06:48:45Z)
- Image Denoising Using the Geodesics' Gramian of the Manifold Underlying Patch-Space [1.7767466724342067]
We propose a novel and computationally efficient image denoising method that is capable of producing accurate images.
To preserve image smoothness, this method inputs patches partitioned from the image rather than pixels.
We validate the performance of this method against benchmark image processing methods.
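Partitioning an image into overlapping patches as the method's input can be sketched as follows; the patch size and stride are illustrative, not the paper's choices:

```python
import numpy as np

def extract_patches(img, patch=8, stride=4):
    # Slide a patch-by-patch window over the image with the given
    # stride and stack each window as one row of the patch matrix.
    h, w = img.shape
    rows = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            rows.append(img[i:i + patch, j:j + patch].ravel())
    return np.array(rows)

img = np.arange(32 * 32, dtype=float).reshape(32, 32)
patches = extract_patches(img)

# (32 - 8) / 4 + 1 = 7 window positions per axis -> 49 patches of 64 pixels.
assert patches.shape == (49, 64)
assert np.all(patches[0] == img[:8, :8].ravel())
```

Overlapping patches, rather than individual pixels, give the method local context in each input row, which is what the smoothness-preservation claim above relies on.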
arXiv Detail & Related papers (2020-10-14T04:07:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.