Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text
Image Super-Resolution
- URL: http://arxiv.org/abs/2309.08919v1
- Date: Sat, 16 Sep 2023 08:12:12 GMT
- Title: Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text
Image Super-Resolution
- Authors: Wenyu Zhang, Xin Deng, Baojun Jia, Xingtong Yu, Yifan Chen, Jin Ma,
Qing Ding, Xinming Zhang
- Abstract summary: We propose the Pixel Adapter Module (PAM) based on graph attention to address pixel distortion caused by upsampling.
The PAM effectively captures local structural information by allowing each pixel to interact with its neighbors and update features.
We demonstrate that our proposed method generates high-quality super-resolution images, surpassing existing methods in recognition accuracy.
- Score: 22.60056946339325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current scene text image super-resolution approaches primarily focus on
extracting robust features, acquiring text information, and complex training
strategies to generate super-resolution images. However, the upsampling module,
which is crucial in the process of converting low-resolution images to
high-resolution ones, has received little attention in existing works. To
fill this gap, we propose the Pixel Adapter Module (PAM), based on graph
attention, to correct pixel distortion caused by upsampling. The PAM effectively
captures local structural information by allowing each pixel to interact with
its neighbors and update features. Unlike previous graph attention mechanisms,
our approach achieves 2-3 orders of magnitude improvement in efficiency and
memory utilization by eliminating the dependency on sparse adjacency matrices
and introducing a sliding window approach for efficient parallel computation.
Additionally, we introduce the MLP-based Sequential Residual Block (MSRB) for
robust feature extraction from text images, and a Local Contour Awareness loss
($\mathcal{L}_{lca}$) to enhance the model's perception of details.
Comprehensive experiments on TextZoom demonstrate that our proposed method
generates high-quality super-resolution images, surpassing existing methods in
recognition accuracy. For single-stage and multi-stage strategies, we achieved
improvements of 0.7% and 2.6%, respectively, increasing the performance from
52.6% and 53.7% to 53.3% and 56.3%. The code is available at
https://github.com/wenyu1009/RTSRN.
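The sliding-window idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' PAM implementation (see the repository above for that); it only shows how each pixel can attend to its k x k neighbors through a strided window view instead of an adjacency matrix. The function name `local_window_attention`, the scaled dot-product scoring, and the edge-replicating padding are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_window_attention(feat: np.ndarray, k: int = 3) -> np.ndarray:
    """Aggregate each pixel's k x k neighborhood with dot-product attention.

    feat has shape (H, W, C); the result has the same shape.
    Hypothetical helper, not taken from the paper's code.
    """
    if k % 2 != 1:
        raise ValueError("window size k must be odd")
    h, w, c = feat.shape
    pad = k // 2
    # Assumption: replicate border pixels so every pixel has k*k neighbors.
    padded = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Shape (H, W, C, k, k): one window per pixel, no adjacency matrix needed.
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))
    neigh = windows.reshape(h, w, c, k * k)
    # Scaled dot-product score between each pixel and each of its neighbors.
    scores = np.einsum("hwc,hwcn->hwn", feat, neigh) / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Attention-weighted sum of neighbor features.
    return np.einsum("hwn,hwcn->hwc", weights, neigh)


rng = np.random.default_rng(0)
x = rng.normal(size=(16, 64, 8))          # hypothetical 16x64 feature map
y = x + local_window_attention(x, k=3)    # residual-style feature update
print(y.shape)  # (16, 64, 8)
```

For the 16x64 map above (N = 1,024 pixels) with k = 3, the window formulation stores 1,024 x 9 = 9,216 attention scores, whereas a dense N x N adjacency would hold 1,048,576 entries. This dense comparison is only a rough illustration of why a windowed layout is cheaper; the paper's reported gains are measured against earlier sparse-adjacency graph attention implementations.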
Related papers
- Accelerating Image Super-Resolution Networks with Pixel-Level Classification [29.010136088811137]
Pixel-level Classifier for Single Image Super-Resolution is a novel method designed to distribute computational resources adaptively at the pixel level.
Our method allows performance and computational cost to be balanced during inference without re-training.
arXiv Detail & Related papers (2024-07-31T08:53:10Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- Improving Pixel-based MIM by Reducing Wasted Modeling Capability [77.99468514275185]
We propose a new method that explicitly utilizes low-level features from shallow layers to aid pixel reconstruction.
To the best of our knowledge, we are the first to systematically investigate multi-level feature fusion for isotropic architectures.
Our method yields significant performance gains, such as 1.2% on fine-tuning, 2.8% on linear probing, and 2.6% on semantic segmentation.
arXiv Detail & Related papers (2023-08-01T03:44:56Z)
- Towards Robust Scene Text Image Super-resolution via Explicit Location
Enhancement [59.66539728681453]
Scene text image super-resolution (STISR) aims to improve image quality while boosting downstream scene text recognition accuracy.
Most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process.
We propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution.
arXiv Detail & Related papers (2023-07-19T05:08:47Z)
- Guided Linear Upsampling [8.819059777836628]
Guided upsampling is an effective approach for accelerating high-resolution image processing.
Our method can better preserve detail effects while suppressing artifacts such as bleeding and blurring.
We demonstrate the advantages of our method for both interactive image editing and real-time high-resolution video processing.
arXiv Detail & Related papers (2023-07-13T08:04:24Z)
- Super-Resolution of License Plate Images Using Attention Modules and
Sub-Pixel Convolution Layers [3.8831062015253055]
We introduce a Single-Image Super-Resolution (SISR) approach to enhance the detection of structural and textural features in surveillance images.
Our approach incorporates sub-pixel convolution layers and a loss function that uses an Optical Character Recognition (OCR) model for feature extraction.
Our results show that our approach for reconstructing these low-resolution synthesized images outperforms existing ones in both quantitative and qualitative measures.
arXiv Detail & Related papers (2023-05-27T00:17:19Z)
- DBAT: Dynamic Backward Attention Transformer for Material Segmentation
with Cross-Resolution Patches [8.812837829361923]
We propose the Dynamic Backward Attention Transformer (DBAT) to aggregate cross-resolution features.
Experiments show that our DBAT achieves an accuracy of 86.85%, which is the best performance among state-of-the-art real-time models.
We further align features to semantic labels, performing network dissection, to infer that the proposed model can extract material-related features better than other methods.
arXiv Detail & Related papers (2023-05-06T03:47:20Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents an architecture with the holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Hybrid Pixel-Unshuffled Network for Lightweight Image Super-Resolution [64.54162195322246]
Convolutional neural networks (CNNs) have achieved great success in image super-resolution (SR).
Most deep CNN-based SR models require massive computation to obtain high performance.
We propose a novel Hybrid Pixel-Unshuffled Network (HPUN) by introducing an efficient and effective downsampling module into the SR task.
arXiv Detail & Related papers (2022-03-16T20:10:41Z)
- Generating Superpixels for High-resolution Images with Decoupled Patch
Calibration [82.21559299694555]
Patch Calibration Networks (PCNet) are designed to efficiently and accurately implement high-resolution superpixel segmentation.
In particular, Decoupled Patch Calibration (DPC) takes a local patch from the high-resolution images and dynamically generates a binary mask that forces the network to focus on region boundaries.
arXiv Detail & Related papers (2021-08-19T10:33:05Z)