ESTISR: Adapting Efficient Scene Text Image Super-resolution for
Real-Scenes
- URL: http://arxiv.org/abs/2306.02443v1
- Date: Sun, 4 Jun 2023 19:14:44 GMT
- Authors: Minghao Fu, Xin Man, Yihan Xu, Jie Shao
- Abstract summary: Scene text image super-resolution (STISR) has yielded remarkable improvements in accurately recognizing scene text.
We propose a novel Efficient Scene Text Image Super-resolution (ESTISR) Network for resource-limited deployment platforms.
ESTISR consistently outperforms current methods in terms of actual running time and peak memory consumption.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While scene text image super-resolution (STISR) has yielded remarkable
improvements in accurately recognizing scene text, prior methodologies have
placed excessive emphasis on optimizing performance, rather than paying due
attention to efficiency - a crucial factor in ensuring deployment of the
STISR-STR pipeline. In this work, we propose a novel Efficient Scene Text Image
Super-resolution (ESTISR) Network for resource-limited deployment platforms.
ESTISR's functionality primarily depends on two critical components: a
CNN-based feature extractor and an efficient self-attention mechanism used for
decoding low-resolution images. We design a re-parameterized inverted
residual block, specifically suited to resource-limited settings, as the
feature extractor. Meanwhile, we propose a novel self-attention mechanism,
softmax shrinking, based on a kernel-based approach. This innovative technique
offers linear complexity while also naturally incorporating discriminating
low-level features into the self-attention structure. Extensive experiments on
TextZoom show that ESTISR retains a high image restoration quality and improved
STR accuracy of low-resolution images. Furthermore, ESTISR consistently
outperforms current methods in terms of actual running time and peak memory
consumption, while achieving a better trade-off between performance and
efficiency.
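The abstract does not spell out the exact formulation of softmax shrinking, but it states that the mechanism is kernel-based and achieves linear complexity. As a rough illustration of that family of methods, the sketch below shows generic kernel-based linear attention, where a positive feature map `phi` (here a hypothetical ReLU+1 map, not the paper's actual choice) replaces the softmax so that the key-value summary can be computed once instead of forming the full n-by-n attention matrix:

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    """Kernel-based attention with linear complexity in sequence length n.

    Standard attention computes softmax(Q K^T) V, which costs O(n^2 * d).
    A non-negative feature map phi lets us reorder the matrix products:
        out_i = phi(q_i) @ (phi(K)^T V) / (phi(q_i) @ sum_j phi(k_j))
    so the small (d x d_v) summary phi(K)^T V is built once in O(n * d * d_v).
    """
    Qf, Kf = phi(Q), phi(K)                 # (n, d) non-negative features
    kv = Kf.T @ V                           # (d, d_v) global key-value summary
    z = Qf @ Kf.sum(axis=0)                 # (n,) per-query normalizer
    return (Qf @ kv) / (z[:, None] + 1e-6)  # (n, d_v) attention output
```

Because `phi` is non-negative, each output row is a convex combination of the rows of `V`, mirroring the averaging behavior of softmax attention while avoiding the quadratic attention matrix.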
Related papers
- Realistic Extreme Image Rescaling via Generative Latent Space Learning [51.85790402171696]
We propose a novel framework called Latent Space Based Image Rescaling (LSBIR) for extreme image rescaling tasks.
LSBIR effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model to generate realistic HR images.
In the first stage, a pseudo-invertible encoder-decoder models the bidirectional mapping between the latent features of the HR image and the target-sized LR image.
In the second stage, the reconstructed features from the first stage are refined by a pre-trained diffusion model to generate more faithful and visually pleasing details.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
- PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution [18.936806519546508]
Scene text image super-resolution (STISR) aims at simultaneously increasing the resolution and readability of low-resolution scene text images.
Two factors in scene text images, visual structure and semantic information, affect the recognition performance significantly.
This paper proposes a Prior-Enhanced Attention Network (PEAN) to mitigate the effects from these factors.
arXiv Detail & Related papers (2023-11-29T08:11:20Z)
- Swift Parameter-free Attention Network for Efficient Super-Resolution [8.365929625909509]
Single Image Super-Resolution is a crucial task in low-level computer vision.
We propose the Swift Parameter-free Attention Network (SPAN), which balances parameter count, inference speed, and image quality.
We evaluate SPAN on multiple benchmarks, showing that it outperforms existing efficient super-resolution models in terms of both image quality and inference speed.
arXiv Detail & Related papers (2023-11-21T18:30:40Z)
- RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z)
- CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution [158.2282163651066]
This paper proposes a continuous implicit attention-in-attention network, called CiaoSR.
We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features.
We embed a scale-aware attention in this implicit attention network to exploit additional non-local information.
arXiv Detail & Related papers (2022-12-08T15:57:46Z)
- Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting [49.33891486324731]
We propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework.
It aims to infer images in different small but recognizable resolutions and achieve a better balance between accuracy and efficiency.
The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve the practicability.
arXiv Detail & Related papers (2022-07-14T06:49:59Z)
- Rethinking Super-Resolution as Text-Guided Details Generation [21.695227836312835]
We propose a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize the information from the text and image modalities.
The proposed TGSR could generate HR image details that match the text descriptions through a coarse-to-fine process.
arXiv Detail & Related papers (2022-07-14T01:46:38Z)
- Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than prior works and suppresses aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z)
- Residual Local Feature Network for Efficient Super-Resolution [20.62809970985125]
In this work, we propose a novel Residual Local Feature Network (RLFN)
The main idea is to use three convolutional layers for residual local feature learning, simplifying feature aggregation.
In addition, we won the first place in the runtime track of the NTIRE 2022 efficient super-resolution challenge.
arXiv Detail & Related papers (2022-05-16T08:46:34Z)
- Scene Text Image Super-Resolution in the Wild [112.90416737357141]
Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones.
Previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images.
We propose a real scene text SR dataset, termed TextZoom.
It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild.
arXiv Detail & Related papers (2020-05-07T09:18:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.