Task-driven single-image super-resolution reconstruction of document scans
- URL: http://arxiv.org/abs/2407.08993v1
- Date: Fri, 12 Jul 2024 05:18:26 GMT
- Title: Task-driven single-image super-resolution reconstruction of document scans
- Authors: Maciej Zyrek, Michal Kawulok
- Abstract summary: We investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans.
To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection.
- Score: 2.8391355909797644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Super-resolution reconstruction is aimed at generating images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned with deep learning allow for obtaining results of outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that embraces components related with text detection coupled with those guided by image similarity. The obtained results reported in this paper are encouraging and they constitute an important step towards real-world super-resolution of document images.
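As a rough illustration of the multi-task loss idea described in the abstract, the sketch below combines an image-similarity term with a text-detection term as a weighted sum. It is not the authors' formulation: the weights `w_sim` and `w_det`, the use of mean squared error for both terms, and the thresholding `toy_detector` are all assumptions made for this example. A real training setup would need a differentiable detector and a tensor library.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values in [0, 1]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized 2-D images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def multi_task_loss(
    sr: Image,
    hr: Image,
    detector: Callable[[Image], Image],
    w_sim: float = 1.0,  # hypothetical weight of the image-similarity term
    w_det: float = 0.5,  # hypothetical weight of the text-detection term
) -> float:
    """Weighted sum of an image-similarity term and a detection term."""
    # Image-similarity term: pixel-wise error against the high-resolution reference.
    similarity_term = mse(sr, hr)
    # Detection term: disagreement between detector outputs on the
    # super-resolved image and on the reference image.
    detection_term = mse(detector(sr), detector(hr))
    return w_sim * similarity_term + w_det * detection_term


def toy_detector(img: Image) -> Image:
    """Stand-in for a text detector (assumption): marks dark pixels as text."""
    return [[1.0 if p < 0.5 else 0.0 for p in row] for row in img]


hr = [[0.0, 1.0], [1.0, 0.0]]          # reference high-resolution patch
sr_good = [[0.4, 0.8], [0.6, 0.2]]     # preserves the dark/bright text layout
sr_bad = [[0.6, 0.8], [0.6, 0.2]]      # one text pixel is lost

loss_good = multi_task_loss(sr_good, hr, toy_detector)
loss_bad = multi_task_loss(sr_bad, hr, toy_detector)
```

In this toy setting, `sr_bad` is penalized by both terms, since it is further from `hr` pixel-wise and it also changes the detector output, whereas `sr_good` only incurs the similarity penalty.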
Related papers
- Reconstructing Interpretable Features in Computational Super-Resolution microscopy via Regularized Latent Search [2.7194314957925094]
Supervised deep learning approaches can artificially increase the resolution of microscopy images by learning a mapping between two image resolutions or modalities.
Recent methods based on GAN latent search offered a drastic increase in resolution without the need of paired images.
Here, we propose a robust super-resolution method based on regularized latent search (RLS) that offers an actionable balance between fidelity to the ground-truth and realism of the recovered image.
arXiv Detail & Related papers (2024-05-29T14:20:46Z)
- Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing [49.419619882284906]
Ground-A-Score is a powerful model-agnostic image editing method that incorporates grounding during score distillation.
The selective application with a new penalty coefficient and contrastive loss helps to precisely target editing areas.
Both qualitative assessments and quantitative analyses confirm that Ground-A-Score successfully adheres to the intricate details of extended and multifaceted prompts.
arXiv Detail & Related papers (2024-03-20T12:40:32Z)
- Super-Resolving Face Image by Facial Parsing Information [52.1267613768555]
Face super-resolution is a technology that transforms a low-resolution face image into the corresponding high-resolution one.
We build a novel parsing-map-guided face super-resolution network which extracts the face prior from a low-resolution face image.
High-resolution features contain more precise spatial information while low-resolution features provide strong contextual information.
arXiv Detail & Related papers (2023-04-06T08:19:03Z)
- Cross-resolution Face Recognition via Identity-Preserving Network and Knowledge Distillation [12.090322373964124]
Cross-resolution face recognition is a challenging problem for modern deep face recognition systems.
This paper proposes a new approach that forces the network to focus on the discriminative information stored in the low-frequency components of a low-resolution image.
arXiv Detail & Related papers (2023-03-15T14:52:46Z)
- Rethinking Super-Resolution as Text-Guided Details Generation [21.695227836312835]
We propose a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize the information from the text and image modalities.
The proposed TGSR could generate HR image details that match the text descriptions through a coarse-to-fine process.
arXiv Detail & Related papers (2022-07-14T01:46:38Z)
- Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than other works, and remits the aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- HIME: Efficient Headshot Image Super-Resolution with Multiple Exemplars [11.81364643562714]
We propose an efficient Headshot Image Super-Resolution with Multiple Exemplars network (HIME) method.
Compared with previous methods, our network can effectively handle the misalignment between the input and the reference.
We also propose a correlation loss that provides a rich representation of the local texture in a controllable spatial range.
arXiv Detail & Related papers (2022-03-28T16:13:28Z)
- Multi Scale Identity-Preserving Image-to-Image Translation Network for Low-Resolution Face Recognition [7.6702700993064115]
We propose an identity-preserving end-to-end image-to-image translation deep neural network.
It is capable of super-resolving very low-resolution faces to their high-resolution counterparts while preserving identity-related information.
arXiv Detail & Related papers (2020-10-23T09:21:06Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
- Gated Fusion Network for Degraded Image Super Resolution [78.67168802945069]
We propose a dual-branch convolutional neural network to extract base features and recovered features separately.
By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process.
arXiv Detail & Related papers (2020-03-02T13:28:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.