Related papers: Task-driven real-world super-resolution of document scans


URL: http://arxiv.org/abs/2506.06953v1
Date: Sun, 08 Jun 2025 00:16:29 GMT
Title: Task-driven real-world super-resolution of document scans
Authors: Maciej Zyrek, Tomasz Tarasiewicz, Jakub Sadel, Aleksandra Krzywon, Michal Kawulok,
Abstract summary: Single-image super-resolution refers to the reconstruction of a high-resolution image from a single low-resolution observation. We introduce a task-driven, multi-task learning framework for training a super-resolution network optimized for optical character recognition tasks. We validate our approach on the SRResNet architecture, which is a well-established technique for single-image super-resolution.
Score: 41.61731067095584
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Single-image super-resolution refers to the reconstruction of a high-resolution image from a single low-resolution observation. Although recent deep learning-based methods have demonstrated notable success on simulated datasets, in which low-resolution images are obtained by degrading and downsampling high-resolution ones, they frequently fail to generalize to real-world settings such as document scans, which are affected by complex degradations and semantic variability. In this study, we introduce a task-driven, multi-task learning framework for training a super-resolution network specifically optimized for optical character recognition tasks. We propose to incorporate auxiliary loss functions derived from high-level vision tasks, including text detection using the connectionist text proposal network, text recognition via a convolutional recurrent neural network, keypoint localization using Key.Net, and hue consistency. To balance these diverse objectives, we employ a dynamic weight averaging mechanism, which adaptively adjusts the relative importance of each loss term based on its convergence behavior. We validate our approach on the SRResNet architecture, which is a well-established technique for single-image super-resolution. Experimental evaluations on both simulated and real-world scanned document datasets demonstrate that the proposed approach improves text detection, measured with intersection over union, while preserving overall image fidelity. These findings underscore the value of multi-objective optimization in super-resolution models for bridging the gap between simulated training regimes and practical deployment in real-world scenarios.
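The abstract does not spell out how dynamic weight averaging (DWA) is computed. The sketch below follows the standard DWA formulation from multi-task learning (Liu et al., "End-to-End Multi-Task Learning with Attention"): each task's weight is a softmax over its recent loss-descent rate r_k = L_k(t-1) / L_k(t-2), scaled so the K weights sum to K. It is an assumption that the paper uses exactly this variant; the temperature T = 2, the function names `dwa_weights` and `combined_loss`, and the small epsilon guard are illustrative choices, not details taken from the paper.

```python
import math
from typing import List, Sequence


def dwa_weights(loss_history: Sequence[Sequence[float]],
                temperature: float = 2.0,
                eps: float = 1e-12) -> List[float]:
    """Standard dynamic weight averaging (assumed variant, not the paper's code).

    loss_history: per-epoch lists of average task losses, oldest first.
    Returns one weight per task; the weights sum to the number of tasks K.
    """
    num_tasks = len(loss_history[-1])
    # With fewer than two past epochs no descent rate exists, so use uniform weights.
    if len(loss_history) < 2:
        return [1.0] * num_tasks
    prev, prev2 = loss_history[-1], loss_history[-2]
    # r_k = L_k(t-1) / L_k(t-2): a ratio near 1 means the task is converging slowly.
    rates = [a / max(b, eps) for a, b in zip(prev, prev2)]
    # Softmax with temperature; slowly converging tasks receive larger weights.
    exps = [math.exp(r / temperature) for r in rates]
    total = sum(exps)
    return [num_tasks * e / total for e in exps]


def combined_loss(task_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-task losses (hypothetical helper)."""
    return sum(w * loss for w, loss in zip(weights, task_losses))


if __name__ == "__main__":
    # Hypothetical epoch-averaged losses for four auxiliary terms
    # (text detection, text recognition, keypoints, hue); numbers are illustrative only.
    history = [[1.0, 1.0, 1.0, 1.0],
               [0.5, 0.9, 0.8, 1.0]]
    w = dwa_weights(history)
    print([round(x, 3) for x in w])
    print(combined_loss([0.4, 0.85, 0.75, 0.98], w))
```

In this example the hue term, whose loss did not decrease between the two epochs, receives the largest weight, while the text-detection term, whose loss halved, receives the smallest. That is the "adjusts the relative importance of each loss term based on its convergence behavior" behaviour the abstract describes.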

Related papers

From Controlled Scenarios to Real-World: Cross-Domain Degradation Pattern Matching for All-in-One Image Restoration [2.997052569698842]
All-in-One Image Restoration (AiOIR) aims to restore images affected by multiple degradation patterns with a single model using unified parameters. The UDAIR framework is proposed to achieve AiOIR effectively by transferring knowledge learned in the source domain to the target domain. Experimental results on 10 open-source datasets demonstrate that UDAIR achieves new state-of-the-art performance for the AiOIR task.
arXiv (2025-05-28T12:22:00Z)
Task-driven single-image super-resolution reconstruction of document scans [2.8391355909797644]
We investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection.
arXiv (2024-07-12T05:18:26Z)
Learning from Multi-Perception Features for Real-Word Image Super-resolution [87.71135803794519]
We propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images. Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information. We also introduce a contrastive regularization term (CR) that improves the model's learning capability.
arXiv (2023-05-26T07:35:49Z)
Cross-resolution Face Recognition via Identity-Preserving Network and Knowledge Distillation [12.090322373964124]
Cross-resolution face recognition is a challenging problem for modern deep face recognition systems. This paper proposes a new approach that forces the network to focus on the discriminative information stored in the low-frequency components of a low-resolution image.
arXiv (2023-03-15T14:52:46Z)
Real-World Image Super-Resolution by Exclusionary Dual-Learning [98.36096041099906]
Real-world image super-resolution is a practical image restoration problem that aims to obtain high-quality images from in-the-wild input. Deep learning-based methods have achieved promising restoration quality on real-world image super-resolution datasets. We propose Real-World image Super-Resolution by Exclusionary Dual-Learning (RWSR-EDL) to address the feature diversity in perceptual- and L1-based cooperative learning.
arXiv (2022-06-06T13:28:15Z)
Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise high-resolution representations throughout the entire network. We learn an enriched set of features that combines contextual information from multiple scales while preserving the high-resolution spatial details. Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv (2022-04-19T17:59:45Z)
Single Image Internal Distribution Measurement Using Non-Local Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely a non-local variational autoencoder (NLVAE). NLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood. Experimental results on seven benchmark datasets demonstrate the effectiveness of the NLVAE model.
arXiv (2022-04-02T18:43:55Z)
High-resolution Iterative Feedback Network for Camouflaged Object Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is difficult for object detection algorithms. We aim to extract high-resolution texture details to avoid the detail degradation that blurs edges and boundaries. We introduce a novel HitNet that refines low-resolution representations with high-resolution features in an iterative feedback manner.
arXiv (2022-03-22T11:20:21Z)
Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks. We present a novel architecture with the collective goals of maintaining spatially precise high-resolution representations through the entire network. Our approach learns an enriched set of features that combines contextual information from multiple scales while preserving the high-resolution spatial details.
arXiv (2020-03-15T11:04:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
