One-stage Low-resolution Text Recognition with High-resolution Knowledge
Transfer
- URL: http://arxiv.org/abs/2308.02770v1
- Date: Sat, 5 Aug 2023 02:33:45 GMT
- Title: One-stage Low-resolution Text Recognition with High-resolution Knowledge
Transfer
- Authors: Hang Guo, Tao Dai, Mingyan Zhu, Guanghao Meng, Bin Chen, Zhi Wang,
Shu-Tao Xia
- Abstract summary: Current solutions for low-resolution text recognition typically rely on a two-stage pipeline.
We propose an efficient and effective knowledge distillation framework to achieve multi-level knowledge transfer.
Experiments show that the proposed one-stage pipeline significantly outperforms super-resolution based two-stage frameworks.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recognizing characters from low-resolution (LR) text images poses a
significant challenge due to the information deficiency as well as the noise
and blur in low-quality images. Current solutions for low-resolution text
recognition (LTR) typically rely on a two-stage pipeline that involves
super-resolution as the first stage followed by the second-stage recognition.
Although this pipeline is straightforward and intuitive, it has to use an
additional super-resolution network, which causes inefficiencies during
training and testing. Moreover, the recognition accuracy of the second stage
heavily depends on the reconstruction quality of the first stage, causing
ineffectiveness. In this work, we attempt to address these challenges from a
novel perspective: adapting the recognizer to low-resolution inputs by
transferring knowledge from its high-resolution counterpart. Guided by this idea, we
propose an efficient and effective knowledge distillation framework to achieve
multi-level knowledge transfer. Specifically, a visual focus loss is proposed
to extract character-position knowledge while reducing the resolution gap and
focusing on character regions; a semantic contrastive loss exploits contextual
semantic knowledge through contrastive learning; and a soft logits loss
facilitates both local word-level and global sequence-level learning from the
soft teacher labels. Extensive experiments show that the
proposed one-stage pipeline significantly outperforms super-resolution-based
two-stage frameworks in terms of effectiveness and efficiency, accompanied by
favorable robustness. Code is available at https://github.com/csguoh/KD-LTR.
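The soft logits loss described above follows the general knowledge-distillation recipe of matching the student's output distribution to a temperature-softened teacher distribution. A minimal NumPy sketch of that general technique is given below; the function name, shapes, and temperature value are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_logits_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled KL divergence between teacher and student
    per-character distributions, averaged over the sequence.

    Both inputs have shape (seq_len, num_classes); T softens the
    teacher distribution so non-target classes carry signal.
    """
    p_t = softmax(teacher_logits / T)
    p_s = softmax(student_logits / T)
    # KL(teacher || student) per sequence position.
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return (T ** 2) * kl.mean()
```

When the student's logits equal the teacher's, the loss is zero; it grows as the distributions diverge, which is the behavior a sequence-level distillation term needs.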
Related papers
- Exploring Deep Learning Image Super-Resolution for Iris Recognition [50.43429968821899]
We propose the use of two deep learning single-image super-resolution approaches: Stacked Auto-Encoders (SAE) and Convolutional Neural Networks (CNN).
We validate the methods on a database of 1,872 near-infrared iris images; quality-assessment and recognition experiments show the superiority of the deep learning approaches over the compared algorithms.
arXiv Detail & Related papers (2023-11-02T13:57:48Z) - An Enhanced Low-Resolution Image Recognition Method for Traffic
Environments [3.018656336329545]
Low-resolution images suffer from small size, low quality, and lack of detail, leading to a decrease in the accuracy of traditional neural network recognition algorithms.
This paper introduces a dual-branch residual network structure that leverages the basic architecture of residual networks and a common feature subspace algorithm.
It uses intermediate-layer features to improve the accuracy of low-resolution image recognition.
arXiv Detail & Related papers (2023-09-28T12:38:31Z) - Cross-resolution Face Recognition via Identity-Preserving Network and
Knowledge Distillation [12.090322373964124]
Cross-resolution face recognition is a challenging problem for modern deep face recognition systems.
This paper proposes a new approach that enforces the network to focus on the discriminative information stored in the low-frequency components of a low-resolution image.
arXiv Detail & Related papers (2023-03-15T14:52:46Z) - Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text
Spotting [49.33891486324731]
We propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework.
It aims to infer images at different small but recognizable resolutions, achieving a better balance between accuracy and efficiency.
The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve practicality.
arXiv Detail & Related papers (2022-07-14T06:49:59Z) - Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation
for Scene Recognition [64.29650787243443]
We propose and analyse the use of a 2D frequency transform of the activation maps before transferring them.
This strategy enhances knowledge transferability in tasks such as scene recognition.
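The idea of applying a 2D frequency transform to activation maps before matching them can be illustrated with an orthonormal 2D DCT-II. The sketch below is a minimal NumPy illustration of the general technique; the function names and the simple L2 matching loss are assumptions, not the paper's exact DCT-driven loss:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)  # scale the DC row for orthonormality
    return C

def dct2(x):
    """2D DCT-II of an activation map x of shape (H, W)."""
    Ch = dct_matrix(x.shape[0])
    Cw = dct_matrix(x.shape[1])
    return Ch @ x @ Cw.T

def dct_distill_loss(student_map, teacher_map):
    """Mean squared error between DCT coefficients of two activation maps."""
    d = dct2(student_map) - dct2(teacher_map)
    return float(np.mean(d ** 2))
```

Because the DCT basis here is orthonormal, the transform preserves energy (Parseval), so the frequency-domain loss reweights rather than distorts the information being transferred between teacher and student maps.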
We publicly release the training and evaluation framework used in this paper at http://www.vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
arXiv Detail & Related papers (2022-05-04T11:05:18Z) - Activation to Saliency: Forming High-Quality Labels for Unsupervised
Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance gains compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z) - IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text
Recognition [20.741958198581173]
We propose an Iterative Fusion based Recognizer (IFR) for low quality scene text recognition.
IFR contains two branches which focus on scene text recognition and low quality scene text image recovery respectively.
A feature fusion module is proposed to strengthen the feature representation of the two branches.
arXiv Detail & Related papers (2021-08-13T10:45:01Z) - Disentangled High Quality Salient Object Detection [8.416690566816305]
We propose a novel deep learning framework for high-resolution salient object detection (SOD).
It disentangles the task into a low-resolution saliency classification network (LRSCN) and a high-resolution refinement network (HRRN).
arXiv Detail & Related papers (2021-08-08T02:14:15Z) - Interpretable Detail-Fidelity Attention Network for Single Image
Super-Resolution [89.1947690981471]
We propose a purposeful and interpretable detail-fidelity attention network to progressively process smooth regions and details in a divide-and-conquer manner.
In particular, we propose Hessian filtering for interpretable feature representation that is well-suited for detail inference.
Experiments demonstrate that the proposed methods achieve superior performances over the state-of-the-art methods.
arXiv Detail & Related papers (2020-09-28T08:31:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.