Text detection and recognition based on a lensless imaging system
- URL: http://arxiv.org/abs/2210.04244v1
- Date: Sun, 9 Oct 2022 12:31:09 GMT
- Title: Text detection and recognition based on a lensless imaging system
- Authors: Yinger Zhang, Zhouyi Wu, Peiying Lin, Yuting Wu, Lusong Wei, Zhengjie
Huang, and Jiangtao Huangfu
- Abstract summary: A deep-learning-based pipeline framework was built to recognize text in three steps from raw data captured by lensless cameras.
This study demonstrates text detection and recognition tasks in a lensless camera system.
- Score: 6.769458974198602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lensless cameras are characterized by several advantages (e.g.,
miniaturization, ease of manufacture, and low cost) as compared with
conventional cameras. However, they have not been extensively employed due to
their poor image clarity and low image resolution, especially for tasks that
have high requirements on image quality and details such as text detection and
text recognition. To address this problem, a deep-learning-based pipeline framework was built to recognize text in three steps from raw data captured by lensless cameras. This pipeline consisted of
the lensless imaging model U-Net, the text detection model connectionist text
proposal network (CTPN), and the text recognition model convolutional recurrent
neural network (CRNN). Compared with methods that focus only on image reconstruction, the U-Net in the pipeline was able to supplement imaging details by enhancing factors related to character categories during reconstruction, so textual information could be detected and recognized more effectively by CTPN and CRNN, with fewer artifacts and higher-clarity reconstructed lensless images. By performing experiments on datasets of different
complexities, the applicability to text detection and recognition on lensless
cameras was verified. This study demonstrates text detection and recognition tasks in a lensless camera system, and develops a basic method for novel applications.
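The three-step pipeline described in the abstract can be sketched as plain data flow: raw lensless measurements pass through reconstruction, detection, and recognition in sequence. The sketch below is a hypothetical illustration, not the authors' code; the stage functions are placeholders standing in for the trained U-Net, CTPN, and CRNN models.

```python
# Minimal sketch of the three-stage lensless text pipeline.
# All three stage functions are hypothetical placeholders; a real system
# would load learned weights for U-Net, CTPN, and CRNN.

def unet_reconstruct(raw_measurement):
    """Stage 1: reconstruct a viewable image from the raw lensless measurement.

    A real U-Net would map the multiplexed sensor data to a high-clarity
    image while enhancing character-related features for the later stages.
    """
    return {"image": raw_measurement, "enhanced": True}

def ctpn_detect(image):
    """Stage 2: propose text-line bounding boxes on the reconstructed image.

    CTPN links fine-grained, fixed-width proposals over CNN features into
    full text lines; here we return one dummy box as (x1, y1, x2, y2).
    """
    return [{"box": (10, 20, 200, 48)}]

def crnn_recognize(image, box):
    """Stage 3: transcribe one detected text region into a string.

    CRNN combines CNN feature extraction, a recurrent sequence model, and
    CTC decoding to read text without per-character segmentation.
    """
    return "EXAMPLE"

def lensless_text_pipeline(raw_measurement):
    """Run reconstruction -> detection -> recognition end to end."""
    recon = unet_reconstruct(raw_measurement)
    boxes = ctpn_detect(recon["image"])
    return [crnn_recognize(recon["image"], b["box"]) for b in boxes]

print(lensless_text_pipeline("raw-sensor-data"))  # one string per detected line
```

The design point the paper makes is that stage 1 is trained with the downstream text tasks in mind, rather than purely for photometric reconstruction quality.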
Related papers
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
- Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images.
Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance.
We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Iris super-resolution using CNNs: is photo-realism important to iris recognition? [67.42500312968455]
Single image super-resolution techniques are emerging, especially with the use of convolutional neural networks (CNNs)
In this work, the authors explore single image super-resolution using CNNs for iris recognition.
They validate their approach on a database of 1,872 near-infrared iris images and on a mobile phone image database.
arXiv Detail & Related papers (2022-10-24T11:19:18Z)
- Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks [48.81850740907517]
We present TATSR, a Text-Aware Text Super-Resolution framework.
It effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss.
It outperforms state-of-the-art methods in terms of both recognition accuracy and human perception.
arXiv Detail & Related papers (2022-10-13T11:48:45Z)
- A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution [13.934846626570286]
Scene text image super-resolution aims to increase the resolution and readability of the text in low-resolution images.
It remains difficult to reconstruct high-resolution images for spatially deformed texts, especially rotated and curve-shaped ones.
We propose a CNN based Text ATTention network (TATT) to address this problem.
arXiv Detail & Related papers (2022-03-17T15:28:29Z)
- TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance [15.72669617789124]
Scene text recognition (STR) is an important bridge between images and text.
Recent methods use a frozen initial embedding to guide the decoder in decoding features into text, which leads to a loss of accuracy.
We propose a novel architecture for text recognition, named TRansformer-based text recognizer with Initial embedding Guidance (TRIG)
arXiv Detail & Related papers (2021-11-16T09:10:39Z)
- IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text Recognition [20.741958198581173]
We propose an Iterative Fusion based Recognizer (IFR) for low quality scene text recognition.
IFR contains two branches which focus on scene text recognition and low quality scene text image recovery respectively.
A feature fusion module is proposed to strengthen the feature representation of the two branches.
arXiv Detail & Related papers (2021-08-13T10:45:01Z)
- Exploiting Raw Images for Real-Scene Super-Resolution [105.18021110372133]
We study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
We propose a method to generate more realistic training data by mimicking the imaging process of digital cameras.
We also develop a two-branch convolutional neural network to exploit the radiance information originally-recorded in raw images.
arXiv Detail & Related papers (2021-02-02T16:10:15Z)
- Real-time Non-line-of-sight Imaging with Two-step Deep Remapping [0.0]
Non-line-of-sight (NLOS) imaging takes the indirect light into account.
Most solutions employ a transient scanning process, followed by a back-projection based algorithm to reconstruct the NLOS scenes.
Here we propose a new NLOS solution to address the above defects, with innovations on both detection equipment and reconstruction algorithm.
arXiv Detail & Related papers (2021-01-26T00:08:54Z)
- Scene Text Image Super-Resolution in the Wild [112.90416737357141]
Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones.
Previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images.
We propose a real scene text SR dataset, termed TextZoom.
It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild.
arXiv Detail & Related papers (2020-05-07T09:18:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.