Image-free multi-character recognition
- URL:
- Date: Mon, 20 Dec 2021 15:06:49 GMT
- Title: Image-free multi-character recognition
- Authors: Huayi Wang, Chunli Zhu, Liheng Bian
- Abstract summary: We report a novel image-free sensing technique that tackles the multi-target recognition challenge for the first time.
The reported CRNN uses a bidirectional LSTM architecture to predict the distribution of multiple characters simultaneously.
We demonstrated the technique's effectiveness in license plate detection, where it achieved 87.60% recognition accuracy at a 5% sampling rate with a refresh rate above 100 FPS.
- License:
- Abstract: The recently developed image-free sensing technique keeps both
hardware and software lightweight, and it has been applied to simple
target classification and motion tracking. In practical applications, however,
multiple targets usually appear in the field of view, and existing
approaches fail to produce multi-semantic information. In this letter, we
report a novel image-free sensing technique that tackles the multi-target
recognition challenge for the first time. Unlike the convolutional layer stack
of image-free single-pixel networks, the reported CRNN uses a
bidirectional LSTM architecture to predict the distribution of multiple
characters simultaneously. The framework captures long-range
dependencies, providing high recognition accuracy for multiple characters. We
demonstrated the technique's effectiveness in license plate detection, where
it achieved 87.60% recognition accuracy at a 5% sampling rate with a refresh
rate above 100 FPS.
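To make the 5% sampling rate and the per-character distributions concrete, here is a minimal sketch and not the authors' implementation. It assumes that the sampling rate means the number of single-pixel measurements divided by the number of scene pixels, that the modulation patterns are random binary masks, and that the network output can be read as one probability distribution per character position. The 64 x 64 scene size and the helper names `num_measurements`, `measure`, and `greedy_decode` are hypothetical.

```python
import random


def num_measurements(height, width, sampling_rate):
    """Measurement count, ASSUMING sampling rate = measurements / pixels."""
    return max(1, round(height * width * sampling_rate))


def measure(scene, patterns):
    """Simulate a single-pixel detector: one inner product per pattern."""
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]


def greedy_decode(distributions, alphabet):
    """Pick the most probable character at each output position."""
    chars = []
    for dist in distributions:
        best = max(range(len(dist)), key=dist.__getitem__)
        chars.append(alphabet[best])
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)
    height, width = 64, 64  # hypothetical scene size, not taken from the paper
    m = num_measurements(height, width, 0.05)
    scene = [rng.random() for _ in range(height * width)]
    # Random binary patterns are an assumption made for this illustration.
    patterns = [[rng.randint(0, 1) for _ in range(height * width)]
                for _ in range(m)]
    measurements = measure(scene, patterns)
    print(m, len(measurements))
```

In this sketch the recognition network would consume the `measurements` vector directly, without reconstructing an image, and `greedy_decode` stands in for whatever decoding step turns its per-position outputs into a character string.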
Related papers
- Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers [59.772673692679085]
We propose SPDFQ, a Semantics Prompting Data-Free Quantization method for ViTs.
First, SPDFQ incorporates Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images.
Second, SPDFQ introduces Multi-Semantic Reinforcement (MSR), which utilizes localized patch optimization to prompt efficient parameterization.
arXiv Detail & Related papers (2024-12-21T09:30:45Z)
- DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention [12.36906630199689]
We construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model.
Our goal is to utilize a hierarchical progressive network to capture forged artifacts at different scales for detection and localization.
arXiv Detail & Related papers (2024-06-03T16:13:33Z)
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
arXiv Detail & Related papers (2023-11-17T08:17:49Z)
- LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition [11.206532393178385]
We present a novel neural network named LCPR for robust multimodal place recognition.
Our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance.
arXiv Detail & Related papers (2023-11-06T15:39:48Z)
- A Dual Attentive Generative Adversarial Network for Remote Sensing Image Change Detection [6.906936669510404]
We propose a dual attentive generative adversarial network for achieving very high-resolution remote sensing image change detection tasks.
The DAGAN framework has better performance with 85.01% mean IoU and 91.48% mean F1 score than advanced methods on the LEVIR dataset.
arXiv Detail & Related papers (2023-10-03T08:26:27Z)
- Multimodal Graph Learning for Deepfake Detection [10.077496841634135]
Existing deepfake detectors face several challenges in achieving robustness and generalization.
We propose a novel framework, namely Multimodal Graph Learning (MGL), that leverages information from multiple modalities.
Our proposed method aims to effectively identify and utilize distinguishing features for deepfake detection.
arXiv Detail & Related papers (2022-09-12T17:17:49Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Image Manipulation Detection by Multi-View Multi-Scale Supervision [11.319080833880307]
Key challenge of image manipulation detection is how to learn generalizable features that are sensitive to manipulations in novel data.
In this paper we address both aspects by multi-view feature learning and multi-scale supervision.
Our thoughts are realized by a new network which we term MVSS-Net.
arXiv Detail & Related papers (2021-04-14T13:05:58Z)
- D-Unet: A Dual-encoder U-Net for Image Splicing Forgery Detection and Localization [108.8592577019391]
Image splicing forgery detection is a global binary classification task that distinguishes the tampered and non-tampered regions by image fingerprints.
We propose a novel network called dual-encoder U-Net (D-Unet) for image splicing forgery detection, which employs an unfixed encoder and a fixed encoder.
In an experimental comparison study of D-Unet and state-of-the-art methods, D-Unet outperformed the other methods in image-level and pixel-level detection.
arXiv Detail & Related papers (2020-12-03T10:54:02Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
arXiv Detail & Related papers (2020-07-23T05:41:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.