VRAE: Vertical Residual Autoencoder for License Plate Denoising and Deblurring
- URL: http://arxiv.org/abs/2509.08392v2
- Date: Thu, 11 Sep 2025 16:45:28 GMT
- Title: VRAE: Vertical Residual Autoencoder for License Plate Denoising and Deblurring
- Authors: Cuong Nguyen, Dung T. Tran, Hong Nguyen, Xuan-Vu Phan, Nam-Phong Nguyen,
- Abstract summary: Restoring degraded images in a fast, real-time manner is a crucial pre-processing step for enhancing recognition performance. In this work, we propose a Vertical Residual Autoencoder architecture designed for the image enhancement task in traffic surveillance. Experiments on a vehicle image dataset with visible license plates demonstrate that our method consistently outperforms Autoencoder (AE), Generative Adversarial Network (GAN), and Flow-Based (FB) approaches.
- Score: 2.1639459844313564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world traffic surveillance, vehicle images captured under adverse weather, poor lighting, or high-speed motion often suffer from severe noise and blur. Such degradations significantly reduce the accuracy of license plate recognition systems, especially when the plate occupies only a small region within the full vehicle image. Restoring these degraded images in a fast, real-time manner is thus a crucial pre-processing step to enhance recognition performance. In this work, we propose a Vertical Residual Autoencoder (VRAE) architecture designed for the image enhancement task in traffic surveillance. The method incorporates an enhancement strategy that employs an auxiliary block, which injects input-aware features at each encoding stage to guide the representation learning process, enabling better preservation of general information throughout the network compared to conventional autoencoders. Experiments on a vehicle image dataset with visible license plates demonstrate that our method consistently outperforms Autoencoder (AE), Generative Adversarial Network (GAN), and Flow-Based (FB) approaches. Compared with an AE of the same depth, it improves PSNR by about 20%, reduces NMSE by around 50%, and enhances SSIM by 1%, while requiring only a marginal increase of roughly 1% in parameters.
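The reported gains are stated in terms of PSNR, NMSE, and SSIM. As a quick aid for interpreting those numbers, here is a minimal sketch of the standard PSNR and NMSE definitions in plain Python (illustrative code using the textbook formulas, not the authors' implementation; SSIM is omitted because it involves windowed local statistics):

```python
import math

def mse(x, y):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    m = mse(x, y)
    return float("inf") if m == 0 else 10.0 * math.log10(max_val ** 2 / m)

def nmse(x, y):
    """Normalized MSE: squared error relative to the reference signal energy."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / sum(a ** 2 for a in x)

# Toy reference and restored pixel values (hypothetical numbers).
reference = [50.0, 100.0, 150.0, 200.0]
restored = [52.0, 97.0, 151.0, 198.0]
print(round(psnr(reference, restored), 2))
print(round(nmse(reference, restored), 6))
```

Higher PSNR and lower NMSE indicate a reconstruction closer to the clean reference; the abstract's roughly 20% PSNR gain and 50% NMSE reduction are relative to an AE baseline of the same depth.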
Related papers
- PocketSR: The Super-Resolution Expert in Your Pocket Mobiles [69.26751136689533]
Real-world image super-resolution (RealSR) aims to enhance the visual quality of in-the-wild images, such as those captured by mobile phones. Existing methods leveraging large generative models demonstrate impressive results, but the high computational cost and latency make them impractical for edge deployment. We introduce PocketSR, an ultra-lightweight, single-step model that brings generative modeling capabilities to RealSR while maintaining high fidelity.
arXiv Detail & Related papers (2025-10-03T13:56:18Z) - A New Hybrid Model of Generative Adversarial Network and You Only Look Once Algorithm for Automatic License-Plate Recognition [1.6566053195631465]
In this paper, a selective Generative Adversarial Network (GAN) is proposed for deblurring in the preprocessing step. YOLOv5 achieves a detection time of 0.026 seconds for both the License-Plate Detection (LPD) and Character Recognition (CR) detection stages. The proposed model achieves accuracy rates of 95% and 97% in the LPD and CR detection phases, respectively.
arXiv Detail & Related papers (2025-09-08T16:34:54Z) - Multi-Step Guided Diffusion for Image Restoration on Edge Devices: Toward Lightweight Perception in Embodied AI [0.0]
We introduce a multistep optimization strategy within each denoising timestep, significantly enhancing image quality, perceptual accuracy, and generalization. Our experiments on super-resolution and Gaussian deblurring demonstrate that increasing the number of gradient updates per step improves LPIPS and PSNR with minimal latency overhead. Our findings highlight MPGD's potential as a lightweight, plug-and-play restoration module for real-time visual perception in embodied AI agents such as drones and mobile robots.
arXiv Detail & Related papers (2025-06-08T21:11:25Z) - Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant. Our approach, termed AnyIR, takes a unified path that leverages the inherent similarity across various degradations. To fuse degradation awareness and contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z) - H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models [76.1519545010611]
The autoencoder (AE) is the key to the success of latent diffusion models for image and video generation. In this work, we examine the architecture design choices and optimize the computation distribution to obtain efficient and high-compression video AEs. Our AE achieves an ultra-high compression ratio and real-time decoding speed on mobile while outperforming prior art in terms of reconstruction metrics.
arXiv Detail & Related papers (2025-04-14T17:59:06Z) - Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos. We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality.
arXiv Detail & Related papers (2025-02-20T18:45:44Z) - Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving [3.617580194719686]
This paper introduces Fast-COS, a novel single-stage object detection framework crafted specifically for driving scenes. RAViT achieves 81.4% Top-1 accuracy on the ImageNet-1K dataset. It surpasses leading models in efficiency, delivering up to 75.9% faster GPU inference and 1.38× higher throughput on edge devices.
arXiv Detail & Related papers (2025-02-11T09:54:09Z) - Structured Pruning for Efficient Visual Place Recognition [24.433604332415204]
Visual Place Recognition (VPR) is fundamental for the global re-localization of robots and devices.
Our work introduces a novel structured pruning method to streamline common VPR architectures.
This dual focus significantly enhances the efficiency of the system, reducing both map and model memory requirements and decreasing feature extraction and retrieval latencies.
arXiv Detail & Related papers (2024-09-12T08:32:25Z) - DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z) - MOFA: A Model Simplification Roadmap for Image Restoration on Mobile Devices [17.54747506334433]
We propose a roadmap that can be applied to further accelerate image restoration models prior to deployment.
Our approach decreases runtime by up to 13% and reduces the number of parameters by up to 23%, while increasing PSNR and SSIM.
arXiv Detail & Related papers (2023-08-24T01:29:15Z) - Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection [0.0]
This study introduces a novel random perturbation-based compressed domain method for reconstructing images from HEVC bitstreams.
We demonstrate a significant increase in the reconstruction speed compared to the traditional full decoding approach.
We achieve a detection accuracy of 99.9%, on par with the pixel domain method, and a classification accuracy of 96.84%, only 0.98% lower than the pixel domain method.
arXiv Detail & Related papers (2023-05-14T22:04:00Z) - Skip-Attention: Improving Vision Transformers by Paying Less Attention [55.47058516775423]
Vision transformers (ViTs) use expensive self-attention operations in every layer.
We propose SkipAt, a method to reuse self-attention from preceding layers to approximate attention at one or more subsequent layers.
We show the effectiveness of our method in image classification and self-supervised learning on ImageNet-1K, semantic segmentation on ADE20K, image denoising on SIDD, and video denoising on DAVIS.
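The reuse idea behind SkipAt can be illustrated with a toy example: compute attention weights once, then let a later layer apply those cached weights to its own values instead of recomputing the quadratic Q·Kᵀ scores. This is a simplified pure-Python sketch of the general mechanism, not the paper's actual (parametric) reuse function:

```python
import math

def matmul(A, B):
    """Plain list-of-lists matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d))."""
    d = len(Q[0])
    scores = matmul(Q, [list(col) for col in zip(*K)])  # Q @ K^T
    return [softmax([v / math.sqrt(d) for v in row]) for row in scores]

def attend(weights, V):
    """Apply (possibly cached) attention weights to a value matrix."""
    return matmul(weights, V)

# Layer 1 computes attention normally.
Q1 = [[1.0, 0.0], [0.0, 1.0]]
K1 = [[1.0, 0.0], [0.0, 1.0]]
V1 = [[1.0, 2.0], [3.0, 4.0]]
w1 = attention_weights(Q1, K1)
out1 = attend(w1, V1)

# A "skip" layer reuses w1 with its own values, avoiding the
# quadratic score computation entirely for this layer.
V2 = [[5.0, 6.0], [7.0, 8.0]]
out2 = attend(w1, V2)
```

The saving comes from skipping the Q·Kᵀ score matrix (quadratic in sequence length) at the reusing layers, which is why the approach helps across dense tasks like segmentation and denoising.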
arXiv Detail & Related papers (2023-01-05T18:59:52Z) - Efficient Image Super-Resolution using Vast-Receptive-Field Attention [49.87316814164699]
The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks.
In this work, we design an efficient SR network by improving the attention mechanism.
We propose VapSR, the VAst-receptive-field Pixel attention network.
arXiv Detail & Related papers (2022-10-12T07:01:00Z) - Universal and Flexible Optical Aberration Correction Using Deep-Prior Based Deconvolution [51.274657266928315]
We propose a PSF-aware, plug-and-play deep network, which takes the aberrant image and PSF map as input and produces the latent high-quality version by incorporating lens-specific deep priors.
Specifically, we pre-train a base model from a set of diverse lenses and then adapt it to a given lens by quickly refining the parameters.
arXiv Detail & Related papers (2021-04-07T12:00:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information (including all content) and is not responsible for any consequences arising from its use.