Related papers: CharDiff: A Diffusion Model with Character-Level Guidance for License Plate Image Restoration

URL: http://arxiv.org/abs/2510.17330v1
Date: Mon, 20 Oct 2025 09:23:29 GMT
Title: CharDiff: A Diffusion Model with Character-Level Guidance for License Plate Image Restoration
Authors: Gyuhwan Park, Kihyun Na, Injung Kim
Abstract summary: CharDiff is a novel diffusion-based framework with character-level guidance to restore license plate images. CharDiff significantly outperformed the baseline restoration models in both restoration quality and recognition accuracy.
Score: 1.2483260526189446
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The significance of license plate image restoration goes beyond the preprocessing stage of License Plate Recognition (LPR) systems, as it also serves various purposes, including increasing evidential value, enhancing the clarity of visual interface, and facilitating further utilization of license plate images. We propose a novel diffusion-based framework with character-level guidance, CharDiff, which effectively restores and recognizes severely degraded license plate images captured under realistic conditions. CharDiff leverages fine-grained character-level priors extracted through external segmentation and Optical Character Recognition (OCR) modules tailored for low-quality license plate images. For precise and focused guidance, CharDiff incorporates a novel Character-guided Attention through Region-wise Masking (CHARM) module, which ensures that each character's guidance is restricted to its own region, thereby avoiding interference with other regions. In experiments, CharDiff significantly outperformed the baseline restoration models in both restoration quality and recognition accuracy, achieving a 28% relative reduction in CER on the Roboflow-LP dataset, compared to the best-performing baseline model. These results indicate that the structured character-guided conditioning effectively enhances the robustness of diffusion-based license plate restoration and recognition in practical deployment scenarios.

Related papers

LP-LLM: End-to-End Real-World Degraded License Plate Text Recognition via Large Multimodal Models [4.497411606350301]
Real-world License Plate Recognition (LPR) faces significant challenges from severe degradations such as motion blur, low resolution, and complex illumination. The prevailing "restoration-then-recognition" two-stage paradigm suffers from a fundamental flaw: the pixel-level optimization objectives of image restoration models are misaligned with the semantic goals of character recognition. We propose an end-to-end structure-aware multimodal reasoning framework based on Qwen3-VL.
arXiv Detail & Related papers (2026-01-14T03:32:55Z)
Layout-Independent License Plate Recognition via Integrated Vision and Language Models [6.302166748545872]
This work presents a pattern-aware framework for automatic license plate recognition (ALPR). It is designed to operate reliably across diverse plate layouts and challenging real-world conditions. Experimental results demonstrate superior accuracy and robustness compared to recent segmentation-free approaches.
arXiv Detail & Related papers (2025-10-12T10:25:21Z)
MF-LPR$^2$: Multi-Frame License Plate Image Restoration and Recognition using Optical Flow [6.978269258341259]
License plate areas in dash cam images often suffer from low resolution, motion blur, and glare. Existing generative models that rely on pretrained priors cannot restore reliably such poor-quality images. We propose a novel multi-frame plate restoration and recognition framework, MF-LPR$^2$, which addresses this issue.
arXiv Detail & Related papers (2025-08-19T07:18:14Z)
One-Step Diffusion-based Real-World Image Super-Resolution with Visual Perception Distillation [53.24542646616045]
We propose VPD-SR, a novel visual perception diffusion distillation framework specifically designed for image super-resolution (SR) generation. VPD-SR consists of two components: Explicit Semantic-aware Supervision (ESS) and High-frequency Perception (HFP) loss. The proposed VPD-SR achieves superior performance compared to both previous state-of-the-art methods and the teacher model with just one-step sampling.
arXiv Detail & Related papers (2025-06-03T08:28:13Z)
LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter [52.93785843453579]
Blind face restoration from low-quality (LQ) images is a challenging task that requires high-fidelity image reconstruction and the preservation of facial identity. We propose LAFR, a novel codebook-based latent space adapter that aligns the latent distribution of LQ images with that of HQ counterparts. We show that lightweight finetuning of diffusion prior on just 0.9% of FFHQ dataset is sufficient to achieve results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-05-29T14:11:16Z)
Enhancing License Plate Super-Resolution: A Layout-Aware and Character-Driven Approach [2.9628782269544685]
We introduce a novel loss function, Layout and Character Oriented Focal Loss (LCOFL), which considers factors such as resolution, texture, and structural details, as well as the performance of the LPR task itself. We enhance character feature learning using deformable convolutions and shared weights in an attention module and employ a GAN-based training approach with an Optical Character Recognition (OCR) model as the discriminator. Our experimental results show significant improvements in character reconstruction quality, outperforming two state-of-the-art methods in both quantitative and qualitative measures.
arXiv Detail & Related papers (2024-08-27T14:40:19Z)
Preserving Full Degradation Details for Blind Image Super-Resolution [40.152015542099704]
We propose an alternative to learn degradation representations through reproducing degraded low-resolution (LR) images. By guiding the degrader to reconstruct input LR images, full degradation information can be encoded into the representations. Experiments show that our representations can extract accurate and highly robust degradation information.
arXiv Detail & Related papers (2024-07-01T13:54:59Z)
DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior [70.46245698746874]
We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks. DiffBIR decouples blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content. In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results. For the second stage, we propose IRControlNet that leverages the generative ability of latent diffusion models to generate realistic details.
arXiv Detail & Related papers (2023-08-29T07:11:52Z)
Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation network unfolding (DCUNet) for restoring light field (LF) images captured under low-light conditions. The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result. To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z)
Breaking Modality Disparity: Harmonized Representation for Infrared and Visible Image Registration [66.33746403815283]
We propose a scene-adaptive infrared and visible image registration. We employ homography to simulate the deformation between different planes. We propose the first ground truth available misaligned infrared and visible image dataset.
arXiv Detail & Related papers (2023-04-12T06:49:56Z)
A Robust Attentional Framework for License Plate Recognition in the Wild [95.7296788722492]
We propose a robust framework for license plate recognition in the wild. It is composed of a tailored CycleGAN model for license plate image generation and an elaborately designed image-to-sequence network for plate recognition. We release a new license plate dataset, named "CLPD", with 1200 images from all 31 provinces in mainland China.
arXiv Detail & Related papers (2020-06-06T17:11:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.