A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot
- URL: http://arxiv.org/abs/2408.05729v1
- Date: Sun, 11 Aug 2024 08:42:02 GMT
- Title: A Training-Free Framework for Video License Plate Tracking and Recognition with Only One-Shot
- Authors: Haoxuan Ding, Qi Wang, Junyu Gao, Qiang Li
- Abstract summary: OneShotLP is a training-free framework for video-based license plate detection and recognition.
It can function effectively without extensive training data and adapts to a wide range of license plate styles.
This highlights the potential of leveraging pre-trained models for diverse real-world applications in intelligent transportation systems.
- Score: 25.032455444204466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional license plate detection and recognition models are often trained on closed datasets, limiting their ability to handle the diverse license plate formats across different regions. The emergence of large-scale pre-trained models has shown exceptional generalization capabilities, enabling few-shot and zero-shot learning. We propose OneShotLP, a training-free framework for video-based license plate detection and recognition, leveraging these advanced models. Starting with the license plate position in the first video frame, our method tracks this position across subsequent frames using a point tracking module, creating a trajectory of prompts. These prompts are input into a segmentation module that uses a promptable large segmentation model to generate local masks of the license plate regions. The segmented areas are then processed by multimodal large language models (MLLMs) for accurate license plate recognition. OneShotLP offers significant advantages, including the ability to function effectively without extensive training data and adaptability to various license plate styles. Experimental results on UFPR-ALPR and SSIG-SegPlate datasets demonstrate the superior accuracy of our approach compared to traditional methods. This highlights the potential of leveraging pre-trained models for diverse real-world applications in intelligent transportation systems. The code is available at https://github.com/Dinghaoxuan/OneShotLP.
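The three-stage pipeline described in the abstract (one-shot point prompt → point tracking → promptable segmentation → MLLM recognition) can be sketched as a minimal skeleton. All names and interfaces below are illustrative assumptions, not the authors' actual code: real components (a point tracker, a SAM-style promptable segmenter, and an MLLM-based reader) would be injected in place of the toy callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]

@dataclass
class OneShotLPSketch:
    """Hypothetical skeleton of the OneShotLP data flow (not the authors' API)."""
    track_point: Callable[[List[object], Point], List[Point]]  # point tracking module
    segment_at: Callable[[object, Point], object]              # promptable segmentation
    recognize: Callable[[object], str]                         # MLLM-based plate reader

    def run(self, frames: List[object], first_frame_point: Point) -> List[str]:
        # 1) Propagate the one-shot prompt: track the plate position given
        #    in the first frame across all subsequent frames.
        trajectory = self.track_point(frames, first_frame_point)
        plates = []
        for frame, point in zip(frames, trajectory):
            # 2) Use the tracked point as a prompt to segment the plate region.
            region = self.segment_at(frame, point)
            # 3) Read the segmented region with the recognition module.
            plates.append(self.recognize(region))
        return plates
```

The skeleton only shows the data flow: a single annotated point in the first frame is enough to drive detection and recognition over the whole video, with all heavy lifting delegated to pre-trained models behind the three callables.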
Related papers
- A Dataset and Model for Realistic License Plate Deblurring [17.52035404373648]
We introduce the first large-scale license plate deblurring dataset, named License Plate Blur (LPBlur).
Then, we propose a License Plate Deblurring Generative Adversarial Network (LPDGAN) to tackle license plate deblurring.
Our proposed model outperforms other state-of-the-art motion deblurring methods in realistic license plate deblurring scenarios.
arXiv Detail & Related papers (2024-04-21T14:36:57Z)
- PlateSegFL: A Privacy-Preserving License Plate Detection Using Federated Segmentation Learning [0.0]
PlateSegFL implements U-Net-based segmentation along with Federated Learning (FL).
Different computing platforms, such as mobile phones, can collaborate on developing a shared prediction model.
arXiv Detail & Related papers (2024-04-07T19:10:02Z)
- PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs [55.8550939439138]
Vision-Language Models (VLMs) have shown immense potential by integrating large language models with vision systems.
These models face challenges in the fundamental computer vision task of object localisation because they are trained on multimodal data consisting mostly of captions.
We introduce an input-agnostic Positional Insert (PIN), a learnable spatial prompt, containing a minimal set of parameters that are slid inside the frozen VLM.
Our PIN module is trained with a simple next-token prediction task on synthetic data without requiring the introduction of new output heads.
arXiv Detail & Related papers (2024-02-13T18:39:18Z)
- Localized Symbolic Knowledge Distillation for Visual Commonsense Models [150.18129140140238]
We build Localized Visual Commonsense models, which allow users to specify (multiple) regions as input.
We train our model by sampling localized commonsense knowledge from a large language model.
We find that training on the localized commonsense corpus can successfully distill existing vision-language models to support a reference-as-input interface.
arXiv Detail & Related papers (2023-12-08T05:23:50Z)
- Frozen CLIP Models are Efficient Video Learners [86.73871814176795]
Video recognition has been dominated by the end-to-end learning paradigm.
Recent advances in Contrastive Vision-Language Pre-training pave the way for a new route for visual recognition tasks.
We present Efficient Video Learning, a framework for directly training high-quality video recognition models.
arXiv Detail & Related papers (2022-08-06T17:38:25Z)
- An advanced combination of semi-supervised Normalizing Flow & Yolo (YoloNF) to detect and recognize vehicle license plates [1.5208105446192792]
This paper presents a robust and efficient ALPR system based on the state-of-the-art YOLO object detector and Normalizing flows.
The model uses two new strategies. First, a two-stage network combines YOLO with a normalizing-flow-based model to detect License Plates (LP) and recognize the LP content, including numbers and Arabic characters.
arXiv Detail & Related papers (2022-07-21T22:22:57Z)
- Multi-Modal Zero-Shot Sign Language Recognition [51.07720650677784]
We propose a multi-modal Zero-Shot Sign Language Recognition model.
A Transformer-based model, along with a C3D model, is used for hand detection and deep feature extraction.
A semantic space is used to map the visual features to the lingual embedding of the class labels.
arXiv Detail & Related papers (2021-09-02T09:10:39Z)
- Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts [89.06560404218028]
We introduce a new method for pre-training video action recognition models using queried web videos.
Instead of trying to filter noisy videos out, we propose to convert the potential noise in these queried videos into useful supervision signals.
We show that the resulting SPL approach outperforms several existing pre-training strategies that use pseudo-labels.
arXiv Detail & Related papers (2021-01-11T05:50:16Z)
- A Robust Attentional Framework for License Plate Recognition in the Wild [95.7296788722492]
We propose a robust framework for license plate recognition in the wild.
It is composed of a tailored CycleGAN model for license plate image generation and an elaborately designed image-to-sequence network for plate recognition.
We release a new license plate dataset, named "CLPD", with 1200 images from all 31 provinces in mainland China.
arXiv Detail & Related papers (2020-06-06T17:11:52Z)
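The image-to-sequence recognition mentioned in the entry above ends, in many plate recognizers, with a greedy CTC-style decoding step: pick the best symbol per frame, collapse repeats, and drop blanks. The sketch below is a generic, self-contained illustration of that step under toy scores and a hypothetical alphabet; it is not the network from any specific paper listed here.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC-style decoding: best symbol per frame, then collapse
    repeats and drop blanks. Index 0 is the blank; alphabet[i - 1] is
    the character for symbol index i."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:
            decoded.append(alphabet[idx - 1])  # alphabet excludes the blank
        prev = idx
    return "".join(decoded)

# Toy per-frame scores over (blank, 'A', 'B', '1'):
scores = [
    [0.1, 0.9, 0.0, 0.0],  # 'A'
    [0.1, 0.8, 0.1, 0.0],  # 'A' again (repeat, collapsed)
    [0.9, 0.0, 0.0, 0.1],  # blank (separator)
    [0.0, 0.1, 0.8, 0.1],  # 'B'
    [0.0, 0.0, 0.2, 0.8],  # '1'
]
print(ctc_greedy_decode(scores, "AB1"))  # -> AB1
```

The repeat-collapse rule is why a separating blank is needed between genuinely repeated characters: without it, "AA" would decode to "A".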
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.