CLIPVehicle: A Unified Framework for Vision-based Vehicle Search
- URL: http://arxiv.org/abs/2508.04120v1
- Date: Wed, 06 Aug 2025 06:36:44 GMT
- Title: CLIPVehicle: A Unified Framework for Vision-based Vehicle Search
- Authors: Likai Wang, Ruize Han, Xiangqun Zhang, Wei Feng,
- Abstract summary: We propose a new unified framework, namely CLIPVehicle, which contains a dual-granularity semantic-region alignment module. We also construct a new benchmark, including a real-world dataset CityFlowVS, and two synthetic datasets SynVS-Day and SynVS-All, for vehicle search.
- Score: 13.316099306091239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vehicles are among the most common and significant objects in the real world, and research on them using computer vision technologies has made remarkable progress in tasks such as vehicle detection and vehicle re-identification. To search for a vehicle of interest in surveillance videos, existing methods first pre-detect and store all vehicle patches and then apply vehicle re-identification models, which is resource-intensive and not very practical. In this work, we aim to achieve joint detection and re-identification for vehicle search. However, the objectives of detection, which focuses on what vehicles have in common, and re-identification, which focuses on what makes each vehicle unique, conflict with each other, making it challenging for a model to learn in an end-to-end system. To address this problem, we propose a new unified framework, namely CLIPVehicle, which contains a dual-granularity semantic-region alignment module that leverages vision-language models (VLMs) for vehicle discrimination modeling, and a multi-level vehicle identification learning strategy that learns the identity representation at the global, instance, and feature levels. We also construct a new benchmark for vehicle search, including a real-world dataset, CityFlowVS, and two synthetic datasets, SynVS-Day and SynVS-All. Extensive experimental results demonstrate that our method outperforms state-of-the-art vehicle Re-ID and person search methods.
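The difference between the two-stage pipeline and joint search can be made concrete with a minimal sketch. It assumes a single model that returns, for every box in a frame, a detection confidence and an identity embedding. Under that assumption, search reduces to filtering boxes by confidence and ranking them by cosine similarity to the query's embedding. The `Detection` record, the `search` function, the threshold, and the toy embeddings are hypothetical illustrations, not CLIPVehicle's actual architecture or API.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One box from a hypothetical joint detection + re-ID model."""
    frame_id: int
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    det_score: float                # detector confidence for "vehicle"
    embedding: List[float]          # identity feature from the re-ID branch


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query: List[float], detections: List[Detection],
           det_threshold: float = 0.5, top_k: int = 5) -> List[Tuple[float, Detection]]:
    """Keep confident boxes, then rank them by identity similarity to the query."""
    kept = [d for d in detections if d.det_score >= det_threshold]
    scored = [(cosine(query, d.embedding), d) for d in kept]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy gallery: embeddings are 3-D only for readability.
gallery = [
    Detection(0, (10, 20, 110, 90), 0.92, [0.9, 0.1, 0.0]),
    Detection(0, (200, 40, 300, 120), 0.88, [0.0, 1.0, 0.0]),
    Detection(1, (15, 25, 118, 95), 0.30, [1.0, 0.0, 0.0]),  # below threshold, dropped
]
results = search([1.0, 0.0, 0.0], gallery)
for sim, det in results:
    print(det.frame_id, det.box, round(sim, 3))
```

Because boxes and embeddings come from one forward pass, nothing has to be cropped and stored in advance. The cost, as the abstract notes, is that one backbone must serve both the class-level detection objective and the instance-level identity objective.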
Related papers
- VehicleGAN: Pair-flexible Pose Guided Image Synthesis for Vehicle Re-identification [27.075761782915496]
This paper proposes to synthesize a large number of vehicle images in the target pose.
Since paired images of the same vehicle across different traffic surveillance cameras may not be available in the real world, we propose VehicleGAN.
Because of the difference in feature distributions between real and synthetic data, we propose a new Joint Metric Learning (JML) method via effective feature-level fusion.
(2023-11-27T19:34:04Z)
- Multi-query Vehicle Re-identification: Viewpoint-conditioned Network, Unified Dataset and New Metric [30.344288906037345]
We propose a more realistic and easily accessible task, called multi-query vehicle Re-ID.
We design a novel viewpoint-conditioned network (VCNet), which adaptively combines the complementary information from different vehicle viewpoints.
Second, we create a unified benchmark dataset, taken by 6142 cameras from a real-life transportation surveillance system.
Third, we design a new evaluation metric, called mean cross-scene precision (mCSP), which measures the ability of cross-scene recognition.
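A simple fixed-weight fusion baseline shows why multiple queries help. Each query view's embedding is normalized, the views are averaged, and the gallery is ranked against the fused vector. This is a hedged sketch with hypothetical names and toy 3-D embeddings. VCNet combines viewpoints adaptively with a learned network, which this code does not reproduce, and mCSP is not computed here.

```python
import math
from typing import Dict, List, Optional


def l2_normalize(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def fuse_queries(views: List[List[float]],
                 weights: Optional[List[float]] = None) -> List[float]:
    """Weighted mean of normalized per-view embeddings (fixed weights, not learned)."""
    if weights is None:
        weights = [1.0] * len(views)
    total = sum(weights)
    fused = [0.0] * len(views[0])
    for emb, w in zip(views, weights):
        for i, x in enumerate(l2_normalize(emb)):
            fused[i] += w * x / total
    return l2_normalize(fused)


def rank_gallery(views: List[List[float]], gallery: Dict[str, List[float]],
                 weights: Optional[List[float]] = None) -> List[str]:
    """Return gallery ids sorted by cosine similarity to the fused query."""
    q = fuse_queries(views, weights)
    sims = {gid: sum(a * b for a, b in zip(q, l2_normalize(e)))
            for gid, e in gallery.items()}
    return sorted(sims, key=sims.get, reverse=True)


gallery = {
    "car_A": [0.7, 0.7, 0.0],  # matches both front and side cues
    "car_B": [1.0, 0.0, 0.0],  # resembles the front view only
    "car_C": [0.0, 0.0, 1.0],  # unrelated vehicle
}
front, side = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(rank_gallery([front], gallery))        # ['car_B', 'car_A', 'car_C']
print(rank_gallery([front, side], gallery))  # ['car_A', 'car_B', 'car_C']
```

With only the front view, the front-only lookalike `car_B` ranks first. Adding the side view moves `car_A`, which matches both cues, to the top.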
(2023-05-25T06:22:03Z)
- Discriminative-Region Attention and Orthogonal-View Generation Model for Vehicle Re-Identification [7.5366501970852955]
Multiple challenges hamper the applications of vision-based vehicle Re-ID methods.
The proposed Discriminative-Region Attention (DRA) model automatically extracts discriminative region features that can distinguish similar vehicles.
The Orthogonal-View Generation (OVG) model generates multi-view features from the input-view features to reduce the impact of viewpoint mismatches.
(2022-04-28T07:46:03Z)
- Connecting Language and Vision for Natural Language-Based Vehicle Retrieval [77.88818029640977]
In this paper, we use a new modality, i.e., natural-language descriptions, to search for the vehicle of interest.
To connect language and vision, we propose to jointly train state-of-the-art vision models with a transformer-based language model.
Our proposed method achieved 1st place in the 5th AI City Challenge, with a competitive MRR of 18.69%.
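MRR (mean reciprocal rank) averages 1/r over queries, where r is the rank of the first correct vehicle in each returned list. By a common convention, a query whose target is never returned contributes 0. The following is a minimal Python sketch with hypothetical track ids, not the challenge's official evaluation script.

```python
from typing import List, Sequence, Set


def mean_reciprocal_rank(rankings: List[Sequence[str]],
                         relevant: List[Set[str]]) -> float:
    """Average of 1/rank of the first correct item per query (0 if none is returned)."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for pos, cand in enumerate(ranking, start=1):
            if cand in rel:
                total += 1.0 / pos
                break  # only the first correct hit counts
    return total / len(rankings)


rankings = [
    ["t7", "t2", "t9"],        # correct track at rank 1 -> 1/1
    ["t4", "t1", "t8", "t3"],  # correct track at rank 4 -> 1/4
    ["t5", "t6"],              # correct track not returned -> 0
]
relevant = [{"t7"}, {"t3"}, {"t0"}]
mrr = mean_reciprocal_rank(rankings, relevant)
print(f"MRR = {mrr:.4f}")  # (1 + 0.25 + 0) / 3
```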
(2021-05-31T11:42:03Z)
- Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification [53.6218051770131]
Cross-view consistent feature representation is key for accurate vehicle ReID.
Existing approaches resort to supervised cross-view learning using extensive extra viewpoints annotations.
We present a pluggable Weakly-supervised Cross-View Learning (WCVL) module for vehicle ReID.
(2021-03-09T11:51:09Z)
- Trends in Vehicle Re-identification Past, Present, and Future: A Comprehensive Review [2.9093633827040724]
Vehicle re-id matches a targeted vehicle across non-overlapping views in a multi-camera network.
This paper gives a comprehensive description of the various vehicle re-id technologies, methods, datasets, and a comparison of different methodologies.
(2021-02-19T05:02:24Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
(2020-12-15T03:03:38Z)
- VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity Challenge.
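mAP (mean average precision) is the standard ranking metric in vehicle Re-ID. For each query, precision is measured at every rank where a true match appears. Those values are averaged over all of the query's true matches, and the result is then averaged over queries. The following is a minimal sketch with hypothetical gallery ids. Benchmark-specific protocol details, such as discarding junk or same-camera images or truncating the ranked list, are omitted and would need to follow the official evaluation code.

```python
from typing import List, Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a true match appears."""
    hits, precision_sum = 0, 0.0
    for k, gid in enumerate(ranking, start=1):
        if gid in relevant:
            hits += 1
            precision_sum += hits / k
    # Divide by all true matches, so matches missing from the list count as 0.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: List[Sequence[str]],
                           relevants: List[Set[str]]) -> float:
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)


rankings = [["a", "x", "b", "y"], ["x", "c"]]
relevants = [{"a", "b"}, {"c"}]
# Query 1: AP = (1/1 + 2/3) / 2; query 2: AP = 1/2.
print(mean_average_precision(rankings, relevants))
```

Dividing by the total number of true matches, rather than by the number retrieved, means that a match missing from the ranked list lowers AP.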
(2020-04-14T05:06:38Z)
- The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification [75.3310894042132]
Self-supervised Attention for Vehicle Re-identification (SAVER) is a novel approach to effectively learn vehicle-specific discriminative features.
We show that SAVER improves upon the state-of-the-art on challenging VeRi, VehicleID, Vehicle-1M and VERI-Wild datasets.
(2020-04-14T02:24:47Z)
- Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
(2020-04-10T13:06:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.