Structural Information Guided Multimodal Pre-training for
Vehicle-centric Perception
- URL: http://arxiv.org/abs/2312.09812v1
- Date: Fri, 15 Dec 2023 14:10:21 GMT
- Title: Structural Information Guided Multimodal Pre-training for
Vehicle-centric Perception
- Authors: Xiao Wang, Wentao Wu, Chenglong Li, Zhicheng Zhao, Zhe Chen, Yukai
Shi, Jin Tang
- Abstract summary: We propose a novel vehicle-centric pre-training framework called VehicleMAE.
We explicitly extract the sketch lines of vehicles as a form of spatial structure to guide vehicle reconstruction.
A large-scale dataset, termed Autobot1M, is built to pre-train our model; it contains about 1M vehicle images and 12,693 text descriptions.
- Score: 36.92036421490819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding vehicles in images is important for applications such
as intelligent transportation and self-driving systems. Existing vehicle-centric
works typically pre-train models on large-scale classification datasets and
then fine-tune them for specific downstream tasks. However, they neglect the
specific characteristics of vehicle perception in different tasks, which may
lead to sub-optimal performance. To address this issue, we propose a novel
vehicle-centric pre-training framework called VehicleMAE, which incorporates
structural information, namely the spatial structure from vehicle profile
information and the semantic structure from informative high-level natural
language descriptions, for effective masked vehicle appearance reconstruction.
Specifically, we explicitly extract the sketch lines of vehicles as a form of
spatial structure to guide vehicle reconstruction. In addition, knowledge
distilled from the large CLIP model, based on the similarity between paired and
unpaired vehicle image-text samples, is used to help the model achieve a better
semantic understanding of vehicles. A large-scale dataset, termed Autobot1M, is
built to pre-train our model; it contains about 1M vehicle images and 12,693
text descriptions. Extensive experiments on four vehicle-based downstream tasks
fully validate the effectiveness of our VehicleMAE. The source code and
pre-trained models will be released at
https://github.com/Event-AHU/VehicleMAE.
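To make the pre-training objectives concrete, below is a minimal PyTorch-style sketch (not the authors' released code) of how the three signals described in the abstract could be combined: masked-patch reconstruction of the vehicle appearance, reconstruction of the extracted sketch lines as a spatial-structure target, and a CLIP-based image-text similarity term as a semantic-structure target. The module names, dimensions, and loss weights are illustrative assumptions rather than values from the paper.

```python
# Illustrative sketch of combining masked reconstruction, sketch-line
# reconstruction, and CLIP-style distillation; all names/weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_mask(num_patches: int, mask_ratio: float = 0.75) -> torch.Tensor:
    """Return a boolean mask: True = patch is masked (to be reconstructed)."""
    num_masked = int(num_patches * mask_ratio)
    perm = torch.randperm(num_patches)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[perm[:num_masked]] = True
    return mask


class ToyVehicleMAE(nn.Module):
    """Highly simplified stand-in for a ViT-style encoder/decoder pair."""

    def __init__(self, patch_dim=768, embed_dim=256, clip_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, embed_dim), nn.GELU())
        self.pixel_decoder = nn.Linear(embed_dim, patch_dim)   # RGB patch targets
        self.sketch_decoder = nn.Linear(embed_dim, patch_dim)  # sketch-line targets
        self.clip_proj = nn.Linear(embed_dim, clip_dim)        # align with CLIP space

    def forward(self, patches):
        z = self.encoder(patches)                  # (B, N, embed_dim)
        return (self.pixel_decoder(z),
                self.sketch_decoder(z),
                self.clip_proj(z.mean(dim=1)))     # pooled feature for distillation


def pretrain_loss(model, rgb_patches, sketch_patches, clip_text_feat,
                  w_sketch=1.0, w_distill=0.1):
    """Combine the three losses; the weights here are guesses, not the paper's."""
    B, N, D = rgb_patches.shape
    mask = random_mask(N)
    visible = rgb_patches.clone()
    visible[:, mask] = 0.0                         # crude stand-in for token dropping

    pred_rgb, pred_sketch, img_feat = model(visible)

    # 1) masked vehicle appearance reconstruction (computed only on masked patches)
    loss_pixel = F.mse_loss(pred_rgb[:, mask], rgb_patches[:, mask])
    # 2) spatial structure: also reconstruct the precomputed vehicle sketch lines
    loss_sketch = F.mse_loss(pred_sketch[:, mask], sketch_patches[:, mask])
    # 3) semantic structure: pull the pooled image feature toward the CLIP text
    #    feature of the paired description (cosine distance)
    loss_distill = 1.0 - F.cosine_similarity(img_feat, clip_text_feat, dim=-1).mean()

    return loss_pixel + w_sketch * loss_sketch + w_distill * loss_distill


if __name__ == "__main__":
    model = ToyVehicleMAE()
    rgb = torch.randn(4, 196, 768)     # 4 images, 14x14 patches, flattened pixels
    sketch = torch.randn(4, 196, 768)  # sketch-line patches extracted offline
    text = torch.randn(4, 512)         # CLIP text embeddings of paired descriptions
    loss = pretrain_loss(model, rgb, sketch, text)
    loss.backward()
    print(float(loss))
```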
Related papers
- VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models [21.186456742407007]
We propose a new vehicle detection paradigm based on a pre-trained foundation vehicle model (VehicleMAE) and a large language model (T5), termed VFM-Det.
Our model improves the baseline approach by +5.1% and +6.2% on the $AP_{0.5}$ and $AP_{0.75}$ metrics, respectively.
arXiv Detail & Related papers (2024-08-23T12:39:02Z)
- Symmetric Network with Spatial Relationship Modeling for Natural Language-based Vehicle Retrieval [3.610372087454382]
Natural language (NL) based vehicle retrieval aims to search for a specific vehicle given a text description.
We propose a Symmetric Network with Spatial Relationship Modeling (SSM) method for NL-based vehicle retrieval.
We achieve 43.92% MRR on the test set of the natural language-based vehicle retrieval track of the 6th AI City Challenge.
arXiv Detail & Related papers (2022-06-22T07:02:04Z)
- CRAT-Pred: Vehicle Trajectory Prediction with Crystal Graph Convolutional Neural Networks and Multi-Head Self-Attention [10.83642398981694]
CRAT-Pred is a trajectory prediction model that does not rely on map information.
The model achieves state-of-the-art performance with a significantly lower number of model parameters.
In addition, we quantitatively show that the self-attention mechanism is able to learn social interactions between vehicles, with the attention weights representing a measurable interaction score.
arXiv Detail & Related papers (2022-02-09T14:36:36Z)
- Self-Supervised Steering Angle Prediction for Vehicle Control Using Visual Odometry [55.11913183006984]
We show how a model can be trained to control a vehicle's trajectory using camera poses estimated through visual odometry methods.
We propose a scalable framework that leverages trajectory information from several different runs using a camera setup placed at the front of a car.
arXiv Detail & Related papers (2021-03-20T16:29:01Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z)
- VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z)
- The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification [75.3310894042132]
Self-supervised Attention for Vehicle Re-identification (SAVER) is a novel approach to effectively learn vehicle-specific discriminative features.
We show that SAVER improves upon the state-of-the-art on challenging VeRi, VehicleID, Vehicle-1M and VERI-Wild datasets.
arXiv Detail & Related papers (2020-04-14T02:24:47Z)