Symmetric Network with Spatial Relationship Modeling for Natural
Language-based Vehicle Retrieval
- URL: http://arxiv.org/abs/2206.10879v1
- Date: Wed, 22 Jun 2022 07:02:04 GMT
- Title: Symmetric Network with Spatial Relationship Modeling for Natural
Language-based Vehicle Retrieval
- Authors: Chuyang Zhao and Haobo Chen and Wenyuan Zhang and Junru Chen and
Sipeng Zhang and Yadong Li and Boxun Li
- Abstract summary: Natural language (NL) based vehicle retrieval aims to search for a specific vehicle given a text description.
We propose a Symmetric Network with Spatial Relationship Modeling (SSM) method for NL-based vehicle retrieval.
We achieve 43.92% MRR accuracy on the test set of the 6th AI City Challenge on natural language-based vehicle retrieval track.
- Score: 3.610372087454382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language (NL) based vehicle retrieval aims to search for a
specific vehicle given a text description. Unlike image-based vehicle
retrieval, NL-based vehicle retrieval requires considering not only vehicle
appearance but also the surrounding environment and temporal relations. In this
paper, we
propose a Symmetric Network with Spatial Relationship Modeling (SSM) method for
NL-based vehicle retrieval. Specifically, we design a symmetric network to
learn the unified cross-modal representations between text descriptions and
vehicle images, where vehicle appearance details and vehicle trajectory global
information are preserved. In addition, to make better use of location
information, we propose a spatial relationship modeling method that takes the
surrounding environment and the mutual relationships between vehicles into
consideration. The
qualitative and quantitative experiments verify the effectiveness of the
proposed method. We achieve 43.92% MRR accuracy on the test set of the 6th AI
City Challenge on natural language-based vehicle retrieval track, yielding the
1st place among all valid submissions on the public leaderboard. The code is
available at https://github.com/hbchen121/AICITY2022_Track2_SSM.
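To make the retrieval setup the abstract describes concrete, here is a minimal, self-contained sketch (plain Python with made-up random embeddings; this is not the authors' SSM model): text and image branches are assumed to map into a shared space, gallery vehicles are ranked by cosine similarity to each text query, and MRR, the challenge metric, is computed over the ranked lists.

```python
import math
import random

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_gallery(text_emb, image_embs):
    # Sort gallery indices by similarity to the query, best match first.
    return sorted(range(len(image_embs)),
                  key=lambda i: cosine(text_emb, image_embs[i]),
                  reverse=True)

def mean_reciprocal_rank(rankings, targets):
    # MRR = mean over queries of 1 / (1-based rank of the correct item).
    return sum(1.0 / (ranking.index(t) + 1)
               for ranking, t in zip(rankings, targets)) / len(targets)

random.seed(0)
dim, n = 16, 5
image_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
# Each "text" embedding is a noisy copy of its matching image embedding,
# standing in for the shared space a symmetric two-branch network would learn.
text_embs = [[x + 0.1 * random.gauss(0, 1) for x in img] for img in image_embs]

rankings = [rank_gallery(t, image_embs) for t in text_embs]
mrr = mean_reciprocal_rank(rankings, list(range(n)))
```

The symmetric design means both branches are trained toward the same space, so matched text-image pairs score highest; MRR rewards ranking the correct vehicle track first.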
Related papers
- Structural Information Guided Multimodal Pre-training for
Vehicle-centric Perception [36.92036421490819]
We propose a novel vehicle-centric pre-training framework called VehicleMAE.
We explicitly extract the sketch lines of vehicles as a form of the spatial structure to guide vehicle reconstruction.
A large-scale dataset, termed Autobot1M, is built to pre-train our model; it contains about 1M vehicle images and 12,693 text descriptions.
arXiv Detail & Related papers (2023-12-15T14:10:21Z) - FindVehicle and VehicleFinder: A NER dataset for natural language-based
vehicle retrieval and a keyword-based cross-modal vehicle retrieval system [7.078561467480664]
Natural language (NL) based vehicle retrieval is a task aiming to retrieve a vehicle that is most consistent with a given NL query from among all candidate vehicles.
To tackle these problems and simplify the task, we borrow the idea from named entity recognition (NER) and construct FindVehicle, an NER dataset in the traffic domain.
VehicleFinder achieves 87.7% precision and 89.4% recall when retrieving a target vehicle by text command on our homemade dataset.
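A minimal sketch of what keyword-based cross-modal retrieval can look like (the vocabulary, attribute names, and candidates below are invented for illustration; this is not the actual VehicleFinder pipeline): extract color/type entities from the NL command, then keep candidates whose detected attributes cover every extracted entity.

```python
# Toy closed vocabularies standing in for a traffic-domain NER model.
COLORS = {"red", "blue", "white", "black", "silver"}
TYPES = {"sedan", "suv", "truck", "van", "hatchback"}

def extract_entities(command):
    # Toy NER: keyword lookup over a closed vocabulary.
    tokens = command.lower().split()
    return {"color": {t for t in tokens if t in COLORS},
            "type": {t for t in tokens if t in TYPES}}

def retrieve(command, candidates):
    # Keep candidates whose attributes satisfy every extracted entity.
    query = extract_entities(command)
    return [c for c in candidates
            if query["color"] <= {c["color"]} and query["type"] <= {c["type"]}]

candidates = [
    {"id": 1, "color": "red", "type": "sedan"},
    {"id": 2, "color": "red", "type": "suv"},
    {"id": 3, "color": "blue", "type": "sedan"},
]
hits = retrieve("find the red sedan turning left", candidates)  # matches id 1 only
```

Precision and recall, as reported above, would then be measured by comparing such hit lists against ground-truth target vehicles.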
arXiv Detail & Related papers (2023-04-21T11:20:23Z) - RSG-Net: Towards Rich Sematic Relationship Prediction for Intelligent
Vehicle in Complex Environments [72.04891523115535]
We propose RSG-Net (Road Scene Graph Net): a graph convolutional network designed to predict potential semantic relationships from object proposals.
The experimental results indicate that this network, trained on Road Scene Graph dataset, could efficiently predict potential semantic relationships among objects around the ego-vehicle.
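To make "semantic relationships among objects around the ego-vehicle" concrete, here is a toy rule-based stand-in (the coordinate frame and relation labels are invented here; RSG-Net itself learns such relations with a graph convolutional network over object proposals): coarse spatial relations derived from object center coordinates.

```python
def spatial_relation(a, b):
    # a, b: (x, y) centers; x grows to the right, y grows forward (assumed frame).
    # Report the dominant axis: lateral relation if |dx| >= |dy|, else depth.
    dx, dy = b[0] - a[0], b[1] - a[1]
    horiz = "right-of" if dx > 0 else "left-of"
    depth = "ahead-of" if dy > 0 else "behind"
    return horiz if abs(dx) >= abs(dy) else depth

# Toy scene: ego-vehicle at the origin plus two detected cars.
objs = {"ego": (0.0, 0.0), "car_a": (3.0, 1.0), "car_b": (-1.0, 5.0)}
rels = {(s, t): spatial_relation(objs[s], objs[t])
        for s in objs for t in objs if s != t}
```

A learned model replaces these hand-written rules with relation scores predicted from visual and geometric features, but the output structure, a labeled relation per object pair, is the same.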
arXiv Detail & Related papers (2022-07-16T12:40:17Z) - Connecting Language and Vision for Natural Language-Based Vehicle
Retrieval [77.88818029640977]
In this paper, we apply a new modality, i.e., the language description, to search for the vehicle of interest.
To connect language and vision, we propose to jointly train the state-of-the-art vision models with the transformer-based language model.
Our proposed method achieved 1st place in the 5th AI City Challenge, yielding a competitive 18.69% MRR accuracy.
arXiv Detail & Related papers (2021-05-31T11:42:03Z) - Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data
Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z) - Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z) - Commands 4 Autonomous Vehicles (C4AV) Workshop Summary [91.92872482200018]
This paper presents the results of the Commands for Autonomous Vehicles (C4AV) challenge, based on the recent Talk2Car dataset.
We identify the aspects that render top-performing models successful, and relate them to existing state-of-the-art models for visual grounding.
arXiv Detail & Related papers (2020-09-18T12:33:21Z) - VehicleNet: Learning Robust Visual Representation for Vehicle
Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z) - A Multi-Modal States based Vehicle Descriptor and Dilated Convolutional
Social Pooling for Vehicle Trajectory Prediction [3.131740922192114]
We propose a vehicle-descriptor based LSTM model with the dilated convolutional social pooling (VD+DCS-LSTM) to cope with the above issues.
Each vehicle's multi-modal state information is employed as our model's input.
The validity of the overall model was verified over the NGSIM US-101 and I-80 datasets.
arXiv Detail & Related papers (2020-03-07T01:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.