Connecting Language and Vision for Natural Language-Based Vehicle
Retrieval
- URL: http://arxiv.org/abs/2105.14897v1
- Date: Mon, 31 May 2021 11:42:03 GMT
- Title: Connecting Language and Vision for Natural Language-Based Vehicle
Retrieval
- Authors: Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang
Zhou, Yi Yang, Hongxia Yang
- Abstract summary: In this paper, we apply a new modality, i.e., the language description, to search for the vehicle of interest.
To connect language and vision, we propose to jointly train state-of-the-art vision models with a transformer-based language model.
Our proposed method achieved 1st place on the 5th AI City Challenge, yielding a competitive 18.69% MRR.
- Score: 77.88818029640977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vehicle search is a basic task for efficient traffic management in the
AI City setting. Most existing practices focus on image-based vehicle matching,
including vehicle re-identification and vehicle tracking. In this paper, we
apply a new modality, i.e., the language description, to search for the vehicle
of interest, and we explore the potential of this task in real-world scenarios.
Natural language-based vehicle search poses a new challenge of fine-grained
understanding of both the vision and language modalities. To connect language
and vision, we propose to jointly train state-of-the-art vision models with a
transformer-based language model in an end-to-end manner. Beyond the network
structure design and the training strategy, several optimization objectives are
also revisited in this work. Qualitative and quantitative experiments verify
the effectiveness of the proposed method. Our proposed method achieved 1st
place on the 5th AI City Challenge, yielding a competitive 18.69% MRR on the
private test set. We hope this work can pave the way for future studies on
using language descriptions effectively and efficiently in real-world vehicle
retrieval systems. The code will be available at
https://github.com/ShuaiBai623/AIC2021-T5-CLV.
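The reported 18.69% figure is MRR (mean reciprocal rank): for each language query, the retrieval system ranks all candidate vehicles, and MRR averages the reciprocal of the rank at which the correct vehicle appears. A minimal sketch of the metric (the function name and inputs here are illustrative, not taken from the paper's code):

```python
def mean_reciprocal_rank(ranked_ids, ground_truth):
    """Average of 1/rank of the first correct match across queries.

    ranked_ids: one ranked list of candidate IDs per query.
    ground_truth: the correct candidate ID for each query.
    A query whose ground truth never appears contributes 0.
    """
    total = 0.0
    for ids, gt in zip(ranked_ids, ground_truth):
        if gt in ids:
            total += 1.0 / (ids.index(gt) + 1)  # ranks are 1-based
    return total / len(ranked_ids)


# Example: first query's target ranked 1st, second query's ranked 4th
score = mean_reciprocal_rank([["a", "b"], ["c", "d", "e", "f"]], ["a", "f"])
print(score)  # (1/1 + 1/4) / 2 = 0.625
```

A perfect system scores 1.0; the 18.69% result reflects how hard fine-grained cross-modal vehicle retrieval is on the challenge's private test set.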
Related papers
- SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving [15.551625571158056]
We propose an e2eAD method called SimpleLLM4AD.
In our method, the e2eAD task is divided into four stages: perception, prediction, planning, and behavior.
Our experiments demonstrate that SimpleLLM4AD achieves competitive performance in complex driving scenarios.
arXiv Detail & Related papers (2024-07-31T02:35:33Z) - LangNav: Language as a Perceptual Representation for Navigation [63.90602960822604]
We explore the use of language as a perceptual representation for vision-and-language navigation (VLN)
Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent's egocentric panoramic view at each time step into natural language descriptions.
arXiv Detail & Related papers (2023-10-11T20:52:30Z) - ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for
Scene Text Spotting [121.11880210592497]
We argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input.
We propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting.
arXiv Detail & Related papers (2022-11-19T03:50:33Z) - Symmetric Network with Spatial Relationship Modeling for Natural
Language-based Vehicle Retrieval [3.610372087454382]
Natural language (NL) based vehicle retrieval aims to search for a specific vehicle given a text description.
We propose a Symmetric Network with Spatial Relationship Modeling (SSM) method for NL-based vehicle retrieval.
We achieve 43.92% MRR accuracy on the test set of the 6th AI City Challenge on natural language-based vehicle retrieval track.
arXiv Detail & Related papers (2022-06-22T07:02:04Z) - SBNet: Segmentation-based Network for Natural Language-based Vehicle
Search [8.286899656309476]
Natural language-based vehicle retrieval is a task to find a target vehicle within a given image based on a natural language description as a query.
This technology can be applied to various areas including police searching for a suspect vehicle.
We propose a deep neural network called SBNet that performs natural language-based segmentation for vehicle retrieval.
arXiv Detail & Related papers (2021-04-22T08:06:17Z) - Read Like Humans: Autonomous, Bidirectional and Iterative Language
Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z) - Commands 4 Autonomous Vehicles (C4AV) Workshop Summary [91.92872482200018]
This paper presents the results of the Commands for Autonomous Vehicles (C4AV) challenge based on the recent Talk2Car dataset.
We identify the aspects that render top-performing models successful, and relate them to existing state-of-the-art models for visual grounding.
arXiv Detail & Related papers (2020-09-18T12:33:21Z) - VehicleNet: Learning Robust Visual Representation for Vehicle
Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AI City Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.