Language Prompt for Autonomous Driving
- URL: http://arxiv.org/abs/2309.04379v1
- Date: Fri, 8 Sep 2023 15:21:07 GMT
- Title: Language Prompt for Autonomous Driving
- Authors: Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Xiangyu Zhang,
Jianbing Shen
- Abstract summary: We propose the first object-centric language prompt set for driving scenes within 3D, multi-view, and multi-frame space, named NuPrompt.
It expands the nuScenes dataset by constructing a total of 35,367 language descriptions, each referring to an average of 5.3 object tracks.
Based on the object-text pairs from the new benchmark, we formulate a new prompt-based driving task, i.e., employing a language prompt to predict the described object trajectory across views and frames.
- Score: 58.45334918772529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new trend in the computer vision community is to capture objects of
interest following flexible human command represented by a natural language
prompt. However, the progress of using language prompts in driving scenarios is
stuck in a bottleneck due to the scarcity of paired prompt-instance data. To
address this challenge, we propose the first object-centric language prompt set
for driving scenes within 3D, multi-view, and multi-frame space, named
NuPrompt. It expands the nuScenes dataset by constructing a total of 35,367
language descriptions, each referring to an average of 5.3 object tracks. Based
on the object-text pairs from the new benchmark, we formulate a new
prompt-based driving task, i.e., employing a language prompt to predict the
described object trajectory across views and frames. Furthermore, we provide a
simple end-to-end baseline model based on Transformer, named PromptTrack.
Experiments show that our PromptTrack achieves impressive performance on
NuPrompt. We hope this work can provide new insights for the autonomous
driving community. The dataset and code will be made public at
https://github.com/wudongming97/Prompt4Driving.
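The abstract does not spell out PromptTrack's architecture, so the sketch below is only an illustrative, minimal example of prompt-conditioned scoring of object track queries; every module name, tensor shape, and threshold is an assumption rather than the paper's method.

```python
# Illustrative sketch only: NOT the PromptTrack architecture from the paper.
# It shows one plausible way to score object track queries against a language
# prompt, assuming prompt/track embeddings already exist. All names are made up.
import torch
import torch.nn as nn

class PromptTrackScorer(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Cross-attention: track queries attend to prompt token embeddings.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score_head = nn.Linear(dim, 1)  # per-track "matches the prompt" logit

    def forward(self, track_queries: torch.Tensor, prompt_tokens: torch.Tensor):
        # track_queries: (B, num_tracks, dim) object-track embeddings from a
        #   query-based 3D detector/tracker (one query per object track).
        # prompt_tokens: (B, num_words, dim) embeddings of the language prompt.
        fused, _ = self.cross_attn(track_queries, prompt_tokens, prompt_tokens)
        logits = self.score_head(fused).squeeze(-1)  # (B, num_tracks)
        return logits  # high logit = this track is described by the prompt

# Toy usage with random tensors (50 track queries, 12 prompt tokens).
scorer = PromptTrackScorer()
logits = scorer(torch.randn(1, 50, 256), torch.randn(1, 12, 256))
referred = logits.sigmoid() > 0.5  # which tracks the prompt refers to
```

In a full system, the tracks selected this way would then be decoded into 3D trajectories across views and frames, which is the task the benchmark defines.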
Related papers
- Contextual Object Detection with Multimodal Large Language Models [66.15566719178327]
We introduce a novel research problem of contextual object detection.
Three representative scenarios are investigated, including the language cloze test, visual captioning, and question answering.
We present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts.
arXiv Detail & Related papers (2023-05-29T17:50:33Z)
- Type-to-Track: Retrieve Any Object via Prompt-based Tracking [34.859061177766016]
This paper introduces a novel paradigm for Multiple Object Tracking called Type-to-Track.
Type-to-Track allows users to track objects in videos by typing natural language descriptions.
We present a new dataset for this Grounded Multiple Object Tracking task, called GroOT.
arXiv Detail & Related papers (2023-05-22T21:25:27Z)
- Explaining Patterns in Data with Language Models via Interpretable Autoprompting [143.4162028260874]
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data.
iPrompt can yield meaningful insights by accurately finding groundtruth dataset descriptions.
Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
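The summary above only states iPrompt's goal; the sketch below is a generic generate-and-rank autoprompting loop, not the authors' algorithm, with a toy scoring function standing in for a language-model likelihood.

```python
# Generic generate-and-rank autoprompting sketch; a hypothetical stand-in, not
# the iPrompt algorithm itself. The scoring function here is a toy placeholder.
from typing import Callable, List, Tuple

def rank_explanations(
    data: List[Tuple[str, str]],              # (input, output) pairs to explain
    candidates: List[str],                    # candidate natural-language explanations
    score: Callable[[str, str, str], float],  # score(explanation, x, y) -> fit of y given x
) -> List[Tuple[str, float]]:
    """Rank candidate explanations by how well they account for the data."""
    ranked = [
        (expl, sum(score(expl, x, y) for x, y in data) / len(data))
        for expl in candidates
    ]
    return sorted(ranked, key=lambda t: t[1], reverse=True)

# Toy usage: a keyword check stands in for a language-model likelihood.
toy_data = [("2 + 2", "4"), ("3 + 5", "8")]
toy_candidates = ["add the two numbers", "concatenate the digits"]
toy_score = lambda expl, x, y: float("add" in expl)  # placeholder, not meaningful
print(rank_explanations(toy_data, toy_candidates, toy_score))
```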
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
- PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts [106.82620362222197]
PromptSource is a system for creating, sharing, and using natural language prompts.
Prompts are functions that map an example from a dataset to a natural language input and target output.
Over 2,000 prompts for roughly 170 datasets are already available in PromptSource.
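Per the summary, a prompt is a function mapping a dataset example to a natural-language input and target. A minimal, hypothetical illustration of that idea (field names assumed; this is not the actual PromptSource template API):

```python
# Minimal illustration of "a prompt is a function mapping a dataset example to a
# natural-language input and target output". Field names are hypothetical, and
# this is not the real PromptSource/Jinja template API.
from typing import Dict, Tuple

def nli_prompt(example: Dict[str, str]) -> Tuple[str, str]:
    """Map a (premise, hypothesis, label) example to an (input, target) text pair."""
    text_in = (
        f"Premise: {example['premise']}\n"
        f"Hypothesis: {example['hypothesis']}\n"
        "Does the premise entail the hypothesis? Answer yes or no."
    )
    text_out = "yes" if example["label"] == "entailment" else "no"
    return text_in, text_out

example = {"premise": "A dog runs in the park.",
           "hypothesis": "An animal is outside.",
           "label": "entailment"}
print(nli_prompt(example))
```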
arXiv Detail & Related papers (2022-02-02T20:48:54Z)
- All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers [0.981213663876059]
We present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language.
The main building blocks of the proposed architecture are (i) BERT, which provides an embedding of the textual descriptions, and (ii) a convolutional backbone along with a Transformer model to embed the visual information.
For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings.
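As a generic stand-in for the loss described above (AYCE's exact variant, margin value, and mining strategy are not given here), a standard triplet margin loss over language and visual embeddings looks like:

```python
# Standard triplet margin loss between language and visual embeddings, as a
# generic stand-in for AYCE's variant (margin and mining strategy are assumptions).
import torch
import torch.nn.functional as F

def triplet_retrieval_loss(text_emb, vis_pos, vis_neg, margin: float = 0.2):
    # text_emb: (B, D) language (e.g., BERT) embeddings of the descriptions
    # vis_pos:  (B, D) visual embeddings of the matching tracking sequences
    # vis_neg:  (B, D) visual embeddings of non-matching sequences
    d_pos = 1.0 - F.cosine_similarity(text_emb, vis_pos)  # distance to positive
    d_neg = 1.0 - F.cosine_similarity(text_emb, vis_neg)  # distance to negative
    return F.relu(d_pos - d_neg + margin).mean()

loss = triplet_retrieval_loss(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))
```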
arXiv Detail & Related papers (2021-06-18T14:38:51Z)
- Connecting Language and Vision for Natural Language-Based Vehicle Retrieval [77.88818029640977]
In this paper, we apply a new modality, i.e., the language description, to search for the vehicle of interest.
To connect language and vision, we propose to jointly train state-of-the-art vision models with a transformer-based language model.
Our proposed method achieved 1st place in the 5th AI City Challenge, yielding a competitive 18.69% MRR.
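MRR (mean reciprocal rank) is the retrieval metric cited above; a quick sketch of how it is computed for language-to-vehicle retrieval, with illustrative ranked lists:

```python
# Mean Reciprocal Rank (MRR): average of 1/rank of the correct vehicle track for
# each language query. The ranked lists below are illustrative placeholders.
from typing import List

def mean_reciprocal_rank(rankings: List[List[str]], targets: List[str]) -> float:
    """rankings[i] is the retrieval order for query i; targets[i] is its true track ID."""
    total = 0.0
    for ranked_ids, target in zip(rankings, targets):
        rank = ranked_ids.index(target) + 1  # 1-based rank of the correct track
        total += 1.0 / rank
    return total / len(targets)

# Two toy queries: correct track ranked 1st and 4th -> MRR = (1 + 0.25) / 2 = 0.625
print(mean_reciprocal_rank([["a", "b", "c"], ["b", "c", "d", "a"]], ["a", "a"]))
```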
arXiv Detail & Related papers (2021-05-31T11:42:03Z)
- SBNet: Segmentation-based Network for Natural Language-based Vehicle Search [8.286899656309476]
Natural language-based vehicle retrieval is a task to find a target vehicle within a given image based on a natural language description as a query.
This technology can be applied in various areas, including police searches for a suspect vehicle.
We propose a deep neural network called SBNet that performs natural language-based segmentation for vehicle retrieval.
arXiv Detail & Related papers (2021-04-22T08:06:17Z)
- Commands 4 Autonomous Vehicles (C4AV) Workshop Summary [91.92872482200018]
This paper presents the results of the Commands for Autonomous Vehicles (C4AV) challenge, based on the recent Talk2Car dataset.
We identify the aspects that render top-performing models successful, and relate them to existing state-of-the-art models for visual grounding.
arXiv Detail & Related papers (2020-09-18T12:33:21Z)
- A Baseline for the Commands For Autonomous Vehicles Challenge [7.430057056425165]
The challenge is based on the recent Talk2Car dataset.
This document provides a technical overview of a model that we released to help participants get started in the competition.
arXiv Detail & Related papers (2020-04-20T13:35:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.