Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
- URL: http://arxiv.org/abs/2009.08792v1
- Date: Fri, 18 Sep 2020 12:33:21 GMT
- Title: Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
- Authors: Thierry Deruyttere, Simon Vandenhende, Dusan Grujicic, Yu Liu, Luc Van
Gool, Matthew Blaschko, Tinne Tuytelaars, Marie-Francine Moens
- Abstract summary: This paper presents the results of the \emph{Commands for Autonomous Vehicles} (C4AV) challenge based on the recent \emph{Talk2Car} dataset.
We identify the aspects that render top-performing models successful, and relate them to existing state-of-the-art models for visual grounding.
- Score: 91.92872482200018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of visual grounding requires locating the most relevant region or
object in an image, given a natural language query. So far, progress on this
task was mostly measured on curated datasets, which are not always
representative of human spoken language. In this work, we deviate from recent,
popular task settings and consider the problem under an autonomous vehicle
scenario. In particular, we consider a situation where passengers can give
free-form natural language commands to a vehicle which can be associated with
an object in the street scene. To stimulate research on this topic, we have
organized the \emph{Commands for Autonomous Vehicles} (C4AV) challenge based on
the recent \emph{Talk2Car} dataset (URL:
https://www.aicrowd.com/challenges/eccv-2020-commands-4-autonomous-vehicles).
This paper presents the results of the challenge. First, we compare the
benchmark used in the challenge against existing datasets for visual grounding. Second, we identify
the aspects that render top-performing models successful, and relate them to
existing state-of-the-art models for visual grounding, in addition to detecting
potential failure cases by evaluating on carefully selected subsets. Finally,
we discuss several possibilities for future work.
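To make the task setup concrete, the following is a minimal sketch, assuming region proposals from an off-the-shelf detector and a pooled command embedding, of how a grounding model can rank candidate regions against a natural language command. Module names and dimensions are illustrative assumptions, not the architecture of any challenge entry.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionCommandScorer(nn.Module):
    """Toy visual-grounding scorer: ranks region features against a command.

    Assumes region features (e.g. from an off-the-shelf detector) and a pooled
    sentence embedding are already available; both are projected into a shared
    space and compared with a cosine similarity.
    """

    def __init__(self, region_dim=2048, text_dim=768, joint_dim=256):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, joint_dim)
        self.text_proj = nn.Linear(text_dim, joint_dim)

    def forward(self, region_feats, command_emb):
        # region_feats: (num_regions, region_dim), command_emb: (text_dim,)
        r = F.normalize(self.region_proj(region_feats), dim=-1)
        t = F.normalize(self.text_proj(command_emb), dim=-1)
        scores = r @ t                      # one similarity score per region
        return scores.argmax(), scores


# Usage with random placeholders standing in for real detector / language features.
scorer = RegionCommandScorer()
regions = torch.randn(32, 2048)   # 32 candidate regions from the street scene
command = torch.randn(768)        # embedding of e.g. "park behind the red truck"
best_region, scores = scorer(regions, command)
```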
Related papers
- Learning autonomous driving from aerial imagery [67.06858775696453]
Photogrammetric simulators allow the synthesis of novel views by transforming pre-generated assets.
We use a Neural Radiance Field (NeRF) as an intermediate representation to synthesize novel views from the point of view of a ground vehicle.
arXiv Detail & Related papers (2024-10-18T05:09:07Z)
- Learning Road Scene-level Representations via Semantic Region Prediction [11.518756759576657]
We tackle two vital tasks in automated driving systems, i.e., driver intent prediction and risk object identification from egocentric images.
We contend that a scene-level representation must capture higher-level semantic and geometric representations of traffic scenes around the ego-vehicle.
We propose to learn scene-level representations via a novel semantic region prediction task and an automatic semantic region labeling algorithm.
arXiv Detail & Related papers (2023-01-02T15:13:30Z)
- Vision-Guided Forecasting -- Visual Context for Multi-Horizon Time Series Forecasting [0.6947442090579469]
We tackle multi-horizon forecasting of vehicle states by fusing the two modalities.
We design and experiment with 3D convolutions for visual feature extraction and 1D convolutions for feature extraction from speed and steering angle traces (a rough sketch of this two-branch design appears after this entry).
We show that we are able to forecast a vehicle's state to various horizons, while outperforming the current state-of-the-art results on the related task of driving state estimation.
arXiv Detail & Related papers (2021-07-27T08:52:40Z)
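As a loose illustration of the two-branch fusion described in the entry above, the sketch below combines a 3D-convolutional branch over a short video clip with a 1D-convolutional branch over speed and steering traces. Layer sizes, the horizon length, and the output head are assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class FusionForecaster(nn.Module):
    """Illustrative two-branch forecaster: video clip + driving-signal traces."""

    def __init__(self, horizon=10):
        super().__init__()
        # 3D conv branch over a short video clip (channels, frames, H, W).
        self.visual = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),          # -> (B, 16)
        )
        # 1D conv branch over speed and steering-angle traces (2 channels).
        self.signals = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),          # -> (B, 16)
        )
        # Predict speed and steering angle for each future step of the horizon.
        self.head = nn.Linear(32, horizon * 2)

    def forward(self, clip, traces):
        fused = torch.cat([self.visual(clip), self.signals(traces)], dim=-1)
        return self.head(fused).view(-1, 2)   # (horizon, 2) when batch size is 1


model = FusionForecaster()
clip = torch.randn(1, 3, 8, 64, 64)   # one 8-frame RGB clip
traces = torch.randn(1, 2, 8)         # matching speed / steering history
future_states = model(clip, traces)
```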
- Connecting Language and Vision for Natural Language-Based Vehicle Retrieval [77.88818029640977]
In this paper, we apply a new modality, i.e., the language description, to search for the vehicle of interest.
To connect language and vision, we propose to jointly train state-of-the-art vision models with a transformer-based language model.
Our proposed method achieved 1st place on the 5th AI City Challenge, yielding a competitive 18.69% MRR accuracy (a generic sketch of the joint embedding objective appears after this entry).
arXiv Detail & Related papers (2021-05-31T11:42:03Z)
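The joint language-vision training mentioned in the vehicle-retrieval entry above can be illustrated with a generic symmetric contrastive loss over paired image and description embeddings. This formulation is an assumption for illustration only, not the actual training objective of the challenge entry.

```python
import torch
import torch.nn.functional as F


def contrastive_retrieval_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss for paired image / description embeddings.

    img_emb, txt_emb: (batch, dim) projections from a vision backbone and a
    transformer language model; matching pairs share the same row index.
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature      # (batch, batch) similarity matrix
    targets = torch.arange(img.size(0))       # i-th image matches i-th description
    # Average the image-to-text and text-to-image cross-entropy terms.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


# Example with random placeholder embeddings.
loss = contrastive_retrieval_loss(torch.randn(8, 128), torch.randn(8, 128))
```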
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- A Baseline for the Commands For Autonomous Vehicles Challenge [7.430057056425165]
The challenge is based on the recent \texttt{Talk2Car} dataset.
This document provides a technical overview of a model that we released to help participants get started in the competition.
arXiv Detail & Related papers (2020-04-20T13:35:47Z)
- VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.