Universal Embeddings for Spatio-Temporal Tagging of Self-Driving Logs
- URL: http://arxiv.org/abs/2011.06165v1
- Date: Thu, 12 Nov 2020 02:18:16 GMT
- Title: Universal Embeddings for Spatio-Temporal Tagging of Self-Driving Logs
- Authors: Sean Segal, Eric Kee, Wenjie Luo, Abbas Sadat, Ersin Yumer, Raquel
Urtasun
- Abstract summary: We tackle the problem of spatio-temporal tagging of self-driving scenes from raw sensor data.
Our approach learns a universal embedding for all tags, enabling efficient tagging of many attributes and faster learning of new attributes with limited data.
- Score: 72.67604044776662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we tackle the problem of spatio-temporal tagging of
self-driving scenes from raw sensor data. Our approach learns a universal
embedding for all tags, enabling efficient tagging of many attributes and
faster learning of new attributes with limited data. Importantly, the embedding
is spatio-temporally aware, allowing the model to naturally output
spatio-temporal tag values. Values can then be pooled over arbitrary regions,
in order to, for example, compute the pedestrian density in front of the SDV,
or determine if a car is blocking another car at a 4-way intersection. We
demonstrate the effectiveness of our approach on a new large scale self-driving
dataset, SDVScenes, containing 15 attributes relating to vehicle and pedestrian
density, the actions of each actor, the speed of each actor, interactions
between actors, and the topology of the road map.
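The region pooling described in the abstract can be pictured with a minimal sketch, assuming a dense bird's-eye-view grid of per-tag values. The grid shape, the region mask, and mean pooling are illustrative assumptions, not the paper's actual interface:

```python
import numpy as np

# Minimal sketch of pooling a spatio-temporal tag grid over a query region.
rng = np.random.default_rng(0)

# Dense per-tag values over T timesteps on an H x W bird's-eye-view grid:
# tag_values[t, y, x] holds the model's output for one tag at one cell.
T, H, W = 10, 200, 200
tag_values = rng.random((T, H, W))          # e.g. per-cell pedestrian presence

# Binary mask selecting an arbitrary query region, e.g. the area
# directly in front of the self-driving vehicle (SDV).
region_mask = np.zeros((H, W), dtype=bool)
region_mask[100:140, 90:110] = True         # hypothetical "front of SDV" box

# Pool tag values over the region at each timestep; a mean gives a
# density-style tag, a max would give a presence-style tag.
density_per_step = tag_values[:, region_mask].mean(axis=1)   # shape (T,)
print(density_per_step.round(3))
```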
Related papers
- Homography Guided Temporal Fusion for Road Line and Marking Segmentation [73.47092021519245]
Road lines and markings are frequently occluded by moving vehicles, shadows, and glare.
We propose a Homography Guided Fusion (HomoFusion) module to exploit temporally-adjacent video frames for complementary cues.
We show that exploiting available camera intrinsics and a ground-plane assumption for cross-frame correspondence leads to a lightweight network with significantly improved speed and accuracy.
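A plane-induced homography of the kind such fusion relies on can be sketched as follows; the intrinsics, relative pose, and plane parameters below are made-up values, and the OpenCV warp is an illustration rather than the paper's pipeline:

```python
import cv2
import numpy as np

# Warp a previous frame onto the current one via a ground-plane homography.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])            # camera intrinsics (assumed)

# Relative pose taking points from the previous camera frame to the
# current one: x_cur = R @ x_prev + t.
R = np.eye(3)
t = np.array([[0.0], [0.0], [0.5]])        # 0.5 m forward motion (assumed)

# Ground plane in the previous camera frame: n^T x = d.
n = np.array([[0.0], [1.0], [0.0]])        # normal points down toward the road
d = 1.5                                    # camera height above ground (m)

# Plane-induced homography mapping previous-frame pixels to current-frame pixels.
H = K @ (R + (t @ n.T) / d) @ np.linalg.inv(K)

prev_frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in image
warped = cv2.warpPerspective(prev_frame, H, (640, 480))
# `warped` is now aligned with the current frame on the ground plane and can
# be fused with it to recover occluded road lines.
```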
arXiv Detail & Related papers (2024-04-11T10:26:40Z)
- AutoSen: Improving Automatic WiFi Human Sensing through Cross-Modal Autoencoder [56.44764266426344]
WiFi human sensing is highly regarded for its low cost and privacy advantages in recognizing human activities.
Traditional cross-modal methods, aimed at enabling self-supervised learning without labeled data, struggle to extract meaningful features from amplitude-phase combinations.
We introduce AutoSen, an innovative automatic WiFi sensing solution that departs from conventional approaches.
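A minimal cross-modal autoencoder in this spirit might look as follows; the amplitude-to-phase reconstruction direction, layer sizes, and loss are assumptions, not AutoSen's published architecture:

```python
import torch
import torch.nn as nn

# Encode CSI amplitude and reconstruct CSI phase, so the bottleneck must
# capture modality-shared structure rather than raw amplitude noise.
class CrossModalAE(nn.Module):
    def __init__(self, n_subcarriers=90, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_subcarriers, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, n_subcarriers),
        )

    def forward(self, amplitude):
        z = self.encoder(amplitude)      # shared latent features
        phase_hat = self.decoder(z)      # reconstruct the *other* modality
        return z, phase_hat

model = CrossModalAE()
amplitude = torch.randn(16, 90)          # a batch of CSI amplitude vectors
phase = torch.randn(16, 90)              # paired CSI phase (targets)
z, phase_hat = model(amplitude)
loss = nn.functional.mse_loss(phase_hat, phase)  # self-supervised objective
loss.backward()
```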
arXiv Detail & Related papers (2024-01-08T19:50:02Z)
- A Large-Scale Car Parts (LSCP) Dataset for Lightweight Fine-Grained Detection [0.23020018305241333]
This paper presents a large-scale and fine-grained automotive dataset consisting of 84,162 images for detecting 12 different types of car parts.
To alleviate the burden of manual annotation, we propose a novel semi-supervised auto-labeling method.
We also study the limitations of the Grounding DINO approach for zero-shot labeling.
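Confidence-thresholded pseudo-labeling is one common form of semi-supervised auto-labeling; the sketch below is a generic version of that idea rather than the paper's method, and `detector` and the threshold are placeholders:

```python
def auto_label(detector, unlabeled_images, conf_thresh=0.8):
    """Run a trained detector on unlabeled images and keep only
    high-confidence boxes as pseudo ground truth for retraining."""
    pseudo_labels = []
    for image in unlabeled_images:
        detections = detector(image)  # assumed: list of (box, class_id, score)
        kept = [(box, cls) for box, cls, score in detections
                if score >= conf_thresh]
        if kept:                      # skip images with no confident boxes
            pseudo_labels.append((image, kept))
    return pseudo_labels
```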
arXiv Detail & Related papers (2023-11-20T13:30:42Z)
- Leveraging Road Area Semantic Segmentation with Auxiliary Steering Task [0.0]
We propose a CNN-based method that can leverage the steering wheel angle information to improve the road area semantic segmentation.
We demonstrate the effectiveness of the proposed approach on two challenging data sets for autonomous driving.
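One way to leverage steering information is a shared encoder with a segmentation head and an auxiliary steering-regression head; the sketch below assumes this setup, with illustrative shapes and loss weight rather than the paper's exact network:

```python
import torch
import torch.nn as nn

class RoadSegWithSteering(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(            # shared encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(32, num_classes, 1)   # per-pixel logits
        self.steer_head = nn.Sequential(                # scalar angle
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )

    def forward(self, x):
        feats = self.backbone(x)
        return self.seg_head(feats), self.steer_head(feats)

model = RoadSegWithSteering()
images = torch.randn(4, 3, 64, 64)
seg_gt = torch.randint(0, 2, (4, 16, 16))       # downsampled road masks
steer_gt = torch.randn(4, 1)                    # steering-wheel angles
seg_logits, steer_pred = model(images)
# Auxiliary steering loss regularizes the shared features.
loss = (nn.functional.cross_entropy(seg_logits, seg_gt)
        + 0.1 * nn.functional.mse_loss(steer_pred, steer_gt))
loss.backward()
```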
arXiv Detail & Related papers (2022-12-19T13:25:09Z)
- Interaction Detection Between Vehicles and Vulnerable Road Users: A Deep Generative Approach with Attention [9.442285577226606]
We propose a conditional generative model for interaction detection at intersections.
It aims to automatically analyze massive video data by modeling the continuity of road users' behavior.
The model's efficacy was validated by testing on real-world datasets.
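A textbook conditional VAE gives the flavor of such a conditional generative model; the trajectory/context encoding below is a generic sketch, not the paper's attention-based architecture:

```python
import torch
import torch.nn as nn

# CVAE over one road user's trajectory, conditioned on context features
# (e.g. the other road user's motion). All sizes are assumptions.
class TrajectoryCVAE(nn.Module):
    def __init__(self, traj_dim=20, ctx_dim=20, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(traj_dim + ctx_dim, 64)
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + ctx_dim, 64), nn.ReLU(),
            nn.Linear(64, traj_dim),
        )

    def forward(self, traj, ctx):
        h = torch.relu(self.enc(torch.cat([traj, ctx], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparam trick
        recon = self.dec(torch.cat([z, ctx], dim=-1))
        return recon, mu, logvar

model = TrajectoryCVAE()
traj, ctx = torch.randn(8, 20), torch.randn(8, 20)
recon, mu, logvar = model(traj, ctx)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1).mean()
loss = nn.functional.mse_loss(recon, traj) + 0.1 * kl
loss.backward()
```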
arXiv Detail & Related papers (2021-05-09T10:03:55Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes.
We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way.
We show competitive detection and attribute recognition results, as well as a more stable MTL training.
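A shared backbone with a detection head and a per-location attribute head captures the MTL structure; the sketch below stands in a single conv for the composite-field detection head and uses illustrative shapes, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PedestrianMTL(nn.Module):
    def __init__(self, num_attributes=32):
        super().__init__()
        self.backbone = nn.Sequential(            # shared features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(64, 5, 1)       # stand-in for composite fields
        self.attr_head = nn.Conv2d(64, num_attributes, 1)  # 32 logits per cell

    def forward(self, x):
        feats = self.backbone(x)
        return self.det_head(feats), self.attr_head(feats)

model = PedestrianMTL()
det, attrs = model(torch.randn(2, 3, 128, 128))
print(det.shape, attrs.shape)   # (2, 5, 32, 32) and (2, 32, 32, 32)
```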
arXiv Detail & Related papers (2020-12-04T15:10:12Z)
- SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
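Self-attention over per-frame detection features is the core mechanism here; the sketch below uses PyTorch's stock multi-head attention with assumed feature sizes, not SoDA's actual model:

```python
import torch
import torch.nn as nn

# Each track embedding attends over all detections, encoding dependencies
# between objects as soft weights instead of hard one-to-one matches.
embed_dim, n_detections = 64, 12
attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

det_feats = torch.randn(1, n_detections, embed_dim)   # one frame's detections
track_emb, attn_weights = attn(det_feats, det_feats, det_feats)
# attn_weights[0, i, j]: how much detection j informs track embedding i.
print(track_emb.shape, attn_weights.shape)
```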
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
- The Pedestrian Patterns Dataset [11.193504036335503]
The dataset was collected by repeatedly traversing the same three routes over one week, with traversals starting at different fixed time slots.
The purpose of the dataset is to capture the patterns of social and pedestrian behavior along the traversed routes at different times.
arXiv Detail & Related papers (2020-01-06T23:58:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.