TrackletMapper: Ground Surface Segmentation and Mapping from Traffic
Participant Trajectories
- URL: http://arxiv.org/abs/2209.05247v1
- Date: Mon, 12 Sep 2022 13:43:10 GMT
- Title: TrackletMapper: Ground Surface Segmentation and Mapping from Traffic
Participant Trajectories
- Authors: Jannik Zürn, Sebastian Weber, Wolfram Burgard
- Abstract summary: TrackletMapper is a framework for annotating ground surface types such as sidewalks, roads, and street crossings from object tracklets.
We show that the model can be self-distilled for additional performance benefits by aggregating a ground surface map and projecting it into the camera images.
We qualitatively and quantitatively validate our findings on a novel large-scale dataset for mobile robots operating in pedestrian areas.
- Score: 24.817728268091976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robustly classifying ground infrastructure such as roads and street crossings
is an essential task for mobile robots operating alongside pedestrians. While
many semantic segmentation datasets are available for autonomous vehicles,
models trained on such datasets exhibit a large domain gap when deployed on
robots operating in pedestrian spaces. Manually annotating images recorded from
pedestrian viewpoints is both expensive and time-consuming. To overcome this
challenge, we propose TrackletMapper, a framework for annotating ground surface
types such as sidewalks, roads, and street crossings from object tracklets
without requiring human-annotated data. To this end, we project the robot
ego-trajectory and the paths of other traffic participants into the ego-view
camera images, creating sparse semantic annotations for multiple types of
ground surfaces from which a ground segmentation model can be trained. We
further show that the model can be self-distilled for additional performance
benefits by aggregating a ground surface map and projecting it into the camera
images, creating a denser set of training annotations than the sparse
tracklet annotations. We qualitatively and quantitatively validate our findings
on a novel large-scale dataset for mobile robots operating in pedestrian areas.
Code and dataset will be made available at
http://trackletmapper.cs.uni-freiburg.de.
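To make the tracklet-projection step concrete, below is a minimal sketch of how 3D trajectory points could be painted into a camera image as sparse labels, assuming a pinhole model with known intrinsics K and a world-to-camera transform; all function and variable names are illustrative and are not taken from the TrackletMapper code.

```python
import numpy as np

def paint_tracklet_labels(points_world, T_world_to_cam, K, class_id, label_img):
    """Project 3D tracklet points (N, 3) into the ego camera and paint
    sparse semantic labels into label_img. Illustrative sketch only."""
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_world_to_cam @ pts_h.T).T[:, :3]   # world -> camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]          # keep points in front
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                     # pinhole projection
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    h, w = label_img.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    label_img[v[ok], u[ok]] = class_id              # sparse annotation
    return label_img
```

In this spirit, vehicle tracklets could paint road pixels and the robot's own trajectory the surface class it traversed, yielding the sparse annotations from which the segmentation model is trained.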
Related papers
- Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy [3.713586225621126]
A robot must be able to identify semantically traversable terrain in an image based on a semantic understanding of the scene.
This reasoning ability relies on semantic traversability, which is frequently obtained using semantic segmentation models fine-tuned on the test domain.
We present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process.
arXiv Detail & Related papers (2024-06-05T06:40:04Z)
- Pedestrian Environment Model for Automated Driving [54.16257759472116]
We propose an environment model that includes the position of the pedestrians as well as their pose information.
We extract the skeletal information with a neural network human pose estimator from the image.
To obtain the 3D position, we aggregate the data from consecutive frames in conjunction with the vehicle position (this aggregation step is sketched below).
arXiv Detail & Related papers (2023-08-17T16:10:58Z)
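As a rough illustration of the frame-aggregation step above, the sketch below fuses per-frame pedestrian observations into a single world-frame position using the vehicle pose; the function and its inputs are hypothetical and may differ from the paper's method.

```python
import numpy as np

def aggregate_pedestrian_position(obs_vehicle_frame, vehicle_poses):
    """Fuse per-frame pedestrian observations into one world-frame position.

    obs_vehicle_frame: list of (3,) pedestrian positions in the vehicle frame.
    vehicle_poses: list of 4x4 vehicle-to-world transforms, one per frame.
    Hypothetical helper; the paper's aggregation may differ in detail.
    """
    world_pts = []
    for p, T in zip(obs_vehicle_frame, vehicle_poses):
        p_h = np.append(p, 1.0)            # homogeneous coordinates
        world_pts.append((T @ p_h)[:3])    # vehicle frame -> world frame
    # Temporal aggregation: average over consecutive frames to reduce noise.
    return np.mean(world_pts, axis=0)
```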
- Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images [14.689298253430568]
We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles.
Our results demonstrate competitive multi-agent trajectory prediction performance especially for pedestrians in the scene when using our AIM representation.
arXiv Detail & Related papers (2023-05-19T17:48:01Z)
- APE: An Open and Shared Annotated Dataset for Learning Urban Pedestrian Path Networks [16.675093530600154]
Inferring the full transportation network, including sidewalks and cycleways, is crucial for many automated systems.
This work begins to address this problem at scale by introducing a novel dataset of aerial satellite imagery, map imagery, and annotations of sidewalks, crossings, and corner bulbs in urban cities.
We present an end-to-end process for inferring a connected pedestrian path network map using street network information and our proposed dataset.
arXiv Detail & Related papers (2023-03-04T05:08:36Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity (a simple accumulation baseline is sketched below).
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
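The sketch below shows one simple way to accumulate depth-derived points into a 2.5D elevation grid along the robot trajectory; the paper instead learns a reconstruction model that also handles noise and blind spots, so this is only a geometric baseline with hypothetical names.

```python
import numpy as np

def update_height_map(height_map, count_map, points_world, origin, res):
    """Accumulate world-frame 3D points (from depth images registered via
    the robot trajectory) into a 2.5D elevation grid with a running mean.
    Simple baseline; the paper's learned model also fills in cells hidden
    in the cameras' blind spots."""
    ij = ((points_world[:, :2] - origin) / res).astype(int)
    h, w = height_map.shape
    ok = (ij[:, 0] >= 0) & (ij[:, 0] < h) & (ij[:, 1] >= 0) & (ij[:, 1] < w)
    for (i, j), z in zip(ij[ok], points_world[ok, 2]):
        count_map[i, j] += 1
        height_map[i, j] += (z - height_map[i, j]) / count_map[i, j]
    return height_map, count_map
```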
- Cross-Camera Trajectories Help Person Retrieval in a Camera Network [124.65912458467643]
Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network.
We propose a pedestrian retrieval framework based on cross-camera trajectory generation, which integrates both temporal and spatial information.
To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset.
arXiv Detail & Related papers (2022-04-27T13:10:48Z)
- SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving [64.10636296274168]
Road extraction is an essential step in building autonomous navigation systems.
Using convolutional neural networks (ConvNets) alone is not effective for this problem, as they are inefficient at capturing long-range dependencies between road segments in the image.
We propose a Spatial and Interaction Space Graph Reasoning (SPIN) module which, when plugged into a ConvNet, performs reasoning over graphs constructed on spatial and interaction spaces projected from the feature maps (a sketch of the interaction-space part follows below).
arXiv Detail & Related papers (2021-09-16T03:52:17Z)
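For intuition, here is a sketch of graph reasoning in a projected interaction space, in the style of modules like SPIN; node counts and projection dimensions are invented for illustration, and the actual SPIN module additionally reasons over a spatial-space graph.

```python
import torch
import torch.nn as nn

class InteractionGraphReasoning(nn.Module):
    """Sketch of graph reasoning in a projected interaction space.
    Dimensions are invented; SPIN itself also has a spatial branch."""
    def __init__(self, channels, num_nodes=32, node_dim=64):
        super().__init__()
        self.to_assign = nn.Conv2d(channels, num_nodes, 1)  # soft assignment
        self.to_state = nn.Conv2d(channels, node_dim, 1)    # node features
        self.gcn_feat = nn.Conv1d(node_dim, node_dim, 1)    # feature update
        self.gcn_adj = nn.Conv1d(num_nodes, num_nodes, 1)   # learned adjacency
        self.to_pixels = nn.Conv2d(node_dim, channels, 1)   # back-projection

    def forward(self, x):
        b, c, h, w = x.shape
        assign = self.to_assign(x).flatten(2).softmax(dim=-1)  # B x N x HW
        state = self.to_state(x).flatten(2)                    # B x D x HW
        nodes = torch.bmm(state, assign.transpose(1, 2))       # B x D x N
        # Message passing: update node features and mix across nodes.
        nodes = self.gcn_feat(nodes)
        nodes = nodes + self.gcn_adj(nodes.transpose(1, 2)).transpose(1, 2)
        out = torch.bmm(nodes, assign).view(b, -1, h, w)       # back to pixels
        return x + self.to_pixels(out)                         # residual
```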
- End-to-End Deep Structured Models for Drawing Crosswalks [98.9901717499058]
We project both inputs onto the ground surface to produce a top-down view of the scene (this projection is sketched below).
We then leverage convolutional neural networks to extract semantic cues about the location of the crosswalks.
Experiments over crosswalks in a large city area show that 96.6% automation can be achieved.
arXiv Detail & Related papers (2020-12-21T18:59:08Z)
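A minimal version of the ground-plane projection mentioned above, assuming a calibrated pinhole camera and a flat ground plane; R, t (ground-to-camera) and the scale are placeholders, and the paper's actual pipeline projects multiple inputs and runs a structured model on top.

```python
import numpy as np
import cv2

def warp_to_topdown(img, K, R, t, out_hw, m_per_px):
    """Warp a camera image onto the ground plane (z = 0) to get a
    top-down view. R, t map ground-frame points into the camera frame;
    all inputs are placeholders for calibration the paper assumes."""
    # Homography: top-down pixel -> ground metres -> camera pixel.
    S = np.array([[m_per_px, 0.0, 0.0],
                  [0.0, m_per_px, 0.0],
                  [0.0, 0.0, 1.0]])
    H = K @ np.column_stack([R[:, 0], R[:, 1], t]) @ S
    # WARP_INVERSE_MAP: sample the source image at H(top-down pixel).
    return cv2.warpPerspective(img, H, (out_hw[1], out_hw[0]),
                               flags=cv2.WARP_INVERSE_MAP)
```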
- Graph-SIM: A Graph-based Spatiotemporal Interaction Modelling for Pedestrian Action Prediction [10.580548257913843]
We propose a novel graph-based model for predicting pedestrian crossing action.
We introduce a new dataset that provides 3D bounding box and pedestrian behavioural annotations for the existing nuScenes dataset.
Our approach achieves state-of-the-art performance by improving on various metrics by more than 15% in comparison to existing methods.
arXiv Detail & Related papers (2020-12-03T18:28:27Z)
- Hidden Footprints: Learning Contextual Walkability from 3D Human Trails [70.01257397390361]
Current datasets only tell you where people are, not where they could be.
We first augment the set of valid, labeled walkable regions by propagating person observations between images, utilizing 3D information to create what we call hidden footprints.
We devise a training strategy designed for such sparse labels, combining a class-balanced classification loss with a contextual adversarial loss (the class-balanced term is sketched below).
arXiv Detail & Related papers (2020-08-19T23:19:08Z)
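As a sketch of the class-balanced term only (the adversarial term is omitted), the snippet below reweights cross-entropy by inverse class frequency over the labeled pixels; the exact weighting scheme in the paper may differ.

```python
import torch
import torch.nn.functional as F

def class_balanced_loss(logits, sparse_labels, ignore_index=255):
    """Cross-entropy reweighted by inverse class frequency so that rare
    labeled 'walkable' pixels are not drowned out by frequent classes.
    Sketch of the balanced term; the paper pairs it with a contextual
    adversarial loss."""
    valid = sparse_labels[sparse_labels != ignore_index]
    num_classes = logits.shape[1]
    counts = torch.bincount(valid, minlength=num_classes).float()
    weights = 1.0 / counts.clamp(min=1.0)
    weights = weights * num_classes / weights.sum()  # keep loss scale stable
    return F.cross_entropy(logits, sparse_labels,
                           weight=weights, ignore_index=ignore_index)
```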
- Footprints and Free Space from a Single Color Image [32.57664001590537]
We introduce a model to predict the geometry of both visible and occluded traversable surfaces, given a single RGB image as input.
We learn from stereo video sequences, using camera poses, per-frame depth and semantic segmentation to form training data.
We find that a surprisingly low bar for spatial coverage of training scenes is required.
arXiv Detail & Related papers (2020-04-14T09:29:17Z)
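To illustrate how such training data could be formed, the sketch below reprojects a ground mask from one frame of a calibrated video sequence into another using per-frame depth and the relative camera pose; it is a simplified stand-in for the authors' label-generation pipeline, with invented names.

```python
import numpy as np

def reproject_ground_mask(mask_src, depth_cur, K, T_cur_to_src):
    """Label the current frame by checking which of its pixels land on
    ground pixels in another frame (via depth and relative pose).
    Simplified sketch, not the authors' exact pipeline."""
    h, w = depth_cur.shape
    K_inv = np.linalg.inv(K)
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    # Back-project current pixels to 3D, move them into the source frame.
    pts = K_inv @ (pix * depth_cur.reshape(1, -1))
    pts = T_cur_to_src[:3, :3] @ pts + T_cur_to_src[:3, 3:4]
    uv = K @ pts
    uv = (uv[:2] / uv[2:3]).round().astype(int)
    ok = (uv[0] >= 0) & (uv[0] < w) & (uv[1] >= 0) & (uv[1] < h) & (pts[2] > 0)
    label = np.zeros(h * w, dtype=bool)
    label[ok] = mask_src[uv[1, ok], uv[0, ok]]
    return label.reshape(h, w)
```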
This list is automatically generated from the titles and abstracts of the papers on this site.