Towards Robust Robot 3D Perception in Urban Environments: The UT Campus
Object Dataset
- URL: http://arxiv.org/abs/2309.13549v2
- Date: Sun, 1 Oct 2023 04:01:04 GMT
- Title: Towards Robust Robot 3D Perception in Urban Environments: The UT Campus
Object Dataset
- Authors: Arthur Zhang, Chaitanya Eranki, Christina Zhang, Ji-Hwan Park, Raymond
Hong, Pranav Kalyani, Lochana Kalyanaraman, Arsh Gamare, Arnav Bagad, Maria
Esteva, Joydeep Biswas
- Abstract summary: CODa is a mobile robot egocentric perception dataset collected on the University of Texas Austin Campus.
Our dataset contains 8.5 hours of multimodal sensor data: synchronized 3D point clouds and stereo RGB video from a 128-channel 3D LiDAR and two 1.25MP RGB cameras at 10 fps.
We provide 58 minutes of ground-truth annotations containing 1.3 million 3D bounding boxes with instance IDs for 53 semantic classes, 5000 frames of 3D semantic annotations for urban terrain.
- Score: 7.665779592030094
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce the UT Campus Object Dataset (CODa), a mobile robot egocentric
perception dataset collected on the University of Texas Austin Campus. Our
dataset contains 8.5 hours of multimodal sensor data: synchronized 3D point
clouds and stereo RGB video from a 128-channel 3D LiDAR and two 1.25MP RGB
cameras at 10 fps; RGB-D videos from an additional 0.5MP sensor at 7 fps, and a
9-DOF IMU sensor at 40 Hz. We provide 58 minutes of ground-truth annotations
containing 1.3 million 3D bounding boxes with instance IDs for 53 semantic
classes, 5000 frames of 3D semantic annotations for urban terrain, and
pseudo-ground truth localization. We repeatedly traverse identical geographic
locations for a wide range of indoor and outdoor areas, weather conditions, and
times of the day. Using CODa, we empirically demonstrate that: 1) 3D object
detection performance in urban settings is significantly higher when trained
using CODa compared to existing datasets even when employing state-of-the-art
domain adaptation approaches, 2) sensor-specific fine-tuning improves 3D object
detection accuracy and 3) pretraining on CODa improves cross-dataset 3D object
detection performance in urban settings compared to pretraining on AV datasets.
Using our dataset and annotations, we release benchmarks for 3D object
detection and 3D semantic segmentation using established metrics. In the
future, the CODa benchmark will include additional tasks like unsupervised
object discovery and re-identification. We publicly release CODa on the Texas
Data Repository, pre-trained models, dataset development package, and
interactive dataset viewer on our website at https://amrl.cs.utexas.edu/coda.
We expect CODa to be a valuable dataset for research in egocentric 3D
perception and planning for autonomous navigation in urban environments.
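
CODa's actual data layout and APIs are defined by the dataset development package released on the project website; as a rough, non-authoritative sketch of how one might iterate over synchronized LiDAR sweeps and their 3D bounding-box annotations, consider the snippet below. The directory structure, the packed float32 (x, y, z, intensity) point format, and the JSON annotation fields are illustrative assumptions, not the released package's real interface.

```python
import json
import numpy as np
from pathlib import Path

def load_lidar_sweep(bin_path: Path) -> np.ndarray:
    """Load one LiDAR sweep stored as packed float32 (x, y, z, intensity) rows."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

def load_boxes(json_path: Path) -> list:
    """Load per-frame 3D boxes: center, dimensions, heading, class, instance ID."""
    with open(json_path) as f:
        return json.load(f)["annotations"]

# Hypothetical on-disk layout: one directory per traversal sequence.
sequence_dir = Path("coda/sequences/0")
for bin_path in sorted((sequence_dir / "lidar").glob("*.bin")):
    points = load_lidar_sweep(bin_path)                                    # (N, 4) array
    boxes = load_boxes(sequence_dir / "labels" / f"{bin_path.stem}.json")  # list of dicts
    pedestrians = [b for b in boxes if b.get("class") == "Pedestrian"]
    print(bin_path.stem, points.shape[0], "points,", len(pedestrians), "pedestrians")
```

The same loop structure would extend to the stereo RGB, RGB-D, and IMU streams once their per-frame files are matched to the LiDAR timestamps.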
Related papers
- RadioDiff-3D: A 3D$\times$3D Radio Map Dataset and Generative Diffusion Based Benchmark for 6G Environment-Aware Communication [76.6171399066216]
UrbanRadio3D is a large-scale, high-resolution 3D radio map (RM) dataset constructed via ray tracing in realistic urban environments. RadioDiff-3D is a diffusion-model-based generative framework utilizing a 3D convolutional architecture. This work provides a foundational dataset and benchmark for future research in 3D environment-aware communication.
arXiv Detail & Related papers (2025-07-16T11:54:08Z) - 3DGeoDet: General-purpose Geometry-aware Image-based 3D Object Detection [17.502554516157893]
3DGeoDet is a novel geometry-aware 3D object detection approach. It effectively handles single- and multi-view RGB images in indoor and outdoor environments.
arXiv Detail & Related papers (2025-06-11T09:18:36Z) - UniDet3D: Multi-dataset Indoor 3D Object Detection [4.718582862677851]
UniDet3D is a simple yet effective 3D object detection model.
It is trained on a mixture of indoor datasets and is capable of working in various indoor environments.
arXiv Detail & Related papers (2024-09-06T12:40:19Z) - TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z) - MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations [55.022519020409405]
This paper builds the largest multi-modal 3D scene dataset and benchmark to date with hierarchical grounded language annotations, MMScan.
The resulting multi-modal 3D dataset encompasses 1.4M meta-annotated captions on 109k objects and 7.7k regions as well as over 3.04M diverse samples for 3D visual grounding and question-answering benchmarks.
arXiv Detail & Related papers (2024-06-13T17:59:30Z) - Uni3DETR: Unified 3D Detection Transformer [75.35012428550135]
We propose a unified 3D detector that addresses indoor and outdoor detection within the same framework.
Specifically, we employ the detection transformer with point-voxel interaction for object prediction.
We then propose the mixture of query points, which sufficiently exploits global information for dense small-range indoor scenes and local information for large-range sparse outdoor ones.
arXiv Detail & Related papers (2023-10-09T13:20:20Z) - Argoverse 2: Next Generation Datasets for Self-Driving Perception and
Forecasting [64.7364925689825]
Argoverse 2 (AV2) is a collection of three datasets for perception and forecasting research in the self-driving domain.
The Lidar dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose.
The Motion Forecasting dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene.
arXiv Detail & Related papers (2023-01-02T00:36:22Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - H3D: Benchmark on Semantic Segmentation of High-Resolution 3D Point
Clouds and textured Meshes from UAV LiDAR and Multi-View-Stereo [4.263987603222371]
This paper introduces a 3D dataset that is unique in three ways.
It depicts the village of Hessigheim (Germany), henceforth referred to as H3D.
It is designed to promote research in the field of 3D data analysis on the one hand, and to evaluate and rank emerging approaches on the other.
arXiv Detail & Related papers (2021-02-10T09:33:48Z) - Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion
Forecasting with a Single Convolutional Net [93.51773847125014]
We propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.
Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world (a minimal illustrative sketch of this space-time convolution appears after this list).
arXiv Detail & Related papers (2020-12-22T22:43:35Z) - RUHSNet: 3D Object Detection Using Lidar Data in Real Time [0.0]
We propose a novel neural network architecture for detecting 3D objects in point cloud data.
Our work surpasses the state of the art in this domain in terms of both average precision and speed, running at over 30 FPS.
This makes it a feasible option for deployment in real-time applications, including self-driving cars.
arXiv Detail & Related papers (2020-05-09T09:41:46Z)
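
The "Fast and Furious" entry above describes applying 3D convolutions jointly across space and time over a bird's-eye-view (BEV) grid. As a minimal illustrative sketch (not the authors' code), a single such space-time convolution over a stack of rasterized BEV frames might look as follows; the tensor layout and sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

batch, height_channels, num_frames, bev_h, bev_w = 1, 32, 5, 200, 200

# A stack of rasterized BEV occupancy grids from consecutive LiDAR sweeps,
# laid out as (batch, channels, time, y, x).
bev_sequence = torch.randn(batch, height_channels, num_frames, bev_h, bev_w)

# One space-time convolution: the kernel spans 3 frames and a 3x3 BEV window,
# so each output cell aggregates evidence over both motion and space.
spacetime_conv = nn.Conv3d(in_channels=height_channels, out_channels=64,
                           kernel_size=3, padding=1)

features = spacetime_conv(bev_sequence)
print(features.shape)  # torch.Size([1, 64, 5, 200, 200])
```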