SelfD: Self-Learning Large-Scale Driving Policies From the Web
- URL: http://arxiv.org/abs/2204.10320v1
- Date: Thu, 21 Apr 2022 17:58:36 GMT
- Title: SelfD: Self-Learning Large-Scale Driving Policies From the Web
- Authors: Jimuyang Zhang and Ruizhao Zhu and Eshed Ohn-Bar
- Abstract summary: SelfD is a framework for learning scalable driving by utilizing large amounts of online monocular images.
We employ a large dataset of publicly available YouTube videos to train SelfD and comprehensively analyze its generalization benefits across challenging navigation scenarios.
- Score: 13.879536370173506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effectively utilizing the vast amounts of ego-centric navigation data that is
freely available on the internet can advance generalized intelligent systems,
i.e., systems that robustly scale across perspectives, platforms, environmental
conditions, scenarios, and geographical locations. However, it is difficult to
directly leverage such large amounts of unlabeled and highly diverse data for
complex 3D reasoning and planning tasks. Consequently, researchers have
primarily focused on its use for various auxiliary pixel- and image-level
computer vision tasks that do not consider an ultimate navigational objective.
In this work, we introduce SelfD, a framework for learning scalable driving by
utilizing large amounts of online monocular images. Our key idea is to leverage
iterative semi-supervised training when learning imitative agents from
unlabeled data. To handle unconstrained viewpoints, scenes, and camera
parameters, we train an image-based model that directly learns to plan in the
Bird's Eye View (BEV) space. Next, we use unlabeled data to augment the
decision-making knowledge and robustness of an initially trained model via
self-training. In particular, we propose a pseudo-labeling step which enables
making full use of highly diverse demonstration data through "hypothetical"
planning-based data augmentation. We employ a large dataset of publicly
available YouTube videos to train SelfD and comprehensively analyze its
generalization benefits across challenging navigation scenarios. Without
requiring any additional data collection or annotation efforts, SelfD
demonstrates consistent improvements (by up to 24%) in driving performance
evaluation on nuScenes, Argoverse, Waymo, and CARLA.
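As a reading aid, here is a minimal, hypothetical sketch of the iterative self-training recipe the abstract describes: an initial image-to-BEV planner pseudo-labels unlabeled frames under several "hypothetical" navigation commands, and the augmented set is used for retraining. The toy architecture and all names (`BEVPlanner`, `COMMANDS`, the perturbation strength) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch only: a toy model and loop, not SelfD's actual code.
import torch
import torch.nn as nn

COMMANDS = ["left", "straight", "right"]  # assumed navigation commands

class BEVPlanner(nn.Module):
    """Stand-in image model that plans waypoints directly in BEV space."""
    def __init__(self, n_waypoints=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heads = nn.ModuleDict(
            {c: nn.Linear(16, n_waypoints * 2) for c in COMMANDS})

    def forward(self, img, command):
        return self.heads[command](self.backbone(img))  # (B, n_waypoints*2)

def pseudo_label(model, unlabeled_imgs):
    """Plan under every command per image: 'hypothetical' plan augmentation."""
    model.eval()
    out = []
    with torch.no_grad():
        for img in unlabeled_imgs:
            for cmd in COMMANDS:
                out.append((img, cmd, model(img.unsqueeze(0), cmd).squeeze(0)))
    return out

def self_train(model, labeled, unlabeled_imgs, rounds=2, lr=1e-4):
    """Iterative semi-supervised training on real + pseudo-labeled data."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(rounds):
        data = labeled + pseudo_label(model, unlabeled_imgs)
        model.train()
        for img, cmd, target in data:
            noisy = img + 0.05 * torch.randn_like(img)       # perturb input so the
            pred = model(noisy.unsqueeze(0), cmd).squeeze(0)  # pseudo-label has signal
            loss = nn.functional.l1_loss(pred, target)
            opt.zero_grad(); loss.backward(); opt.step()
    return model

# usage: self_train(BEVPlanner(), labeled=[], unlabeled_imgs=[torch.randn(3, 128, 128)])
```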
Related papers
- UdeerLID+: Integrating LiDAR, Image, and Relative Depth with Semi-Supervised [12.440461420762265]
Road segmentation is a critical task for autonomous driving systems, but one of the primary challenges is the scarcity of large-scale, accurately labeled datasets.
Our work introduces an innovative approach that integrates LiDAR point cloud data, visual images, and relative depth maps.
arXiv Detail & Related papers (2024-09-10T03:57:30Z)
- Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and fully self-supervised framework for policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns a driving policy representation by predicting future ego-motion and optimizing the photometric error, based on the current visual observation only (a sketch of this objective follows the entry).
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
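A minimal sketch of that second-stage photometric objective, in the common monodepth2 style; PPGeo's exact formulation may differ, and the shapes and names here are assumptions.

```python
# Sketch of a photometric reprojection loss for self-supervised depth/pose.
import torch
import torch.nn.functional as F

def warp(src, depth, pose, K):
    """Reproject the source frame into the target view.
    src: (B,3,H,W) image, depth: (B,1,H,W), pose: (B,4,4), K: (B,3,3)."""
    B, _, H, W = src.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()   # (3,H,W)
    pix = pix.reshape(1, 3, -1).expand(B, -1, -1)                 # (B,3,HW)
    cam = torch.inverse(K) @ pix * depth.reshape(B, 1, -1)        # back-project
    cam = torch.cat([cam, torch.ones(B, 1, H * W)], dim=1)        # homogeneous
    proj = K @ (pose @ cam)[:, :3]                                # into source view
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)                # perspective divide
    u = uv[:, 0].reshape(B, H, W) / (W - 1) * 2 - 1               # to [-1,1] for
    v = uv[:, 1].reshape(B, H, W) / (H - 1) * 2 - 1               # grid_sample
    return F.grid_sample(src, torch.stack([u, v], -1), align_corners=True)

def photometric_loss(target, src, depth, pose, K):
    """L1 difference between the target frame and the warped source frame
    (full methods usually add an SSIM term and auto-masking)."""
    return (target - warp(src, depth, pose, K)).abs().mean()
```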
- Image Data Augmentation for Deep Learning: A Survey [8.817690876855728]
We systematically review different image data augmentation methods.
We propose a taxonomy of reviewed methods and present the strengths and limitations of these methods.
We also conduct extensive experiments with various data augmentation methods on three typical computer vision tasks (a minimal example pipeline follows this entry).
arXiv Detail & Related papers (2022-04-19T02:05:56Z)
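For context, here is a minimal augmentation pipeline in torchvision, touching a few of the method families such surveys typically cover (geometric, photometric, and erasing-based); the specific transforms and magnitudes are arbitrary choices for illustration.

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),   # geometric: crop + rescale
    T.RandomHorizontalFlip(p=0.5),                # geometric: mirror
    T.ColorJitter(brightness=0.4, contrast=0.4,
                  saturation=0.4, hue=0.1),       # photometric jitter
    T.ToTensor(),
    T.RandomErasing(p=0.25),                      # information deletion (cutout-style)
])

# usage: augmented = augment(pil_image)  # PIL.Image in, torch.Tensor out
```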
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets (a toy fusion sketch follows this entry).
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
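A toy sketch of cross-view fusion with attention, loosely in the spirit of a cross-view transformer; the actual SurroundDepth architecture differs, and the dimensions below are arbitrary.

```python
# Minimal sketch: tokens from all camera views attend to one another.
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Let every camera's feature tokens attend to tokens from all views."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):
        # feats: (B, V, N, C) - batch, camera views, tokens per view, channels
        B, V, N, C = feats.shape
        tokens = feats.reshape(B, V * N, C)           # pool tokens across views
        fused, _ = self.attn(tokens, tokens, tokens)  # joint cross-view attention
        tokens = self.norm(tokens + fused)            # residual + norm
        return tokens.reshape(B, V, N, C)

# usage: fused = CrossViewFusion()(torch.randn(2, 6, 100, 64))  # 6 cameras
```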
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system (a simplified scale-alignment sketch follows this entry).
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
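VRVO recovers metric scale from virtual stereo supervision; as a much simpler stand-in for the general idea of scale alignment, the snippet below shows the common median-scaling trick for fitting a scale-ambiguous depth map to sparse metric reference depth (not VRVO's actual procedure).

```python
# Median scaling: align relative depth to sparse metric depth.
import numpy as np

def median_scale_align(pred_depth, ref_depth):
    """Scale a relative depth map so its median matches the metric reference.
    pred_depth: (H, W) scale-ambiguous prediction; ref_depth: (H, W) sparse
    metric depth with zeros where no measurement exists."""
    mask = ref_depth > 0
    scale = np.median(ref_depth[mask]) / np.median(pred_depth[mask])
    return scale * pred_depth, scale

# usage (with random stand-in data):
pred = np.random.rand(4, 4) + 0.5
ref = 10.0 * pred * (np.random.rand(4, 4) > 0.5)   # sparse metric reference
aligned, s = median_scale_align(pred, ref)          # s recovers the 10x scale
```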
- Geo-Context Aware Study of Vision-Based Autonomous Driving Models and Spatial Video Data [9.883009014227815]
Vision-based deep learning (DL) methods have made great progress in learning autonomous driving models from large-scale crowd-sourced video datasets.
We develop a geo-context aware visualization system for the study of Autonomous Driving Model (ADM) predictions together with large-scale ADM video data.
arXiv Detail & Related papers (2021-08-20T17:33:54Z)
- SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release SODA10M, a large-scale object detection benchmark for autonomous driving containing 10 million unlabeled images and 20K images labeled with 6 representative object categories.
To improve diversity, one frame is collected every ten seconds across 32 different cities, under varied weather conditions, times of day, and location scenes.
We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)
- One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, 20x more than in the largest previously available 3D autonomous driving dataset.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images (a heavily simplified compositing sketch follows this entry).
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
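The paper's pipeline fits full 3D car models with dynamic parts; as a heavily simplified, hypothetical stand-in, the snippet below just alpha-composites a pre-rendered RGBA vehicle sprite into a scene image at a random position and scale. All file paths and value ranges are illustrative assumptions.

```python
# Hypothetical compositing stand-in for rendering-based data generation.
import random
from PIL import Image

def composite_vehicle(scene_path, sprite_path, out_path, seed=0):
    random.seed(seed)
    scene = Image.open(scene_path).convert("RGB")
    sprite = Image.open(sprite_path).convert("RGBA")
    # random scale and position (lower half of the image, where road usually is)
    scale = random.uniform(0.3, 0.8)
    w, h = int(sprite.width * scale), int(sprite.height * scale)
    sprite = sprite.resize((w, h))
    x = random.randint(0, max(0, scene.width - w))
    y = random.randint(scene.height // 2, max(scene.height // 2, scene.height - h))
    scene.paste(sprite, (x, y), mask=sprite)   # alpha-composite via RGBA mask
    scene.save(out_path)

# usage: composite_vehicle("scene.jpg", "car_sprite.png", "augmented.jpg")
```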
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.