On depth prediction for autonomous driving using self-supervised
learning
- URL: http://arxiv.org/abs/2403.06194v1
- Date: Sun, 10 Mar 2024 12:33:12 GMT
- Title: On depth prediction for autonomous driving using self-supervised
learning
- Authors: Houssem Boulahbal
- Abstract summary: This thesis focuses on the challenge of depth prediction using monocular self-supervised learning techniques.
The problem is first approached from a broader perspective, exploring conditional generative adversarial networks (cGANs).
The second contribution is a single-image-to-depth self-supervised method that addresses the rigid-scene assumption.
The third contribution introduces a video-to-depth map forecasting approach.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perception of the environment is a critical component for enabling autonomous
driving. It provides the vehicle with the ability to comprehend its
surroundings and make informed decisions. Depth prediction plays a pivotal role
in this process, as it helps the understanding of the geometry and motion of
the environment. This thesis focuses on the challenge of depth prediction using
monocular self-supervised learning techniques. The problem is first approached
from a broader perspective: conditional generative adversarial networks (cGANs)
are explored as a potential technique to achieve better generalization. In
doing so, a fundamental contribution to conditional GANs, the acontrario cGAN,
was proposed. The second contribution is a single-image-to-depth
self-supervised method that addresses the rigid-scene assumption with a novel
transformer-based model that outputs a pose for each dynamic object. The third
contribution introduces a
video-to-depth map forecasting approach. This method serves as an extension of
self-supervised techniques to predict future depths. This involves the creation
of a novel transformer model capable of predicting the future depth of a given
scene. Moreover, to address the limitations of the aforementioned methods, a
video-to-video depth model was proposed. This model
leverages the spatio-temporal consistency of the input and output sequence to
predict a more accurate depth sequence output. These methods have significant
applications in autonomous driving (AD) and advanced driver assistance systems
(ADAS).
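For context, the training signal behind such monocular self-supervised methods is view synthesis: a depth network and a pose network warp a source frame into the target view, and the photometric reconstruction error supervises both. The sketch below is a minimal illustration with assumed tensor shapes and a plain L1 error, not the thesis implementation.

```python
# Minimal sketch of the self-supervised photometric objective (view
# synthesis). Shapes, names, and the plain L1 error are assumptions.
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift pixels to 3D points: X = D * K^-1 * [u, v, 1]^T."""
    b, _, h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], 0).float().view(1, 3, -1)
    rays = K_inv @ pix.to(depth.device)          # [1, 3, H*W] camera rays
    return depth.view(b, 1, -1) * rays           # [B, 3, H*W] 3D points

def photometric_loss(target, source, depth, T, K, K_inv):
    """Warp `source` into the target view; T is the relative pose [B, 4, 4]."""
    b, _, h, w = target.shape
    pts = backproject(depth, K_inv)              # points in the target frame
    pts = T[:, :3, :3] @ pts + T[:, :3, 3:]      # rigid motion to source frame
    proj = K @ pts                               # pinhole projection
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    scale = torch.tensor([w - 1, h - 1], dtype=uv.dtype, device=uv.device)
    uv = 2 * uv / scale.view(1, 2, 1) - 1        # normalize for grid_sample
    grid = uv.permute(0, 2, 1).reshape(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (target - warped).abs().mean()        # plain L1 reconstruction error
```

In practice this term is usually combined with an SSIM component, a per-pixel minimum over several source frames, and an edge-aware smoothness prior; the thesis further relaxes the rigid-scene assumption by predicting an additional pose for each dynamic object, so moving regions are warped with their own motion.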
Related papers
- Enhancing End-to-End Autonomous Driving with Latent World Model [78.22157677787239]
We propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels.
Our framework, LAW, uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame.
As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.
arXiv Detail & Related papers (2024-06-12T17:59:21Z) - End-to-end Autonomous Driving: Challenges and Frontiers [45.391430626264764]
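To make the mechanism above concrete, the toy sketch below implements a latent world-model step: an MLP predicts the next frame's latent feature from the current latent and the ego action, with the encoder's actual next-frame latent as the self-supervised target. The dimensions and the MLP design are assumptions, not the LAW implementation.

```python
# Toy latent world-model step (illustrative; not the LAW code).
import torch
import torch.nn as nn

class LatentDynamics(nn.Module):
    def __init__(self, latent_dim=512, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim))

    def forward(self, latent, action):
        # Predict the next latent from the current latent and ego action;
        # the training target is the encoder's latent of the next frame,
        # so no manual annotation is needed.
        return self.net(torch.cat([latent, action], dim=-1))
```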
- End-to-end Autonomous Driving: Challenges and Frontiers [45.391430626264764]
We provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others.
We discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
arXiv Detail & Related papers (2023-06-29T14:17:24Z) - Policy Pre-training for End-to-end Autonomous Driving via
Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward, fully self-supervised framework for policy pretraining in visuomotor driving.
We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes from large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z) - Exploring Contextual Representation and Multi-Modality for End-to-End
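As a hedged sketch of the second stage described above: a single-frame visual encoder predicts future ego-motion, which can be converted to a rigid transform and scored with the same photometric error as in the loss sketch after the main abstract, while the stage-one depth network stays frozen. Class names and layer sizes here are assumptions, not the authors' code.

```python
# Hedged sketch of a stage-2 policy head in the spirit of PPGeo
# (assumed names and sizes; not the authors' code).
import torch.nn as nn

class SingleFramePolicy(nn.Module):
    """Visual encoder that predicts future ego-motion from one frame."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in visual encoder
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.pose_head = nn.Linear(feat_dim, 6)  # axis-angle + translation

    def forward(self, frame):                  # frame: [B, 3, H, W]
        # The 6-DoF output is turned into a 4x4 transform and plugged
        # into the photometric loss with the frozen stage-1 depth.
        return self.pose_head(self.encoder(frame))
```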
- Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context.
We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird-eye-view semantic data to enhance contextual representation.
Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z) - ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal
Feature Learning [132.20119288212376]
We propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system.
arXiv Detail & Related papers (2022-07-15T16:57:43Z) - Forecasting of depth and ego-motion with transformers and
self-supervision [0.0]
This paper addresses the problem of end-to-end self-supervised forecasting of depth and ego motion.
Given a sequence of raw images, the aim is to forecast both the geometry and ego-motion using a self-supervised photometric loss.
The architecture is designed using both convolution and transformer modules.
arXiv Detail & Related papers (2022-06-15T10:14:11Z) - SurroundDepth: Entangling Surrounding Views for Self-Supervised
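As a rough illustration of such a hybrid design, the sketch below encodes each frame with a small CNN, lets a transformer attend over the resulting spatio-temporal tokens, and decodes a future depth map from the last frame's tokens. All layer sizes and the decoding head are assumptions; the paper's actual architecture differs.

```python
# Illustrative convolution + transformer forecasting backbone
# (assumed sizes; not the paper's architecture).
import torch
import torch.nn as nn

class DepthForecaster(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.conv = nn.Sequential(             # per-frame CNN encoder
            nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, 2, 1), nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Conv2d(d_model, 1, 1)   # depth regression head

    def forward(self, clip):                   # clip: [B, T, 3, H, W]
        b, t = clip.shape[:2]
        feats = self.conv(clip.flatten(0, 1))  # [B*T, D, h, w]
        _, d, h, w = feats.shape
        tokens = feats.flatten(2).permute(0, 2, 1).reshape(b, t * h * w, d)
        fused = self.temporal(tokens)          # attend across space and time
        last = fused[:, -h * w:].permute(0, 2, 1).reshape(b, d, h, w)
        return torch.sigmoid(self.head(last))  # future (normalized) depth
```

Note that full spatio-temporal attention is only practical at the low feature resolution used here; real systems typically downsample further or factorize the attention.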
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z) - Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
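A minimal sketch of the cross-view idea: features from all surrounding cameras are pooled into a single token sequence so that self-attention can exchange information between views. The shapes and the single-layer design are assumptions, not the SurroundDepth code.

```python
# Minimal cross-view fusion sketch (assumed shapes; not SurroundDepth code).
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, feats):                  # feats: [B, N_views, HW, D]
        b, n, hw, d = feats.shape
        tokens = feats.reshape(b, n * hw, d)   # one sequence over all views
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + fused).reshape(b, n, hw, d)
```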
- Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)