DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
- URL: http://arxiv.org/abs/2406.06264v1
- Date: Mon, 10 Jun 2024 13:46:07 GMT
- Title: DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
- Authors: Simon Doll, Niklas Hanselmann, Lukas Schneider, Richard Schulz, Marius Cordts, Markus Enzweiler, Hendrik P. A. Lensch,
- Abstract summary: State-of-the-art approaches for autonomous driving integrate multiple sub-tasks of the overall driving task into a single pipeline.
We propose dedicated representations to disentangle dynamic agents and static scene elements.
Our method titled DualAD outperforms independently trained single-task networks.
- Score: 11.379456277711379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art approaches for autonomous driving integrate multiple sub-tasks of the overall driving task into a single pipeline that can be trained in an end-to-end fashion by passing latent representations between the different modules. In contrast to previous approaches that rely on a unified grid to represent the belief state of the scene, we propose dedicated representations to disentangle dynamic agents and static scene elements. This allows us to explicitly compensate for the effect of both ego and object motion between consecutive time steps and to flexibly propagate the belief state through time. Furthermore, dynamic objects can not only attend to the input camera images, but also directly benefit from the inferred static scene structure via a novel dynamic-static cross-attention. Extensive experiments on the challenging nuScenes benchmark demonstrate the benefits of the proposed dual-stream design, especially for modelling highly dynamic agents in the scene, and highlight the improved temporal consistency of our approach. Our method titled DualAD not only outperforms independently trained single-task networks, but also improves over previous state-of-the-art end-to-end models by a large margin on all tasks along the functional chain of driving.
Related papers
- DeMo: Decoupling Motion Forecasting into Directional Intentions and Dynamic States [6.856351850183536]
We introduce DeMo, a framework that decouples multi-modal trajectory queries into two types.
By leveraging this format, we separately optimize the multi-modality and dynamic evolutionary properties of trajectories.
We additionally introduce combined Attention and Mamba techniques for global information aggregation and state sequence modeling.
arXiv Detail & Related papers (2024-10-08T12:27:49Z) - DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Autonomous Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving.
Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner.
Experiments conducted on nuScenes dataset demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - PPEA-Depth: Progressive Parameter-Efficient Adaptation for
Self-Supervised Monocular Depth Estimation [24.68378829544394]
We propose PPEA-Depth, a Progressive Efficient Adaptation approach to transfer a pre-trained image model for self-supervised depth estimation.
The training comprises two sequential stages: an initial phase trained on a dataset primarily composed of static scenes, succeeded by an expansion to more intricate datasets.
Experiments demonstrate that PPEA-Depth achieves state-of-the-art performance on KITTI, CityScapes and DDAD datasets.
arXiv Detail & Related papers (2023-12-20T14:45:57Z) - EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via
Self-Supervision [85.17951804790515]
EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping.
Our method achieves state-of-the-art performance in sensor simulation.
arXiv Detail & Related papers (2023-11-03T17:59:55Z) - Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z) - DytanVO: Joint Refinement of Visual Odometry and Motion Segmentation in
Dynamic Environments [6.5121327691369615]
We present DytanVO, the first supervised learning-based VO method that deals with dynamic environments.
Our method achieves an average improvement of 27.7% in ATE over state-of-the-art VO solutions in real-world dynamic environments.
arXiv Detail & Related papers (2022-09-17T23:56:03Z) - Drivable Volumetric Avatars using Texel-Aligned Features [52.89305658071045]
Photo telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance.
We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
arXiv Detail & Related papers (2022-07-20T09:28:16Z) - STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic
Cross-Modal Understanding [68.96574451918458]
We propose a framework named STVG, which models visual-linguistic dependencies with a static branch and a dynamic branch.
Both the static and dynamic branches are designed as cross-modal transformers.
Our proposed method achieved 39.6% vIoU and won the first place in the HC-STVG of the Person in Context Challenge.
arXiv Detail & Related papers (2022-07-06T15:48:58Z) - Isolating and Leveraging Controllable and Noncontrollable Visual
Dynamics in World Models [65.97707691164558]
We present Iso-Dream, which improves the Dream-to-Control framework in two aspects.
First, by optimizing inverse dynamics, we encourage world model to learn controllable and noncontrollable sources.
Second, we optimize the behavior of the agent on the decoupled latent imaginations of the world model.
arXiv Detail & Related papers (2022-05-27T08:07:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.