Related papers: Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review

URL: http://arxiv.org/abs/2303.01212v2
Date: Sun, 9 Jun 2024 12:58:10 GMT
Title: Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review
Authors: Yining Shi, Kun Jiang, Jiusi Li, Zelin Qian, Junze Wen, Mengmeng Yang, Ke Wang, Diange Yang
Abstract summary: Grid-centric perception is more robust to open-world driving scenarios with endless long-tailed, semantically unknown obstacles. Recent research demonstrates the great advantages of grid-centric perception, such as comprehensive fine-grained environmental representation. We organize previous and current knowledge of occupancy grid techniques along the main vein from 2D BEV grids to 3D occupancy to 4D occupancy forecasting.
Score: 13.047382354329736
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Grid-centric perception is a crucial field for mobile robot perception and navigation. Nonetheless, grid-centric perception is less prevalent than object-centric perception, because autonomous vehicles need to accurately perceive highly dynamic, large-scale traffic scenarios, and the complexity and computational costs of grid-centric perception are high. In recent years, the rapid development of deep learning techniques and hardware has provided fresh insights into the evolution of grid-centric perception. The fundamental difference between the grid-centric and object-centric pipelines is that grid-centric perception follows a geometry-first paradigm, which is more robust to open-world driving scenarios with endless long-tailed, semantically unknown obstacles. Recent research demonstrates the great advantages of grid-centric perception, such as comprehensive fine-grained environmental representation, greater robustness to occlusion and irregularly shaped objects, better ground estimation, and safer planning policies. There is also a growing trend in which the capacity of occupancy networks is greatly expanded to 4D scene perception and prediction, and the latest techniques are closely related to new research topics such as 4D occupancy forecasting, generative AI, and world models in the field of autonomous driving. Given the lack of current surveys for this rapidly expanding field, we present a hierarchically structured review of grid-centric perception for autonomous vehicles. We organize previous and current knowledge of occupancy grid techniques along the main vein from 2D BEV grids to 3D occupancy to 4D occupancy forecasting. We additionally summarize label-efficient occupancy learning and the role of grid-centric perception in driving systems. Lastly, we present a summary of current research trends and provide future outlooks.

Related papers

GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving [12.889523014369884]
We propose a geometric and semantic self-supervised pre-training method, GASP, that learns a unified representation by predicting, at any queried future point in spacetime. By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, general representation of the environment and its evolution through time.
arXiv Detail & Related papers (2025-03-19T20:00:27Z)
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments [62.5830455357187]
We set up an egocentric multi-sensor data collection platform based on 3 main types of sensors (Camera, LiDAR and Fisheye). A large-scale multimodal dataset, named RoboSense, is constructed to facilitate egocentric robot perception.
arXiv Detail & Related papers (2024-08-28T03:17:40Z)
A Comprehensive Review of 3D Object Detection in Autonomous Driving: Technological Advances and Future Directions [11.071271817366739]
3D object perception has become a crucial component in the development of autonomous driving systems. This review extensively summarizes traditional 3D object detection methods, focusing on camera-based, LiDAR-based, and fusion detection techniques. We discuss future directions, including methods to improve accuracy such as temporal perception, occupancy grids, and end-to-end learning frameworks.
arXiv Detail & Related papers (2024-08-28T01:08:33Z)
A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective [20.798308029074786]
3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Similar to traditional bird's-eye view (BEV) perception, 3D occupancy perception has the nature of multi-source input and the necessity for information fusion.
arXiv Detail & Related papers (2024-05-08T16:10:46Z)
Vision-based 3D occupancy prediction in autonomous driving: a review and outlook [19.939380586314673]
We introduce the background of vision-based 3D occupancy prediction and discuss the challenges in this task. We conduct a comprehensive survey of the progress in vision-based 3D occupancy prediction from three aspects. We present a summary of prevailing research trends and propose some inspiring future outlooks.
arXiv Detail & Related papers (2024-05-04T07:39:25Z)
3D Object Visibility Prediction in Autonomous Driving [6.802572869909114]
We present a novel attribute and its corresponding algorithm: 3D object visibility. Our proposal of this attribute and its computational strategy aims to expand the capabilities for downstream tasks.
arXiv Detail & Related papers (2024-03-06T13:07:42Z)
Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants. Existing works either perform object detection followed by trajectory forecasting of the detected objects, or predict dense occupancy and flow grids for the whole scene. This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z)
Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
Exploring Contextual Representation and Multi-Modality for End-to-End Autonomous Driving [58.879758550901364]
Recent perception systems enhance spatial understanding with sensor fusion but often lack full environmental context. We introduce a framework that integrates three cameras to emulate the human field of view, coupled with top-down bird's-eye-view semantic data to enhance contextual representation. Our method achieves a displacement error of 0.67 m in open-loop settings, surpassing current methods by 6.9% on the nuScenes dataset.
arXiv Detail & Related papers (2022-10-13T05:56:20Z)
Predicting Future Occupancy Grids in Dynamic Environment with Spatio-Temporal Learning [63.25627328308978]
We propose a spatio-temporal prediction network pipeline to generate future occupancy predictions. Compared to the current SOTA, our approach predicts occupancy for a longer horizon of 3 seconds. We publicly release our grid occupancy dataset based on nuScenes to support further research.
arXiv Detail & Related papers (2022-05-06T13:45:32Z)
Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images. Our approach is fully automatic without any human interaction. We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.