To New Beginnings: A Survey of Unified Perception in Autonomous Vehicle Software
- URL: http://arxiv.org/abs/2508.20892v1
- Date: Thu, 28 Aug 2025 15:20:35 GMT
- Title: To New Beginnings: A Survey of Unified Perception in Autonomous Vehicle Software
- Authors: Loïc Stratil, Felix Fent, Esteban Rivera, Markus Lienkamp
- Abstract summary: We provide a comprehensive overview of unified perception, introducing a holistic and systemic taxonomy that categorizes methods along task integration, tracking formulation, and representation flow. This work establishes the first comprehensive framework for understanding and advancing unified perception, consolidates fragmented efforts, and guides future research toward more robust, generalizable, and interpretable perception.
- Score: 9.358517073480542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous vehicle perception typically relies on modular pipelines that decompose the task into detection, tracking, and prediction. While interpretable, these pipelines suffer from error accumulation and limited inter-task synergy. Unified perception has emerged as a promising paradigm that integrates these sub-tasks within a shared architecture, potentially improving robustness, contextual reasoning, and efficiency while retaining interpretable outputs. In this survey, we provide a comprehensive overview of unified perception, introducing a holistic and systemic taxonomy that categorizes methods along task integration, tracking formulation, and representation flow. We define three paradigms (Early, Late, and Full Unified Perception) and systematically review existing methods, their architectures, training strategies, datasets used, and open-source availability, while highlighting future research directions. This work establishes the first comprehensive framework for understanding and advancing unified perception, consolidates fragmented efforts, and guides future research toward more robust, generalizable, and interpretable perception.
Related papers
- Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems [75.78934957242403]
Self-driving vehicles and drones require true Spatial Intelligence from multi-modal onboard sensor data. This paper presents a framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal.
Date: 2025-12-30T17:58:01Z
- Explain Before You Answer: A Survey on Compositional Visual Reasoning [74.27548620675748]
Compositional visual reasoning has emerged as a key research frontier in multimodal AI. This survey systematically reviews 260+ papers from top venues (CVPR, ICCV, NeurIPS, ICML, ACL, etc.). We then catalog 60+ benchmarks and corresponding metrics that probe compositional visual reasoning along dimensions such as grounding accuracy, chain-of-thought faithfulness, and high-resolution perception.
Date: 2025-08-24T11:01:51Z
- Feature-Based vs. GAN-Based Learning from Demonstrations: When and Why [50.191655141020505]
This survey provides a comparative analysis of feature-based and GAN-based approaches to learning from demonstrations. We argue that the dichotomy between feature-based and GAN-based methods is increasingly nuanced.
Date: 2025-07-08T11:45:51Z
- Efficient and Generalizable Environmental Understanding for Visual Navigation [14.10058573339022]
Visual Navigation is a core task in Embodied AI, enabling agents to navigate complex environments toward given objectives. We propose Causality-Aware Navigation (CAN), which incorporates a Causal Understanding Module to enhance the agent's environmental understanding capability.
Date: 2025-06-18T11:47:02Z
- Joint Perception and Prediction for Autonomous Driving: A Survey [1.4630192509676043]
Perception and prediction modules are critical components of autonomous driving systems. Traditionally, these tasks are developed and optimized independently. We propose a taxonomy that categorizes approaches based on input representation, scene context modeling, and output representation.
Date: 2024-12-18T17:34:52Z
- Causality-Aware Transformer Networks for Robotic Navigation [13.719643934968367]
Current research in Visual Navigation reveals opportunities for improvement.
Direct adoption of RNNs and Transformers often overlooks the specific differences between Embodied AI and traditional sequential data modelling.
We propose Causality-Aware Transformer (CAT) Networks for Navigation, featuring a Causal Understanding Module.
Date: 2024-09-04T12:53:26Z
- Active Sensing with Predictive Coding and Uncertainty Minimization [0.0]
We present an end-to-end procedure for embodied exploration inspired by two biological computations.
We first demonstrate our approach in a maze navigation task and show that it can discover the underlying transition distributions and spatial features of the environment.
We show that our model builds unsupervised representations through exploration that allow it to efficiently categorize visual scenes.
Date: 2023-07-02T21:14:49Z
- On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores, which we name Compositional Relational Network (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
Date: 2022-06-09T16:24:01Z
- Robust Learning Through Cross-Task Consistency [92.42534246652062]
We propose a broadly applicable and fully computational method for augmenting learning with Cross-Task Consistency.
We observe that learning with cross-task consistency leads to more accurate predictions and better generalization to out-of-distribution inputs.
Date: 2020-06-07T09:24:33Z
- Dynamic Feature Integration for Simultaneous Detection of Salient Object, Edge and Skeleton [108.01007935498104]
In this paper, we solve three low-level pixel-wise vision problems, including salient object segmentation, edge detection, and skeleton extraction.
We first show some similarities shared by these tasks and then demonstrate how they can be leveraged for developing a unified framework.
Date: 2020-04-18T11:10:11Z
This list is automatically generated from the titles and abstracts of the papers on this site.