Deep Learning Perspective of Scene Understanding in Autonomous Robots
- URL: http://arxiv.org/abs/2512.14020v1
- Date: Tue, 16 Dec 2025 02:31:54 GMT
- Title: Deep Learning Perspective of Scene Understanding in Autonomous Robots
- Authors: Afia Maham, Dur E Nayab Tashfa
- Abstract summary: This paper provides a review of deep learning applications in scene understanding in autonomous robots. It includes innovations in object detection, semantic and instance segmentation, depth estimation, 3D reconstruction, and visual SLAM.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides a review of deep learning applications in scene understanding in autonomous robots, including innovations in object detection, semantic and instance segmentation, depth estimation, 3D reconstruction, and visual SLAM. It emphasizes how these techniques address limitations of traditional geometric models, improve depth perception in real time despite occlusions and textureless surfaces, and enhance semantic reasoning to understand the environment better. When these perception modules are integrated in dynamic and unstructured environments, they enable more effective decision-making, navigation, and interaction. Lastly, the review outlines the existing problems and research directions to advance learning-based scene understanding in autonomous robots.
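To make the surveyed module types concrete, here is a minimal sketch that runs off-the-shelf pretrained torchvision models for object detection and semantic segmentation on a single frame; the model choices and the confidence threshold are illustrative assumptions, not a pipeline from the paper.

```python
# Minimal sketch: off-the-shelf detection + semantic segmentation on one frame.
# Model choices (Faster R-CNN, DeepLabV3) are illustrative, not from the paper.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

detector = fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval()
segmenter = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT).eval()

frame = torch.rand(3, 480, 640)  # stand-in for a camera image, values in [0, 1]

with torch.no_grad():
    detections = detector([frame])[0]                  # dict with 'boxes', 'labels', 'scores'
    seg_logits = segmenter(frame.unsqueeze(0))["out"]  # (1, num_classes, H, W)

keep = detections["scores"] > 0.5        # filter low-confidence boxes
class_map = seg_logits.argmax(dim=1)     # per-pixel class ids
print(detections["boxes"][keep].shape, class_map.shape)
```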
Related papers
- Vision-Language Embodiment for Monocular Depth Estimation [11.737279515161505]
Current depth estimation models rely on inter-image relationships for supervised training.
We propose a method that embodies the camera model and its physical characteristics into a deep learning model.
The model can calculate embodied scene depth in real-time based on immediate environmental changes.
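For context, embodied-depth approaches like this build on the pinhole camera model; the sketch below back-projects a depth map into a 3D point cloud using assumed example intrinsics (fx, fy, cx, cy), not values from the paper.

```python
# Sketch of the pinhole-camera relation such methods build on: back-project a
# depth map into a 3D point cloud. Intrinsics fx, fy, cx, cy are assumed
# example values, not taken from the paper.
import numpy as np

def backproject(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Return an (H*W, 3) point cloud via X = (u - cx) Z / fx, Y = (v - cy) Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

points = backproject(np.full((480, 640), 2.0), fx=525.0, fy=525.0, cx=320.0, cy=240.0)
print(points.shape)  # (307200, 3)
```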
arXiv Detail & Related papers (2025-03-18T18:05:16Z)
- Deep Dive into Model-free Reinforcement Learning for Biological and Robotic Systems: Theory and Practice [17.598549532513122]
We present a concise exposition of the mathematical and algorithmic aspects of model-free reinforcement learning.
We use actor-critic methods as a tool for investigating the feedback control underlying animal and robotic behavior.
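As a generic illustration of the actor-critic family the paper analyzes, here is a minimal one-step advantage actor-critic update on toy tensors; the dimensions and hyperparameters are assumptions, not the paper's setup.

```python
# Minimal one-step advantage actor-critic update, a generic illustration of
# the method family (toy dimensions, not the paper's setup).
import torch
import torch.nn as nn

obs_dim, n_actions = 8, 4
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam([*actor.parameters(), *critic.parameters()], lr=1e-3)

def update(obs, action, reward, next_obs, gamma=0.99):
    """One actor-critic step: critic fits the TD target, actor follows the advantage."""
    value, next_value = critic(obs), critic(next_obs).detach()
    td_target = reward + gamma * next_value
    advantage = (td_target - value).detach()
    log_prob = torch.log_softmax(actor(obs), dim=-1)[action]
    loss = -log_prob * advantage + (td_target - value).pow(2)
    opt.zero_grad(); loss.mean().backward(); opt.step()

update(torch.randn(obs_dim), action=2, reward=torch.tensor(1.0), next_obs=torch.randn(obs_dim))
```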
arXiv Detail & Related papers (2024-05-19T05:58:44Z)
- Recent Trends in 3D Reconstruction of General Non-Rigid Scenes [104.07781871008186]
Reconstructing models of the real world, including 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision.
It enables the synthesizing of photorealistic novel views, useful for the movie industry and AR/VR applications.
This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs.
arXiv Detail & Related papers (2024-03-22T09:46:11Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Interpreting Neural Policies with Disentangled Tree Representations [58.769048492254555]
We study interpretability of compact neural policies through the lens of disentangled representation.
We leverage decision trees to obtain factors of variation for disentanglement in robot learning.
We introduce interpretability metrics that measure disentanglement of learned neural dynamics.
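To illustrate the general idea of tree-based disentanglement probes (not the paper's exact metrics), the sketch below fits a scikit-learn decision tree from synthetic latent codes to one factor of variation and reads off per-dimension importances.

```python
# Illustrative probe in the spirit of the paper: fit a decision tree from
# learned latent codes to a ground-truth factor and inspect which latent
# dimensions it uses. The synthetic data and the R^2/importance readout are
# assumptions, not the paper's actual metrics.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 8))                          # stand-in for policy latents
factor = 2.0 * latents[:, 3] + 0.1 * rng.normal(size=1000)    # one factor of variation

tree = DecisionTreeRegressor(max_depth=3).fit(latents, factor)
print("R^2:", tree.score(latents, factor))
print("per-dimension importance:", tree.feature_importances_.round(2))
# A disentangled code concentrates importance on few dimensions (here, dim 3).
```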
arXiv Detail & Related papers (2022-10-13T01:10:41Z)
- Combining Commonsense Reasoning and Knowledge Acquisition to Guide Deep Learning in Robotics [8.566457170664926]
Deep network models are being used for many pattern recognition and decision-making tasks in robotics and AI.
The architecture described in this paper draws inspiration from research in cognitive systems.
Our architecture improves reliability of decision-making and reduces the effort involved in training data-driven deep network models.
arXiv Detail & Related papers (2022-01-25T12:24:22Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
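A generic sketch of the latent-dynamics idea follows: encode observations into a latent state and train a model to predict the next latent given an action. The architecture and shapes are illustrative assumptions, not the paper's 3D-aware representation.

```python
# Generic latent-dynamics sketch: encode observations to a latent state and
# learn a model predicting the next latent given an action. Shapes and
# architecture are assumptions, not the paper's 3D-aware model.
import torch
import torch.nn as nn

latent_dim, action_dim = 16, 4
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                        nn.Linear(256, latent_dim))
dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
                         nn.Linear(128, latent_dim))
opt = torch.optim.Adam([*encoder.parameters(), *dynamics.parameters()], lr=1e-3)

obs, next_obs = torch.rand(32, 3, 64, 64), torch.rand(32, 3, 64, 64)  # toy image batch
action = torch.rand(32, action_dim)

z, z_next = encoder(obs), encoder(next_obs)
pred_next = dynamics(torch.cat([z, action], dim=-1))   # predict next latent state
loss = nn.functional.mse_loss(pred_next, z_next.detach())
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```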
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- Low Dimensional State Representation Learning with Robotics Priors in Continuous Action Spaces [8.692025477306212]
Reinforcement learning algorithms have proven to be capable of solving complicated robotics tasks in an end-to-end fashion.
We propose a framework combining the learning of a low-dimensional state representation, from high-dimensional observations coming from the robot's raw sensory readings, with the learning of the optimal policy.
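Two of the classic robotics-prior losses such frameworks build on (temporal coherence and causality, in the spirit of Jonschkowski and Brock) can be sketched as follows; the batch construction and weighting here are assumptions, not the paper's exact formulation.

```python
# Sketch of two classic "robotics priors" losses on a learned state encoder;
# the stand-in encoder outputs and pairing are illustrative assumptions.
import torch

def temporal_coherence(s_t, s_t1):
    """States should change slowly between consecutive timesteps."""
    return (s_t1 - s_t).pow(2).sum(dim=-1).mean()

def causality(s_a, s_b):
    """States of pairs with same action but different reward should be far apart."""
    return torch.exp(-(s_a - s_b).pow(2).sum(dim=-1)).mean()

states = torch.randn(32, 5, requires_grad=True)   # stand-in encoder outputs at time t
next_states = states + 0.1 * torch.randn(32, 5)   # stand-in encoder outputs at time t+1
loss = temporal_coherence(states, next_states) + causality(states, next_states)
loss.backward()  # gradients would flow back into the real encoder
print(float(loss))
```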
arXiv Detail & Related papers (2021-07-04T15:42:01Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for addressing this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
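One common way to realize such self-supervision is a contrastive objective that aligns embeddings of the same person across sensor modalities; the InfoNCE-style sketch below is a generic illustration, not the working-memory architecture proposed in the paper.

```python
# Generic multi-modal self-supervision sketch: align face and voice embeddings
# of the same person with an InfoNCE-style contrastive loss. Illustrative only;
# this is not the paper's working-memory architecture.
import torch
import torch.nn.functional as F

face = F.normalize(torch.randn(16, 64), dim=-1)   # face embeddings (batch of people)
voice = F.normalize(torch.randn(16, 64), dim=-1)  # voice embeddings, same order

logits = face @ voice.t() / 0.07                  # pairwise similarities / temperature
labels = torch.arange(16)                         # matching rows are positives
loss = F.cross_entropy(logits, labels)
print(float(loss))
```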
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
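The core trick in sparse-supervision setups like this can be sketched as masking the depth loss to the few pixels where ground truth exists; the network and data below are stand-ins, not the paper's global-local architecture.

```python
# Sketch of supervising dense depth from very sparse ground truth: compute the
# loss only at pixels where depth is known. Network and data are stand-ins.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1))

image = torch.rand(1, 3, 64, 64)
gt_depth = torch.full((1, 1, 64, 64), 2.0)
mask = torch.zeros_like(gt_depth, dtype=torch.bool)
mask[0, 0, 32, 32] = True                   # ground truth at a single pixel

pred = net(image)
loss = (pred[mask] - gt_depth[mask]).abs().mean()  # L1 on supervised pixels only
loss.backward()
print(float(loss))
```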
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
- Perception and Navigation in Autonomous Systems in the Era of Learning: A Survey [28.171707840152994]
This review focuses on the applications of learning-based monocular approaches in ego-motion perception, environment perception and navigation in autonomous systems.
First, we delineate the shortcomings of existing classical visual simultaneous localization and mapping (vSLAM) solutions, which demonstrate the necessity to integrate deep learning techniques.
Second, we review the visual-based environmental perception and understanding methods based on deep learning, including deep learning-based monocular depth estimation.
Third, we focus on the visual navigation based on learning systems, mainly including reinforcement learning and deep reinforcement learning.
arXiv Detail & Related papers (2020-01-08T00:28:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.