Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images
- URL: http://arxiv.org/abs/2503.17982v1
- Date: Sun, 23 Mar 2025 08:25:07 GMT
- Title: Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images
- Authors: Yara AlaaEldin, Francesca Odone
- Abstract summary: In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in unstructured environments. We propose a joint deep-learning architecture that can perform the two tasks accurately and rapidly.
- Score: 0.9883261192383611
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Understanding the geometric and semantic properties of the scene is crucial in autonomous navigation and particularly challenging in the case of Unmanned Aerial Vehicle (UAV) navigation. Such information may be obtained by estimating depth and semantic segmentation maps of the surrounding environment; for practical use in autonomous navigation, the procedure must be performed as close to real time as possible. In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low-altitude unstructured environments. We propose a joint deep-learning architecture that can perform the two tasks accurately and rapidly, and validate its effectiveness on the MidAir and Aeroscapes benchmark datasets. Our joint architecture proves competitive with or superior to other single-task and joint-architecture methods while running fast, predicting at 20.2 FPS on a single NVIDIA Quadro P5000 GPU, and it has a low memory footprint. All code for training and prediction can be found at: https://github.com/Malga-Vision/Co-SemDepth
Related papers
- Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments [18.7565126823704]
We introduce an online metric-semantic mapping system that generates a global metric-semantic mesh map of large-scale outdoor environments. Our mapping process achieves exceptional speed, with frame processing taking less than 7ms, regardless of scenario scale. We integrate the resultant map into a real-world navigation system, enabling metric-semantic-based terrain assessment and autonomous point-to-point navigation within a campus environment.
arXiv Detail & Related papers (2024-11-30T00:05:10Z)
- Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU [58.720142291102135]
In this paper we propose a pose estimation software exploiting neural network architectures.
We show how low power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z)
- ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability.
We propose a learning-based approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
- Large-scale Autonomous Flight with Real-time Semantic SLAM under Dense Forest Canopy [48.51396198176273]
We propose an integrated system that can perform large-scale autonomous flights and real-time semantic mapping in challenging under-canopy environments.
We detect and model tree trunks and ground planes from LiDAR data, which are associated across scans and used to constrain robot poses as well as tree trunk models.
A drift-compensation mechanism is designed to minimize the odometry drift using semantic SLAM outputs in real time, while maintaining planner optimality and controller stability.
arXiv Detail & Related papers (2021-09-14T07:24:53Z)
- Real-Time Monocular Human Depth Estimation and Segmentation on Embedded Systems [13.490605853268837]
Estimating a scene's depth to achieve collision avoidance against moving pedestrians is a crucial and fundamental problem in the robotic field.
This paper proposes a novel, low complexity network architecture for fast and accurate human depth estimation and segmentation in indoor environments.
arXiv Detail & Related papers (2021-08-24T03:26:08Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and depth completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- On Deep Learning Techniques to Boost Monocular Depth Estimation for Autonomous Navigation [1.9007546108571112]
Inferring the depth of images is a fundamental inverse problem within the field of Computer Vision.
We propose a new lightweight and fast supervised CNN architecture combined with novel feature extraction models.
We also introduce an efficient surface normals module, jointly with a simple geometric 2.5D loss function, to solve SIDE problems.
arXiv Detail & Related papers (2020-10-13T18:37:38Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)