Real-Time Monocular Human Depth Estimation and Segmentation on Embedded
Systems
- URL: http://arxiv.org/abs/2108.10506v1
- Date: Tue, 24 Aug 2021 03:26:08 GMT
- Title: Real-Time Monocular Human Depth Estimation and Segmentation on Embedded
Systems
- Authors: Shan An, Fangru Zhou, Mei Yang, Haogang Zhu, Changhong Fu, and
Konstantinos A. Tsintotas
- Abstract summary: Estimating a scene's depth to achieve collision avoidance against moving pedestrians is a crucial and fundamental problem in the robotic field.
This paper proposes a novel, low complexity network architecture for fast and accurate human depth estimation and segmentation in indoor environments.
- Score: 13.490605853268837
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating a scene's depth to achieve collision avoidance against moving
pedestrians is a crucial and fundamental problem in the robotic field. This
paper proposes a novel, low complexity network architecture for fast and
accurate human depth estimation and segmentation in indoor environments, targeting
applications on resource-constrained platforms (including battery-powered
aerial, micro-aerial, and ground vehicles) with a monocular camera as the
primary perception module. Following the encoder-decoder structure, the
proposed framework consists of two branches, one for depth prediction and
another for semantic segmentation. Moreover, network structure optimization is
employed to improve its forward inference speed. Exhaustive experiments on
three self-generated datasets prove our pipeline's capability to execute in
real-time, achieving higher frame rates than contemporary state-of-the-art
frameworks (114.6 frames per second on an NVIDIA Jetson Nano GPU with TensorRT)
while maintaining comparable accuracy.
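The two-branch design described above (a shared encoder feeding separate depth and segmentation decoders) can be sketched at the shape level as follows. This is a minimal NumPy illustration of the data flow only, not the authors' network: the pooling and random-projection steps stand in for learned convolutional layers, and all names and sizes are assumptions.

```python
import numpy as np

def pool2(x):
    """2x2 average pooling: (H, W, C) -> (H/2, W/2, C). Stands in for a strided conv."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x, k):
    """Nearest-neighbour upsampling by factor k, as a stand-in for a learned decoder."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def forward(img, n_classes=2, seed=0):
    """Shared encoder, then two decoder branches: depth map and class logits."""
    rng = np.random.default_rng(seed)
    feat = img
    for _ in range(2):                      # shared encoder: 4x spatial downsampling
        feat = pool2(feat)
    # depth branch: project features to one channel, restore input resolution
    w_depth = rng.standard_normal((feat.shape[-1], 1))
    depth = upsample(feat @ w_depth, 4)
    # segmentation branch: 1x1-conv-style projection to per-pixel class logits
    w_seg = rng.standard_normal((feat.shape[-1], n_classes))
    seg_logits = upsample(feat @ w_seg, 4)
    return depth, seg_logits

img = np.random.default_rng(1).random((64, 64, 3))
depth, seg = forward(img)
print(depth.shape, seg.shape)   # (64, 64, 1) (64, 64, 2)
```

Sharing the encoder is what keeps the parameter count and per-frame cost low enough for an embedded GPU: both tasks reuse one feature extraction pass, and only the lightweight decoder heads run twice.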
Related papers
- A Hybrid Autoencoder for Robust Heightmap Generation from Fused Lidar and Depth Data for Humanoid Robot Locomotion [2.9223917785251285]
This paper presents a learning-based framework that uses an intermediate, robot-centric heightmap representation.
A hybrid Encoder-Decoder Structure (EDS) is introduced, utilizing a Convolutional Neural Network (CNN) for spatial feature extraction.
Results demonstrate that multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations.
arXiv Detail & Related papers (2026-02-05T16:38:42Z) - Video Depth Propagation [54.523028170425256]
Existing methods rely on simple frame-by-frame monocular models, leading to temporal inconsistencies and inaccuracies.
We propose VeloDepth, which effectively leverages an online video pipeline and performs deep feature propagation.
Our design structurally enforces temporal consistency, resulting in stable depth predictions across consecutive frames with improved efficiency.
arXiv Detail & Related papers (2025-12-11T15:08:37Z) - Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities.
Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark.
We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z) - Real-Time Detection and Tracking of Foreign Object Intrusions in Power Systems via Feature-Based Edge Intelligence [4.60587070358843]
This paper presents a novel framework for real-time foreign object intrusion (FOI) detection and tracking in power transmission systems.
The framework integrates: (1) a YOLOv7 segmentation model for fast and robust object localization, (2) a ConvNeXt-based feature extractor trained with triplet loss to generate discriminative embeddings, and (3) a feature-assisted IoU tracker.
To enable scalable field deployment, the pipeline is optimized for low-cost edge hardware using mixed-precision inference.
arXiv Detail & Related papers (2025-09-16T17:17:03Z) - DashCam Video: A complementary low-cost data stream for on-demand forest-infrastructure system monitoring [1.6064410860203764]
This study introduces a novel, low-cost, and reproducible framework for real-time, object-level structural assessment and geolocation of roadside vegetation and infrastructure.
We developed an end-to-end pipeline that combines monocular depth estimation, depth error correction, and geometric triangulation to generate accurate spatial and structural data from vehicle-mounted dashcams.
Our approach complements conventional RS methods, such as LiDAR and imagery, by offering a fast, real-time, and cost-effective solution for object-level monitoring of vegetation risks and infrastructure exposure.
arXiv Detail & Related papers (2025-08-15T16:55:12Z) - SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams [70.9610707466343]
Bio-inspired spike cameras emit asynchronous events at microsecond-level resolution, providing an alternative sensing modality.
Existing methods lack specialized stereo algorithms and benchmarks tailored to spike data.
We propose SpikeStereoNet, a brain-inspired framework and the first to estimate stereo depth directly from raw spike streams.
arXiv Detail & Related papers (2025-05-26T04:14:34Z) - Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images [0.9883261192383611]
In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in unstructured environments.
We propose a joint deep-learning architecture that can perform the two tasks accurately and rapidly.
arXiv Detail & Related papers (2025-03-23T08:25:07Z) - ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our architecture, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - METER: a mobile vision transformer architecture for monocular depth
estimation [0.0]
We propose METER, a novel lightweight vision transformer architecture capable of achieving state-of-the-art estimations.
We provide a solution consisting of three alternative configurations of METER, a novel loss function to balance pixel estimation and reconstruction of image details, and a new data augmentation strategy to improve the overall final predictions.
arXiv Detail & Related papers (2024-03-13T09:30:08Z) - Real-time Monocular Depth Estimation on Embedded Systems [32.40848141360501]
Two efficient RT-MonoDepth and RT-MonoDepth-S architectures are proposed.
RT-MonoDepth and RT-MonoDepth-S achieve frame rates of 18.4 and 30.5 FPS on the NVIDIA Jetson Nano, and 253.0 and 364.1 FPS on the Jetson AGX Orin, respectively.
arXiv Detail & Related papers (2023-08-21T08:59:59Z) - Rethinking Lightweight Salient Object Detection via Network Depth-Width
Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z) - Distortion-Aware Network Pruning and Feature Reuse for Real-time Video
Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames.
arXiv Detail & Related papers (2022-06-20T07:20:02Z) - Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge
TPU [58.720142291102135]
In this paper, we propose pose estimation software that exploits neural network architectures.
We show how low power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z) - Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior performance, with better accuracy and speed than existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.