Real-Time Monocular Human Depth Estimation and Segmentation on Embedded
Systems
- URL: http://arxiv.org/abs/2108.10506v1
- Date: Tue, 24 Aug 2021 03:26:08 GMT
- Title: Real-Time Monocular Human Depth Estimation and Segmentation on Embedded
Systems
- Authors: Shan An, Fangru Zhou, Mei Yang, Haogang Zhu, Changhong Fu, and
Konstantinos A. Tsintotas
- Abstract summary: Estimating a scene's depth to achieve collision avoidance against moving pedestrians is a crucial and fundamental problem in the robotic field.
This paper proposes a novel, low complexity network architecture for fast and accurate human depth estimation and segmentation in indoor environments.
- Score: 13.490605853268837
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating a scene's depth to achieve collision avoidance against moving
pedestrians is a crucial and fundamental problem in the robotic field. This
paper proposes a novel, low complexity network architecture for fast and
accurate human depth estimation and segmentation in indoor environments, aiming
to applications for resource-constrained platforms (including battery-powered
aerial, micro-aerial, and ground vehicles) with a monocular camera being the
primary perception module. Following the encoder-decoder structure, the
proposed framework consists of two branches, one for depth prediction and
another for semantic segmentation. Moreover, network structure optimization is
employed to improve its forward inference speed. Exhaustive experiments on
three self-generated datasets prove our pipeline's capability to execute in
real-time, achieving higher frame rates than contemporary state-of-the-art
frameworks (114.6 frames per second on an NVIDIA Jetson Nano GPU with TensorRT)
while maintaining comparable accuracy.
Related papers
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purelytemporalal architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - METER: a mobile vision transformer architecture for monocular depth
estimation [0.0]
We propose METER, a novel lightweight vision transformer architecture capable of achieving state of the art estimations.
We provide a solution consisting of three alternative configurations of METER, a novel loss function to balance pixel estimation and reconstruction of image details, and a new data augmentation strategy to improve the overall final predictions.
arXiv Detail & Related papers (2024-03-13T09:30:08Z) - Real-time Monocular Depth Estimation on Embedded Systems [32.40848141360501]
Two efficient RT-MonoDepth and RT-MonoDepth-S architectures are proposed.
RT-MonoDepth and RT-MonoDepth-S achieve frame rates of 18.4&30.5 FPS on NVIDIA Jetson Nano and 253.0&364.1 FPS on Jetson AGX Orin.
arXiv Detail & Related papers (2023-08-21T08:59:59Z) - Rethinking Lightweight Salient Object Detection via Network Depth-Width
Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z) - Distortion-Aware Network Pruning and Feature Reuse for Real-time Video
Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that captures temporal differences between the current and previous frame.
arXiv Detail & Related papers (2022-06-20T07:20:02Z) - Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge
TPU [58.720142291102135]
In this paper we propose a pose estimation software exploiting neural network architectures.
We show how low power machine learning accelerators could enable Artificial Intelligence exploitation in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks.
specially trained CNNs that employ parametrised early exits along their depth to save during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z) - Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
We show that results on multiple datasets demonstrate superior performance with better accuracy and speed compared to existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.