Real-time single image depth perception in the wild with handheld devices
- URL: http://arxiv.org/abs/2006.05724v1
- Date: Wed, 10 Jun 2020 08:30:20 GMT
- Title: Real-time single image depth perception in the wild with handheld devices
- Authors: Filippo Aleotti, Giulio Zaccaroni, Luca Bartolomei, Matteo Poggi,
Fabio Tosi, Stefano Mattoccia
- Abstract summary: Two main issues limit depth estimation from handheld devices in-the-wild.
We show that both are addressable by adopting appropriate network design and training strategies.
We report experimental results concerning real-time depth-aware augmented reality and image blurring with smartphones in-the-wild.
- Score: 45.26484111468387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth perception is paramount to tackle real-world problems, ranging from
autonomous driving to consumer applications. For the latter, depth estimation
from a single image represents the most versatile solution, since a standard
camera is available on almost any handheld device. Nonetheless, two main issues
limit its practical deployment: i) the low reliability when deployed
in-the-wild and ii) the demanding resource requirements to achieve real-time
performance, often not compatible with such devices. Therefore, in this paper,
we investigate these issues in depth, showing that both are addressable by
adopting appropriate network design and training strategies -- also outlining
how to map the resulting networks on handheld devices to achieve real-time
performance. Our thorough evaluation highlights the ability of such fast
networks to generalize well to new environments, a crucial feature required to
tackle the extremely varied contexts faced in real applications. Indeed, to
further support this evidence, we report experimental results concerning
real-time depth-aware augmented reality and image blurring with smartphones
in-the-wild.
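The depth-aware image blurring application mentioned above can be illustrated with a minimal sketch: given a per-pixel depth map (such as one predicted by a monocular network), pixels beyond a distance threshold are blurred while closer ones stay sharp. The `box_blur` helper, the fixed threshold, and the NumPy pipeline are illustrative assumptions, not the paper's actual on-device implementation.

```python
import numpy as np

def box_blur(img, k=5):
    """Naive box blur: average of k*k shifted copies (grayscale HxW image)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def depth_aware_blur(img, depth, threshold, k=5):
    """Keep pixels closer than `threshold` sharp; blur the background.

    `img` and `depth` are HxW float arrays; `depth` holds per-pixel
    distances such as those predicted by a monocular depth network.
    """
    blurred = box_blur(img, k)
    foreground = depth < threshold
    return np.where(foreground, img, blurred)
```

A real smartphone pipeline would replace the box filter with a depth-dependent bokeh kernel, but the structure — predict depth, segment by distance, blend sharp and blurred layers — is the same.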
Related papers
- Agile gesture recognition for low-power applications: customisation for generalisation [41.728933551492275]
Automated hand gesture recognition has long been a focal point in the AI community.
There is an increasing demand for gesture recognition technologies that operate on low-power sensor devices.
In this study, we unveil a novel methodology for pattern recognition systems using adaptive and agile error correction.
arXiv Detail & Related papers (2024-03-12T19:34:18Z)
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
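A drastically simplified sketch of confidence-guided fusion: the paper above learns both the confidence predictor and the fusion end-to-end, whereas here each depth source simply contributes in proportion to a hand-supplied per-pixel confidence map.

```python
import numpy as np

def fuse_depths(depth_a, depth_b, conf_a, conf_b, eps=1e-6):
    """Per-pixel confidence-weighted average of two depth maps.

    A stand-in for a learned fusion network: where `conf_a` dominates,
    the fused depth follows `depth_a`, and vice versa. `eps` avoids
    division by zero where both confidences vanish.
    """
    w_a = conf_a / (conf_a + conf_b + eps)
    return w_a * depth_a + (1.0 - w_a) * depth_b
```

With equal confidence everywhere the result is the plain average; setting one confidence map to zero falls back entirely on the other modality.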
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
- A Comprehensive Study of Real-Time Object Detection Networks Across Multiple Domains: A Survey [9.861721674777877]
Deep neural network based object detectors are continuously evolving and are used in a multitude of applications.
While safety-critical applications need high accuracy and reliability, low-latency tasks need resource and energy-efficient networks.
A reference benchmark for existing networks does not exist, nor does a standard evaluation guideline for designing new networks.
arXiv Detail & Related papers (2022-08-23T12:01:16Z)
- Efficient High-Resolution Deep Learning: A Survey [90.76576712433595]
Cameras in modern devices such as smartphones, satellites and medical equipment are capable of capturing very high resolution images and videos.
Such high-resolution data often need to be processed by deep learning models for cancer detection, automated road navigation, weather prediction, surveillance, optimizing agricultural processes and many other applications.
Using high-resolution images and videos as direct inputs for deep learning models creates many challenges due to their high number of parameters, computation cost, inference latency and GPU memory consumption.
Several works in the literature propose better alternatives to deal with the challenges of high-resolution data and to improve accuracy and speed while complying with hardware limitations.
arXiv Detail & Related papers (2022-07-26T17:13:53Z)
- Scalable Vehicle Re-Identification via Self-Supervision [66.2562538902156]
Vehicle Re-Identification is one of the key elements in city-scale vehicle analytics systems.
Most state-of-the-art solutions for vehicle re-id focus on improving accuracy on existing re-id benchmarks and often ignore computational complexity.
We propose a simple yet effective hybrid solution empowered by self-supervised training which only uses a single network during inference time.
arXiv Detail & Related papers (2022-05-16T12:14:42Z)
- Monitoring social distancing with single image depth estimation [39.79652626235862]
Single image depth estimation can be a viable alternative to other depth perception techniques.
Our framework can run reasonably fast and comparably to competitors, even on pure CPU systems.
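One way such a system can turn a metric depth map into interpersonal distances is to back-project detected pixels through a pinhole camera model and measure the Euclidean distance in 3D. This sketch assumes known intrinsics (`fx`, `fy`, `cx`, `cy`) and a metric depth map; it is a generic illustration, not the paper's exact pipeline.

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z to camera coordinates."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def interpersonal_distance(p1, p2, depth, fx, fy, cx, cy):
    """Euclidean distance in metres between two detected people.

    p1, p2 are (u, v) pixel locations (e.g. the feet of each detection);
    `depth` is an HxW metric depth map, such as one produced by a
    calibrated monocular network.
    """
    a = backproject(*p1, depth[p1[1], p1[0]], fx, fy, cx, cy)
    b = backproject(*p2, depth[p2[1], p2[0]], fx, fy, cx, cy)
    return float(np.linalg.norm(a - b))
```

Comparing the returned distance against a safety threshold (e.g. 1–2 m) then flags violations.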
arXiv Detail & Related papers (2022-04-04T17:58:02Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that, by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
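A common example of such a prior is the known mounting height of the camera: comparing it to the predicted height of ground points yields a single global scale factor that converts the network's arbitrary-unit depths into metres. The sketch below illustrates this generic idea and is not necessarily the exact method used in the paper.

```python
import numpy as np

def recover_scale(pred_ground_heights, true_camera_height):
    """Recover the global metric scale of a self-supervised depth map.

    `pred_ground_heights`: heights (in the network's arbitrary units) of
    3D points sampled from the ground plane in front of the camera —
    each is an estimate of the camera's height above the ground.
    `true_camera_height`: the known mounting height in metres.
    The median ratio is robust to outlier ground samples.
    """
    return true_camera_height / np.median(pred_ground_heights)
```

Multiplying the whole predicted depth map by the returned factor then yields metric depth, without any additional sensor.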
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
- YOLOpeds: Efficient Real-Time Single-Shot Pedestrian Detection for Smart Camera Applications [2.588973722689844]
This work addresses the challenge of achieving a good trade-off between accuracy and speed for efficient deployment of deep-learning-based pedestrian detection in smart camera applications.
A computationally efficient architecture based on separable convolutions is introduced, integrating dense connections across layers and multi-scale feature fusion.
Overall, YOLOpeds sustains real-time operation at over 30 frames per second with detection rates of around 86%, outperforming existing deep learning models.
arXiv Detail & Related papers (2020-07-27T09:50:11Z) - Deploying Image Deblurring across Mobile Devices: A Perspective of
Quality and Latency [11.572636762286775]
We conduct a search of portable network architectures for a better quality-latency trade-off across mobile devices.
This paper provides comprehensive experiments and comparisons, offering an in-depth analysis of both latency and image quality.
To the best of our knowledge, this is the first paper that addresses all the deployment issues of image deblurring task across mobile devices.
arXiv Detail & Related papers (2020-04-27T06:32:53Z) - DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised
Representation Learning [65.94499390875046]
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique is able to outperform the current state-of-the-art with around 10% reduction in all error measures.
arXiv Detail & Related papers (2020-03-30T13:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.