Real-time single image depth perception in the wild with handheld devices
- URL: http://arxiv.org/abs/2006.05724v1
- Date: Wed, 10 Jun 2020 08:30:20 GMT
- Title: Real-time single image depth perception in the wild with handheld devices
- Authors: Filippo Aleotti, Giulio Zaccaroni, Luca Bartolomei, Matteo Poggi,
Fabio Tosi, Stefano Mattoccia
- Abstract summary: Two main issues limit depth estimation from handheld devices in-the-wild.
We show that both are addressable by adopting appropriate network design and training strategies.
We report experimental results concerning real-time depth-aware augmented reality and image blurring with smartphones in-the-wild.
- Score: 45.26484111468387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth perception is paramount to tackle real-world problems, ranging from
autonomous driving to consumer applications. For the latter, depth estimation
from a single image represents the most versatile solution, since a standard
camera is available on almost any handheld device. Nonetheless, two main issues
limit its practical deployment: i) the low reliability when deployed
in-the-wild and ii) the demanding resource requirements to achieve real-time
performance, often not compatible with such devices. Therefore, in this paper,
we investigate these issues in depth, showing that both are addressable by
adopting appropriate network design and training strategies -- also outlining
how to map the resulting networks on handheld devices to achieve real-time
performance. Our thorough evaluation highlights the ability of such fast
networks to generalize well to new environments, a crucial feature required to
tackle the extremely varied contexts faced in real applications. Indeed, to
further support this evidence, we report experimental results concerning
real-time depth-aware augmented reality and image blurring with smartphones
in-the-wild.
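The depth-aware image blurring application mentioned above can be illustrated with a minimal sketch: given a per-pixel depth map (such as one predicted by a monocular network), pixels beyond a distance threshold are blurred while closer ones stay sharp. The `box_blur` helper, the fixed threshold, and the NumPy pipeline are illustrative assumptions, not the paper's actual on-device implementation.

```python
import numpy as np

def box_blur(img, k=5):
    """Naive box blur: average of k*k shifted copies (grayscale HxW image)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def depth_aware_blur(img, depth, threshold, k=5):
    """Keep pixels closer than `threshold` sharp; blur the background.

    `img` and `depth` are HxW float arrays; `depth` holds per-pixel
    distances such as those predicted by a monocular depth network.
    """
    blurred = box_blur(img, k)
    foreground = depth < threshold
    return np.where(foreground, img, blurred)
```

A real smartphone pipeline would replace the box filter with a depth-dependent bokeh kernel, but the structure — predict depth, segment by distance, blend sharp and blurred layers — is the same.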
Related papers
- Agile gesture recognition for low-power applications: customisation for generalisation [41.728933551492275]
Automated hand gesture recognition has long been a focal point in the AI community.
There is an increasing demand for gesture recognition technologies that operate on low-power sensor devices.
In this study, we unveil a novel methodology for pattern recognition systems using adaptive and agile error correction.
arXiv Detail & Related papers (2024-03-12T19:34:18Z)
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
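A drastically simplified sketch of confidence-guided fusion: the paper above learns both the confidence predictor and the fusion end-to-end, whereas here each depth source simply contributes in proportion to a hand-supplied per-pixel confidence map.

```python
import numpy as np

def fuse_depths(depth_a, depth_b, conf_a, conf_b, eps=1e-6):
    """Per-pixel confidence-weighted average of two depth maps.

    A stand-in for a learned fusion network: where `conf_a` dominates,
    the fused depth follows `depth_a`, and vice versa. `eps` avoids
    division by zero where both confidences vanish.
    """
    w_a = conf_a / (conf_a + conf_b + eps)
    return w_a * depth_a + (1.0 - w_a) * depth_b
```

With equal confidence everywhere the result is the plain average; setting one confidence map to zero falls back entirely on the other modality.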
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
- A Comprehensive Study of Real-Time Object Detection Networks Across Multiple Domains: A Survey [9.861721674777877]
Deep neural network based object detectors are continuously evolving and are used in a multitude of applications.
While safety-critical applications need high accuracy and reliability, low-latency tasks need resource and energy-efficient networks.
A reference benchmark for existing networks does not exist, nor does a standard evaluation guideline for designing new networks.
arXiv Detail & Related papers (2022-08-23T12:01:16Z)
- Efficient High-Resolution Deep Learning: A Survey [90.76576712433595]
Cameras in modern devices such as smartphones, satellites and medical equipment are capable of capturing very high resolution images and videos.
Such high-resolution data often need to be processed by deep learning models for cancer detection, automated road navigation, weather prediction, surveillance, optimizing agricultural processes and many other applications.
Using high-resolution images and videos as direct inputs for deep learning models creates many challenges due to their high number of parameters, computation cost, inference latency and GPU memory consumption.
Several works in the literature propose better alternatives to deal with the challenges of high-resolution data and to improve accuracy and speed while complying with hardware limitations.
arXiv Detail & Related papers (2022-07-26T17:13:53Z)
- Scalable Vehicle Re-Identification via Self-Supervision [66.2562538902156]
Vehicle Re-Identification is one of the key elements in city-scale vehicle analytics systems.
Most state-of-the-art solutions for vehicle re-id focus on improving accuracy on existing re-id benchmarks and often ignore computational complexity.
We propose a simple yet effective hybrid solution empowered by self-supervised training which only uses a single network during inference time.
arXiv Detail & Related papers (2022-05-16T12:14:42Z)
- Monitoring social distancing with single image depth estimation [39.79652626235862]
Single image depth estimation can be a viable alternative to other depth perception techniques.
Our framework can run reasonably fast and comparably to competitors, even on pure CPU systems.
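One way such a system can turn a metric depth map into interpersonal distances is to back-project detected pixels through a pinhole camera model and measure the Euclidean distance in 3D. This sketch assumes known intrinsics (`fx`, `fy`, `cx`, `cy`) and a metric depth map; it is a generic illustration, not the paper's exact pipeline.

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at depth z to camera coordinates."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def interpersonal_distance(p1, p2, depth, fx, fy, cx, cy):
    """Euclidean distance in metres between two detected people.

    p1, p2 are (u, v) pixel locations (e.g. the feet of each detection);
    `depth` is an HxW metric depth map, such as one produced by a
    calibrated monocular network.
    """
    a = backproject(*p1, depth[p1[1], p1[0]], fx, fy, cx, cy)
    b = backproject(*p2, depth[p2[1], p2[0]], fx, fy, cx, cy)
    return float(np.linalg.norm(a - b))
```

Comparing the returned distance against a safety threshold (e.g. 1–2 m) then flags violations.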
arXiv Detail & Related papers (2022-04-04T17:58:02Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that, by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and without relying on any additional sensors.
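A common example of such a prior is the known mounting height of the camera: comparing it to the predicted height of ground points yields a single global scale factor that converts the network's arbitrary-unit depths into metres. The sketch below illustrates this generic idea and is not necessarily the exact method used in the paper.

```python
import numpy as np

def recover_scale(pred_ground_heights, true_camera_height):
    """Recover the global metric scale of a self-supervised depth map.

    `pred_ground_heights`: heights (in the network's arbitrary units) of
    3D points sampled from the ground plane in front of the camera —
    each is an estimate of the camera's height above the ground.
    `true_camera_height`: the known mounting height in metres.
    The median ratio is robust to outlier ground samples.
    """
    return true_camera_height / np.median(pred_ground_heights)
```

Multiplying the whole predicted depth map by the returned factor then yields metric depth, without any additional sensor.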
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
- YOLOpeds: Efficient Real-Time Single-Shot Pedestrian Detection for Smart Camera Applications [2.588973722689844]
This work addresses the challenge of achieving a good trade-off between accuracy and speed for efficient deployment of deep-learning-based pedestrian detection in smart camera applications.
A computationally efficient architecture based on separable convolutions is introduced, integrating dense connections across layers and multi-scale feature fusion.
Overall, YOLOpeds sustains real-time operation at over 30 frames per second with detection rates of around 86%, outperforming existing deep learning models.
arXiv Detail & Related papers (2020-07-27T09:50:11Z) - Deploying Image Deblurring across Mobile Devices: A Perspective of
Quality and Latency [11.572636762286775]
We conduct a search of portable network architectures for a better quality-latency trade-off across mobile devices.
This paper provides comprehensive experiments and comparisons, offering an in-depth analysis of both latency and image quality.
To the best of our knowledge, this is the first paper that addresses all the deployment issues of image deblurring task across mobile devices.
arXiv Detail & Related papers (2020-04-27T06:32:53Z) - DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised
Representation Learning [65.94499390875046]
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique is able to outperform the current state-of-the-art with around 10% reduction in all error measures.
arXiv Detail & Related papers (2020-03-30T13:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.