Mobile AR Depth Estimation: Challenges & Prospects -- Extended Version
- URL: http://arxiv.org/abs/2310.14437v1
- Date: Sun, 22 Oct 2023 22:47:51 GMT
- Title: Mobile AR Depth Estimation: Challenges & Prospects -- Extended Version
- Authors: Ashkan Ganj, Yiqin Zhao, Hang Su, Tian Guo
- Abstract summary: We investigate the challenges and opportunities of achieving accurate metric depth estimation in mobile AR.
We tested four state-of-the-art monocular depth estimation models on a newly introduced dataset (ARKitScenes).
Our research provides promising future directions to explore and solve those challenges.
- Score: 12.887748044339913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metric depth estimation plays an important role in mobile augmented reality
(AR). With accurate metric depth, we can achieve more realistic user
interactions such as object placement and occlusion detection. While
specialized hardware like LiDAR shows promise, its restricted availability
(only on selected high-end mobile devices) and its performance limitations,
such as limited range and sensitivity to the environment, make it less ideal.
Monocular depth estimation, on the other hand, relies solely on mobile
cameras, which are ubiquitous, making it a promising alternative for mobile AR.
In this paper, we investigate the challenges and opportunities of achieving
accurate metric depth estimation in mobile AR. We tested four different
state-of-the-art monocular depth estimation models on a newly introduced
dataset (ARKitScenes) and identified three types of challenges: hardware,
data, and model related challenges. Furthermore, our research provides
promising future directions to explore and solve those challenges. These
directions include (i) using more hardware-related information from the mobile
device's camera and other available sensors, (ii) capturing high-quality data
to reflect real-world AR scenarios, and (iii) designing a model architecture to
utilize the new information.
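One way to read direction (i) concretely: monocular models typically predict depth only up to an unknown scale (and often shift), so sparse metric measurements the device already produces, such as ARKit/VIO feature-point depths, can anchor the prediction to metric scale. The sketch below is a minimal illustration of that idea under those assumptions, not the paper's method; the function name and anchor format are invented for illustration.

```python
import numpy as np

def align_relative_to_metric(rel_depth, anchor_uv, anchor_depth_m):
    """Fit scale s and shift t so that s * rel_depth + t agrees with sparse
    metric anchors, then apply them to the whole depth map.

    rel_depth      : (H, W) relative depth map from a monocular model
    anchor_uv      : (N, 2) integer pixel coordinates (u, v) of anchor points
    anchor_depth_m : (N,)   metric depths in meters at those pixels, e.g. from
                     the device's VIO/ARKit point cloud (assumed available)
    """
    rel = rel_depth[anchor_uv[:, 1], anchor_uv[:, 0]]   # prediction at anchor pixels
    A = np.stack([rel, np.ones_like(rel)], axis=1)      # design matrix [rel, 1]
    (s, t), *_ = np.linalg.lstsq(A, anchor_depth_m, rcond=None)  # least-squares fit
    return s * rel_depth + t                            # metric-scaled depth map
```

In practice the anchors are noisy and sparse, so a robust fit (e.g., RANSAC over the same two-parameter model) would usually replace plain least squares.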
Related papers
- Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
- Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report [108.88637766066759]
Deep learning-based single-image depth estimation solutions can achieve real-time performance on IoT platforms and smartphones.
Models developed in the challenge are also compatible with any Android or Linux-based mobile devices.
arXiv Detail & Related papers (2022-11-07T22:20:07Z)
- LaMAR: Benchmarking Localization and Mapping for Augmented Reality [80.23361950062302]
We introduce LaMAR, a new benchmark with a comprehensive capture and GT pipeline that co-registers realistic trajectories and sensor streams captured by heterogeneous AR devices.
We publish a benchmark dataset of diverse and large-scale scenes recorded with head-mounted and hand-held AR devices.
arXiv Detail & Related papers (2022-10-19T17:58:17Z)
- Towards Multimodal Multitask Scene Understanding Models for Indoor Mobile Agents [49.904531485843464]
In this paper, we discuss the main challenge: insufficient, or even no, labeled data for real-world indoor environments.
We describe MMISM (Multi-modality input Multi-task output Indoor Scene understanding Model) to tackle the above challenges.
MMISM considers RGB images as well as sparse Lidar points as inputs and 3D object detection, depth completion, human pose estimation, and semantic segmentation as output tasks.
We show that MMISM performs on par with or even better than single-task models.
arXiv Detail & Related papers (2022-09-27T04:49:19Z)
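As a toy sketch of what the multi-modality-input, multi-task-output interface described above looks like (the single-conv encoder, channel counts, and 40-class segmentation head are placeholders, not MMISM's architecture):

```python
import torch
import torch.nn as nn

class MultiTaskIndoorModel(nn.Module):
    """Toy multi-modal, multi-task model: RGB plus sparse LiDAR depth in,
    one output per task out. Purely illustrative, not MMISM itself."""

    def __init__(self, feat_ch=64, num_classes=40):  # 40 classes is an assumption
        super().__init__()
        # Concatenate RGB (3 channels) with a sparse depth channel (1 channel).
        self.encoder = nn.Conv2d(3 + 1, feat_ch, kernel_size=3, padding=1)
        self.depth_head = nn.Conv2d(feat_ch, 1, kernel_size=1)          # depth completion
        self.seg_head = nn.Conv2d(feat_ch, num_classes, kernel_size=1)  # semantic segmentation

    def forward(self, rgb, sparse_depth):
        x = torch.relu(self.encoder(torch.cat([rgb, sparse_depth], dim=1)))
        return {"depth": self.depth_head(x), "segmentation": self.seg_head(x)}
```

The shared encoder is what lets a single model serve several tasks at once.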
- Depth Estimation Matters Most: Improving Per-Object Depth Estimation for Monocular 3D Detection and Tracking [47.59619420444781]
Approaches to monocular 3D perception including detection and tracking often yield inferior performance when compared to LiDAR-based techniques.
We propose a multi-level fusion method that combines different representations (RGB and pseudo-LiDAR) and temporal information across multiple frames for objects (tracklets) to enhance per-object depth estimation.
arXiv Detail & Related papers (2022-06-08T03:37:59Z)
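The temporal part of such fusion can be illustrated with a simple stand-in: aggregate one tracked object's per-frame depth estimates with a confidence-weighted median, which suppresses single-frame outliers. The paper uses a learned fusion; this sketch and its weighting scheme are only assumptions for illustration.

```python
import numpy as np

def fuse_tracklet_depths(per_frame_depths, confidences=None):
    """Aggregate one object's per-frame depth estimates across a tracklet.

    per_frame_depths : depth estimate (meters) for the object in each frame
    confidences      : optional per-frame weights, e.g. detector scores
    """
    d = np.asarray(per_frame_depths, dtype=float)
    if confidences is None:
        return float(np.median(d))
    w = np.asarray(confidences, dtype=float)
    order = np.argsort(d)                       # weighted median: sort depths,
    cum = np.cumsum(w[order])                   # accumulate weight in that order,
    idx = np.searchsorted(cum, 0.5 * cum[-1])   # stop at half the total weight
    return float(d[order][idx])
```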
- Realtime 3D Object Detection for Headsets [19.096803385184174]
We propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object detection framework.
DeepMix intelligently combines edge-assisted 2D object detection and novel, on-device 3D bounding box estimations.
This leads to low end-to-end latency and significantly boosts detection accuracy in mobile scenarios.
arXiv Detail & Related papers (2022-01-15T05:50:18Z)
- Object Detection in the Context of Mobile Augmented Reality [16.49070406578342]
We propose a novel approach that combines the geometric information from VIO with semantic information from object detectors to improve the performance of object detection on mobile devices.
Our approach includes three components: (1) an image orientation correction method, (2) a scale-based filtering approach, and (3) an online semantic map.
The results show that our approach improves the accuracy of generic object detectors by 12% on our dataset.
arXiv Detail & Related papers (2020-08-15T05:15:00Z)
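A minimal sketch of the scale-based filtering idea from component (2) above, assuming a pinhole camera model; the per-class size priors are illustrative values, not the paper's:

```python
# Illustrative per-class plausible widths in meters (assumed, not from the paper).
SIZE_PRIOR_M = {"chair": (0.3, 1.0), "monitor": (0.2, 1.2), "cup": (0.05, 0.15)}

def physical_width_m(bbox_width_px, depth_m, focal_px):
    """Back-project a box width from pixels to meters via the pinhole model:
    width_px = focal_px * width_m / depth_m  =>  width_m = width_px * depth_m / focal_px."""
    return bbox_width_px * depth_m / focal_px

def passes_scale_filter(label, bbox_width_px, depth_m, focal_px):
    """Reject a detection whose implied physical size is implausible for its class."""
    lo, hi = SIZE_PRIOR_M.get(label, (0.0, float("inf")))  # unknown classes pass
    return lo <= physical_width_m(bbox_width_px, depth_m, focal_px) <= hi
```

With depth taken from VIO-registered geometry, a detection whose implied physical size falls far outside its class prior (say, a two-meter-wide "cup") is rejected regardless of detector score.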
- siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection [65.03384167873564]
A Siamese network is integrated into the pipeline of a well-known 3D object detector, and the object associations it finds across cameras are exploited to enhance the 3D box regression of the object.
The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.
arXiv Detail & Related papers (2020-02-19T15:32:38Z)