Realtime 3D Object Detection for Headsets
- URL: http://arxiv.org/abs/2201.08812v1
- Date: Sat, 15 Jan 2022 05:50:18 GMT
- Title: Realtime 3D Object Detection for Headsets
- Authors: Yongjie Guan and Xueyu Hou and Nan Wu and Bo Han and Tao Han
- Abstract summary: We propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object detection framework.
DeepMix intelligently combines edge-assisted 2D object detection and novel, on-device 3D bounding box estimations.
This leads to low end-to-end latency and significantly boosts detection accuracy in mobile scenarios.
- Score: 19.096803385184174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile headsets should be capable of understanding 3D physical environments
to offer a truly immersive experience for augmented/mixed reality (AR/MR).
However, their small form factor and limited computation resources make it
extremely challenging to run 3D vision algorithms, which are known to be more
compute-intensive than their 2D counterparts, in real time. In this paper,
we propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object
detection framework for improving the user experience of AR/MR on mobile
headsets. Motivated by our analysis and evaluation of state-of-the-art 3D
object detection models, DeepMix intelligently combines edge-assisted 2D object
detection and novel, on-device 3D bounding box estimations that leverage depth
data captured by headsets. This leads to low end-to-end latency and
significantly boosts detection accuracy in mobile scenarios.
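The hybrid split described in the abstract can be sketched as a minimal pipeline: a 2D detector call that would be offloaded to an edge server, followed by on-device lifting of each 2D box into 3D using headset depth data. The function names, intrinsics, and the median-depth heuristic below are illustrative assumptions, not DeepMix's actual implementation.

```python
import numpy as np

# Assumed pinhole intrinsics for a headset depth camera (illustrative values).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

def edge_detect_2d(rgb_frame):
    """Stand-in for the edge-assisted 2D detector: in DeepMix this stage is
    offloaded to an edge server; here we return a fixed box for illustration."""
    return [{"label": "chair", "box": (300, 200, 360, 300)}]  # (x1, y1, x2, y2)

def lift_to_3d(box, depth_map):
    """On-device 3D estimation sketch: take the median valid depth inside the
    2D box and back-project the box corners through the pinhole model."""
    x1, y1, x2, y2 = box
    patch = depth_map[y1:y2, x1:x2]
    z = float(np.median(patch[patch > 0]))      # metres; ignore invalid zeros
    to_3d = lambda u, v: ((u - CX) * z / FX, (v - CY) * z / FY, z)
    return to_3d(x1, y1), to_3d(x2, y2)         # two opposite 3D corners

depth = np.full((480, 640), 2.0)                # synthetic planar depth at 2 m
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
for det in edge_detect_2d(rgb):
    near, far = lift_to_3d(det["box"], depth)
    print(det["label"], near, far)
```

Keeping the 3D stage this cheap on-device is what the paper credits for low end-to-end latency: only the 2D detection crosses the network.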
Related papers
- OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction [10.87136340580404]
Visual-based 3D semantic occupancy perception is a key technology for robotics, including autonomous vehicles.
We propose a novel 3D semantic occupancy perception method, OccupancyDETR, which utilizes a DETR-like object detection, a mixed dense-sparse 3D occupancy decoder.
Our approach strikes a balance between efficiency and accuracy, achieving faster inference times, lower resource consumption, and improved performance for small object detection.
arXiv Detail & Related papers (2023-09-15T16:06:23Z)
- R2Det: Redemption from Range-view for Accurate 3D Object Detection [16.855672228478074]
Redemption from Range-view Module (R2M) is a plug-and-play approach for 3D surface texture enhancement from the 2D range view to the 3D point view.
R2M can be seamlessly integrated into state-of-the-art LiDAR-based 3D object detectors as preprocessing.
arXiv Detail & Related papers (2023-07-21T10:36:05Z)
- BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo [15.479670314689418]
We introduce an effective temporal stereo method to dynamically select the scale of matching candidates.
We design an iterative algorithm to update more valuable candidates, making it adaptive to moving candidates.
BEVStereo achieves the new state-of-the-art performance on the camera-only track of nuScenes dataset.
arXiv Detail & Related papers (2022-09-21T10:21:25Z)
- CMR3D: Contextualized Multi-Stage Refinement for 3D Object Detection [57.44434974289945]
We propose Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework.
Our framework takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene.
In addition to 3D object detection, we investigate the effectiveness of our framework for the problem of 3D object counting.
arXiv Detail & Related papers (2022-09-13T05:26:09Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection [6.5702792909006735]
YOLOStereo3D is trained on one single GPU and runs at more than ten fps.
It demonstrates performance comparable to state-of-the-art stereo 3D detection frameworks without usage of LiDAR data.
arXiv Detail & Related papers (2021-03-17T03:43:54Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
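The pseudo-LiDAR conversion mentioned above is just pinhole back-projection of a dense depth map into a point cloud. A minimal sketch (the function name and intrinsics are assumptions for illustration, not PLUME's code):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a dense depth map into an (N, 3) pseudo-LiDAR point cloud
    via the standard pinhole model: X=(u-cx)Z/fx, Y=(v-cy)Z/fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinate grids
    z = depth.ravel()
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    return pts[z > 0]                   # drop pixels with no depth estimate

depth = np.full((2, 3), 4.0)            # toy 2x3 depth map, 4 m everywhere
cloud = depth_to_pseudo_lidar(depth, fx=100.0, fy=100.0, cx=1.0, cy=0.5)
print(cloud.shape)                      # one 3D point per valid pixel
```

PLUME's contribution is precisely to avoid running this as a separate step, performing depth estimation and detection in one metric space instead.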
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
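The one-parameter-per-step search structure can be sketched as a greedy loop. Note the paper learns this policy with reinforcement learning over a delayed reward; the greedy hill-climbing below is only an illustration of the axial-refinement idea, and the score function and step size are assumptions.

```python
import itertools

def axial_refine(box, score_fn, step=0.1, iters=50):
    """Greedy sketch of axial refinement: each step nudges exactly one 3D box
    parameter up or down and keeps the single-axis move that raises the score
    most; stop when no axis move improves the score."""
    box = list(box)
    for _ in range(iters):
        best, best_score = None, score_fn(box)
        for i, sign in itertools.product(range(len(box)), (-1.0, 1.0)):
            cand = box[:]
            cand[i] += sign * step
            s = score_fn(cand)
            if s > best_score:
                best, best_score = cand, s
        if best is None:                # no single-axis move helps: converged
            break
        box = best
    return box

# Toy score: negative squared distance to a "ground-truth" box (x, y, z, yaw).
target = [1.0, 2.0, 0.5, 0.0]
score = lambda b: -sum((a - t) ** 2 for a, t in zip(b, target))
refined = axial_refine([0.0, 0.0, 0.0, 0.0], score)
```

In the paper the score is not available at every step (the reward arrives only after several moves), which is why a learned policy replaces this greedy lookahead.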
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
- Kinematic 3D Object Detection in Monocular Video [123.7119180923524]
We propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
arXiv Detail & Related papers (2020-07-19T01:15:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.