Related papers: Extended monocular 3D imaging

Extended monocular 3D imaging

URL: http://arxiv.org/abs/2502.07403v1
Date: Tue, 11 Feb 2025 09:32:31 GMT
Title: Extended monocular 3D imaging
Authors: Zicheng Shen, Feng Zhao, Yibo Ni, Yuanmu Yang,
Abstract summary: We introduce an extended monocular 3D imaging (EM3D) framework that fully exploits the vectorial wave nature of light.<n>We experimentally demonstrate the snapshot acquisition of a million-pixel and accurate 3D point cloud for extended scenes.
Score: 3.964710267993159
License: http://creativecommons.org/licenses/by/4.0/
Abstract: 3D vision is of paramount importance for numerous applications ranging from machine intelligence to precision metrology. Despite much recent progress, the majority of 3D imaging hardware remains bulky and complicated and provides much lower image resolution compared to their 2D counterparts. Moreover, there are many well-known scenarios that existing 3D imaging solutions frequently fail. Here, we introduce an extended monocular 3D imaging (EM3D) framework that fully exploits the vectorial wave nature of light. Via the multi-stage fusion of diffraction- and polarization-based depth cues, using a compact monocular camera equipped with a diffractive-refractive hybrid lens, we experimentally demonstrate the snapshot acquisition of a million-pixel and accurate 3D point cloud for extended scenes that are traditionally challenging, including those with low texture, being highly reflective, or nearly transparent, without a data prior. Furthermore, we discover that the combination of depth and polarization information can unlock unique new opportunities in material identification, which may further expand machine intelligence for applications like target recognition and face anti-spoofing. The straightforward yet powerful architecture thus opens up a new path for a higher-dimensional machine vision in a minimal form factor, facilitating the deployment of monocular cameras for applications in much more diverse scenarios.

Related papers

UniK3D: Universal Camera Monocular 3D Estimation [62.06785782635153]
We present UniK3D, the first generalizable method for monocular 3D estimation able to model any camera. Our method introduces a spherical 3D representation which allows for better disentanglement of camera and scene geometry. A comprehensive zero-shot evaluation on 13 diverse datasets demonstrates the state-of-the-art performance of UniK3D across 3D, depth, and camera metrics.
arXiv Detail & Related papers (2025-03-20T17:49:23Z)
VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection [80.62052650370416]
monocular 3D object detection holds significant importance across various applications, including autonomous driving and robotics. In this paper, we present VFMM3D, an innovative framework that leverages the capabilities of Vision Foundation Models (VFMs) to accurately transform single-view images into LiDAR point cloud representations.
arXiv Detail & Related papers (2024-04-15T03:12:12Z)
Towards 3D Vision with Low-Cost Single-Photon Cameras [24.711165102559438]
We present a method for reconstructing 3D shape of arbitrary Lambertian objects based on measurements by miniature, energy-efficient, low-cost single-photon cameras. Our work draws a connection between image-based modeling and active range scanning and is a step towards 3D vision with single-photon cameras.
arXiv Detail & Related papers (2024-03-26T15:40:05Z)
Weakly Supervised Monocular 3D Detection with a Single-View Image [58.57978772009438]
Monocular 3D detection aims for precise 3D object localization from a single-view image. We propose SKD-WM3D, a weakly supervised monocular 3D detection framework. We show that SKD-WM3D surpasses the state-of-the-art clearly and is even on par with many fully supervised methods.
arXiv Detail & Related papers (2024-02-29T13:26:47Z)
Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects. We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z)
A Flexible Multi-view Multi-modal Imaging System for Outdoor Scenes [37.5716419191546]
We propose a wireless multi-view multi-modal 3D imaging system generally applicable to large outdoor scenes. Multiple spatially distributed slave nodes equipped with cameras and LiDARs are connected to form a wireless sensor network. This system is the first imaging system that integrates mutli-view RGB cameras and LiDARs in large outdoor scenes.
arXiv Detail & Related papers (2023-02-21T06:14:05Z)
Weakly Supervised 3D Multi-person Pose Estimation for Large-scale Scenes based on Monocular Camera and Single LiDAR [41.39277657279448]
We propose a monocular camera and single LiDAR-based method for 3D multi-person pose estimation in large-scale scenes. Specifically, we design an effective fusion strategy to take advantage of multi-modal input data, including images and point cloud. Our method exploits the inherent geometry constraints of point cloud for self-supervision and utilizes 2D keypoints on images for weak supervision.
arXiv Detail & Related papers (2022-11-30T12:50:40Z)
SGM3D: Stereo Guided Monocular 3D Object Detection [62.11858392862551]
We propose a stereo-guided monocular 3D object detection network, termed SGM3D. We exploit robust 3D features extracted from stereo images to enhance the features learned from the monocular image. Our method can be integrated into many other monocular approaches to boost performance without introducing any extra computational cost.
arXiv Detail & Related papers (2021-12-03T13:57:14Z)
MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation. Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras. We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points. Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.