3D Scene Inference from Transient Histograms
- URL: http://arxiv.org/abs/2211.05094v1
- Date: Wed, 9 Nov 2022 18:31:50 GMT
- Title: 3D Scene Inference from Transient Histograms
- Authors: Sacha Jungerman, Atul Ingle, Yin Li, and Mohit Gupta
- Abstract summary: Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications.
We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors.
- Score: 17.916392079019175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time-resolved image sensors that capture light at pico-to-nanosecond
timescales were once limited to niche applications but are now rapidly becoming
mainstream in consumer devices. We propose low-cost and low-power imaging
modalities that capture scene information from minimal time-resolved image
sensors with as few as one pixel. The key idea is to flood illuminate large
scene patches (or the entire scene) with a pulsed light source and measure the
time-resolved reflected light by integrating over the entire illuminated area.
The one-dimensional measured temporal waveform, called a transient,
encodes both distances and albedos at all visible scene points and as such is
an aggregate proxy for the scene's 3D geometry. We explore the viability and
limitations of the transient waveforms by themselves for recovering scene
information, and also when combined with traditional RGB cameras. We show that
plane estimation can be performed from a single transient and that using only a
few more it is possible to recover a depth map of the whole scene. We also show
two proof-of-concept hardware prototypes that demonstrate the feasibility of
our approach for compact, mobile, and budget-limited applications.
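To make the idea of a transient more concrete, here is a minimal sketch under loudly stated assumptions. It is a hypothetical, simplified forward model and not the authors' formulation. A co-located point light and single-pixel sensor sit at the origin and look down +z. The scene is one Lambertian plane. The model assumes a single bounce, no noise, and an ideal zero-width pulse. The pixel integrates the return over its whole field of view, so each time bin sums the albedo-weighted, inverse-square-attenuated light from every scene point at that range. The sketch also adds a naive brute-force plane fit from one transient, which loosely mirrors the abstract's claim that plane estimation is possible from a single transient. That grid search is a stand-in chosen for illustration, not the paper's estimation method. The function names `plane_transient`, `normalize`, and `fit_plane` are hypothetical, and so are all parameter values (40° field of view, 5 cm bins, 128 bins).

```python
import math


def plane_transient(dist, tilt_deg, albedo=1.0, fov_deg=40.0, grid=64,
                    bin_width_m=0.05, num_bins=128):
    # Hypothetical simplified forward model (not the paper's formulation):
    # a co-located point light and single-pixel sensor at the origin look
    # down +z; the scene is a Lambertian plane n . p = dist with unit normal
    # n = (sin(tilt), 0, cos(tilt)); single bounce, no noise, ideal pulse.
    # Bins index one-way distance, i.e. round-trip time * c / 2.
    t = math.radians(tilt_deg)
    n = (math.sin(t), 0.0, math.cos(t))
    half = math.tan(math.radians(fov_deg) / 2.0)
    cell = 2.0 * half / grid  # side of one sample cell on the plane z = 1
    hist = [0.0] * num_bins
    for i in range(grid):
        for j in range(grid):
            # The single pixel integrates over the whole field of view, so
            # we sum contributions from a grid of ray directions.
            x = -half + (i + 0.5) * cell
            y = -half + (j + 0.5) * cell
            norm = math.sqrt(x * x + y * y + 1.0)
            cos_inc = (n[0] * x + n[1] * y + n[2]) / norm  # n . unit ray
            if cos_inc <= 0.0:
                continue  # this ray never reaches the plane
            depth = dist / cos_inc  # range along the ray to the plane
            k = int(depth / bin_width_m)
            if k >= num_bins:
                continue  # beyond the histogram's range
            d_omega = cell * cell / norm ** 3  # solid angle of this cell
            # Lambertian return with inverse-square falloff of the light.
            hist[k] += albedo * cos_inc / depth ** 2 * d_omega
    return hist


def normalize(hist):
    # A uniform albedo only scales the transient, so unit-sum normalization
    # removes it from the comparison.
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else list(hist)


def fit_plane(measured, dists, tilts, **model_kwargs):
    # Naive analysis-by-synthesis (illustrative stand-in): render a transient
    # for every candidate (dist, tilt) and keep the smallest squared error.
    # In this symmetric setup a tilt of -theta gives the same transient as
    # +theta, so only tilts >= 0 should be searched.
    target = normalize(measured)
    best_err, best = float("inf"), None
    for d in dists:
        for tilt in tilts:
            sim = normalize(plane_transient(d, tilt, **model_kwargs))
            err = sum((a - b) ** 2 for a, b in zip(target, sim))
            if err < best_err:
                best_err, best = err, (d, tilt)
    return best


measured = plane_transient(1.5, 30.0, albedo=0.4, grid=32)
print(fit_plane(measured, [1.0, 1.25, 1.5, 1.75, 2.0],
                [0.0, 10.0, 20.0, 30.0, 40.0, 50.0], grid=32))  # → (1.5, 30.0)
```

A 5 cm distance bin corresponds to roughly 334 ps of round-trip time, which places this toy setup within the pico-to-nanosecond regime the abstract describes. The mirror ambiguity noted in `fit_plane` follows from the symmetric field of view assumed here, so a single transient cannot fix the sign of the tilt in this model.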
Related papers
- SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera [78.20482568602993]
Conventional RGB cameras are susceptible to motion blur.
Neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information.
Our design can enhance novel view synthesis across NeRF and 3DGS.
(2024-04-10T03:31:32Z)
- Flying with Photons: Rendering Novel Views of Propagating Light [37.06220870989172]
We present an imaging and neural rendering technique that seeks to synthesize videos of light propagating through a scene from novel, moving camera viewpoints.
Our approach relies on a new ultrafast imaging setup to capture a first-of-its-kind, multi-viewpoint video dataset with picosecond-level temporal resolution.
(2024-04-09T17:48:52Z)
- PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar [25.332440946211236]
3D reconstruction from a single-view is challenging because of the ambiguity from monocular cues and lack of information about occluded regions.
We propose using time-of-flight data captured by a single-photon avalanche diode to overcome these limitations.
We demonstrate that we can reconstruct visible and occluded geometry without data priors or reliance on controlled ambient lighting or scene albedo.
(2023-12-21T18:59:53Z)
- Event-based Motion-Robust Accurate Shape Estimation for Mixed Reflectance Scenes [17.446182782836747]
We present a novel event-based structured light system that enables fast 3D imaging of mixed reflectance scenes with high accuracy.
We use epipolar constraints that intrinsically enable decomposing the measured reflections into diffuse, two-bounce specular, and other multi-bounce reflections.
The resulting system achieves fast and motion-robust reconstructions of mixed reflectance scenes with 500 μm accuracy.
(2023-11-16T08:12:10Z)
- Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography [54.36608424943729]
We show that in a "long-burst" (forty-two 12-megapixel RAW frames captured in a two-second sequence), there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
(2022-12-22T18:54:34Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
(2022-10-21T17:33:14Z)
- Event Guided Depth Sensing [50.997474285910734]
We present an efficient bio-inspired event-camera-driven depth estimation algorithm.
In our approach, we illuminate areas of interest densely, depending on the scene activity detected by the event camera.
We show the feasibility of our approach in simulated autonomous driving sequences and real indoor environments.
(2021-10-20T11:41:11Z)
- Learning to Recover 3D Scene Shape from a Single Image [98.20106822614392]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then use 3D point cloud encoders to predict the missing depth shift and focal length that allow us to recover a realistic 3D scene shape.
(2020-12-17T02:35:13Z)
- Event-based Stereo Visual Odometry [42.77238738150496]
We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig.
We seek to maximize the temporal consistency of stereo event-based data while using a simple and efficient representation.
(2020-07-30T15:53:28Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera viewpoints.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
(2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.