Photoreal Scene Reconstruction from an Egocentric Device
- URL: http://arxiv.org/abs/2506.04444v1
- Date: Wed, 04 Jun 2025 20:53:43 GMT
- Title: Photoreal Scene Reconstruction from an Egocentric Device
- Authors: Zhaoyang Lv, Maurizio Monge, Ka Chen, Yufeng Zhu, Michael Goesele, Jakob Engel, Zhao Dong, Richard Newcombe
- Abstract summary: Existing methodologies assume frame-rate 6DoF poses estimated from the device's visual-inertial odometry system. We employ visual-inertial bundle adjustment (VIBA) to calibrate the precise timestamps and motion of the rolling-shutter RGB camera. We incorporate a physical image formation model into Gaussian Splatting, which effectively addresses the sensor characteristics.
- Score: 5.581317382137083
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we investigate the challenges of using egocentric devices to photorealistically reconstruct scenes in high dynamic range. Existing methodologies typically assume frame-rate 6DoF poses estimated from the device's visual-inertial odometry system, which may neglect details crucial for pixel-accurate reconstruction. This study presents two significant findings. First, in contrast to mainstream work that treats the RGB camera as a global-shutter, frame-rate camera, we emphasize the importance of employing visual-inertial bundle adjustment (VIBA) to calibrate the precise timestamps and motion of the rolling-shutter RGB camera in a high-frequency trajectory format, which ensures accurate calibration of the rolling-shutter camera's physical properties. Second, we incorporate a physical image formation model into Gaussian Splatting, which effectively addresses sensor characteristics including the rolling-shutter effect of RGB cameras and the dynamic range measured by the sensor. Our proposed formulation is applicable to the widely used variants of the Gaussian Splatting representation. We conduct a comprehensive evaluation of our pipeline using the open-source Project Aria device under diverse indoor and outdoor lighting conditions, and further validate it on a Meta Quest 3 device. Across all experiments, we observe a consistent improvement of +1 dB in PSNR from incorporating VIBA, with an additional +1 dB achieved through our proposed image formation model. Our complete implementation, evaluation datasets, and recording profile are available at http://www.projectaria.com/photoreal-reconstruction/
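As a rough illustration of the two ingredients described in the abstract, the minimal Python sketch below shows (1) how a high-frequency trajectory, such as one refined by VIBA, could be resampled into one camera pose per scanline of a rolling-shutter image, and (2) a simplified physical image formation step that maps linear rendered radiance through exposure, sensor saturation, and a response curve. This is not the authors' implementation: the function names, parameterization, and the gamma-style response are assumptions for illustration; a calibrated camera response function and the paper's actual trajectory format would replace them.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def scanline_poses(traj_t, traj_q, traj_p, frame_start, line_readout, num_rows):
    """Resample a high-frequency trajectory into one pose per image row.

    traj_t: (N,) trajectory timestamps in seconds (e.g., VIBA-refined)
    traj_q: (N, 4) orientations as quaternions (x, y, z, w)
    traj_p: (N, 3) translations
    frame_start: capture time of the first image row
    line_readout: time between the readout of consecutive rows
    num_rows: image height in pixels
    Row times must lie inside [traj_t[0], traj_t[-1]].
    """
    row_times = frame_start + np.arange(num_rows) * line_readout
    slerp = Slerp(traj_t, Rotation.from_quat(traj_q))
    rows_R = slerp(row_times)                         # per-row rotation
    rows_p = np.stack([np.interp(row_times, traj_t, traj_p[:, k])
                       for k in range(3)], axis=1)    # per-row translation
    return rows_R, rows_p

def sensor_response(radiance, exposure_time, gain, gamma=2.2, full_well=1.0):
    """Toy image formation: linear scene radiance -> expected pixel value.

    A calibrated camera response function would replace the gamma curve;
    full_well models the saturation point limiting the dynamic range.
    """
    irradiance = radiance * exposure_time * gain      # photometric exposure
    clipped = np.clip(irradiance, 0.0, full_well)     # sensor saturation
    return (clipped / full_well) ** (1.0 / gamma)     # nonlinear response
```

In a splatting-style renderer, the per-row poses would let each scanline be rasterized (or its loss evaluated) under its own camera pose, and the response model would be applied to the rendered linear radiance before comparison against the captured pixels.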
Related papers
- RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories [21.97835451388508]
RA-NeRF is capable of predicting highly accurate camera poses even with complex camera trajectories. RA-NeRF achieves state-of-the-art results in both camera pose estimation and visual quality.
arXiv Detail & Related papers (2025-06-18T08:21:19Z)
- Targetless LiDAR-Camera Calibration with Anchored 3D Gaussians [21.057702337896995]
We present a targetless LiDAR-camera calibration method that jointly optimizes sensor poses and scene geometry from arbitrary scenes. We validate our method through extensive experiments on two real-world autonomous driving datasets.
arXiv Detail & Related papers (2025-04-06T20:00:01Z)
- Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion [25.54868552979793]
We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data.
Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods.
arXiv Detail & Related papers (2024-03-20T06:19:41Z)
- Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction [51.87279764576998]
We propose EvRGBHand -- the first approach for 3D hand mesh reconstruction with an event camera and an RGB camera compensating for each other.
EvRGBHand can tackle overexposure and motion blur in RGB-based hand mesh reconstruction (HMR), as well as foreground scarcity and background overflow in event-based HMR.
arXiv Detail & Related papers (2024-03-12T06:04:50Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields [71.94156412354054]
We propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN). DynaMoN handles dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis. We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset.
arXiv Detail & Related papers (2023-09-16T08:46:59Z)
- Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor [58.305341034419136]
We present the first dense SLAM system with a monocular camera and a light-weight ToF sensor.
We propose a multi-modal implicit scene representation that supports rendering both the signals from the RGB camera and light-weight ToF sensor.
Experiments demonstrate that our system effectively exploits the signals of light-weight ToF sensors and achieves competitive results.
arXiv Detail & Related papers (2023-08-28T07:56:13Z)
- GenISP: Neural ISP for Low-Light Machine Cognition [19.444297600977546]
In low-light conditions, object detectors using raw image data are more robust than detectors using image data processed by an ISP pipeline.
We propose a minimal neural ISP pipeline for machine cognition, named GenISP, that explicitly incorporates Color Space Transformation to a device-independent color space.
arXiv Detail & Related papers (2022-05-07T17:17:24Z)
- Learning Enriched Illuminants for Cross and Single Sensor Color Constancy [182.4997117953705]
We propose a cross-sensor self-supervised scheme to train the network.
We train the network by randomly sampling the artificial illuminants in a sensor-independent manner.
Experiments show that our cross-sensor model and single-sensor model outperform other state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-03-21T15:45:35Z)
- How to Calibrate Your Event Camera [58.80418612800161]
We propose a generic event camera calibration framework using image reconstruction.
We show that neural-network-based image reconstruction is well suited for the task of intrinsic and extrinsic calibration of event cameras.
arXiv Detail & Related papers (2021-05-26T07:06:58Z)
- Multi-View Photometric Stereo: A Robust Solution and Benchmark Dataset for Spatially Varying Isotropic Materials [65.95928593628128]
We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo technique.
Our algorithm is suitable for perspective cameras and nearby point light sources.
arXiv Detail & Related papers (2020-01-18T12:26:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.