EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping
- URL: http://arxiv.org/abs/2312.11911v3
- Date: Thu, 23 May 2024 04:39:26 GMT
- Title: EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping
- Authors: Weipeng Guan, Peiyu Chen, Huibin Zhao, Yu Wang, Peng Lu,
- Abstract summary: We propose EVI-SAM to tackle the problem of 6 DoF pose tracking and 3D reconstruction using monocular event camera.
A novel event-based hybrid tracking framework is designed to estimate the pose, leveraging the robustness of feature matching and the precision of direct alignment.
To the best of our knowledge, this is the first non-learning work to realize event-based dense mapping.
- Score: 5.154689086578339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras are bio-inspired, motion-activated sensors that demonstrate substantial potential in handling challenging situations, such as motion blur and high-dynamic range. In this paper, we proposed EVI-SAM to tackle the problem of 6 DoF pose tracking and 3D reconstruction using monocular event camera. A novel event-based hybrid tracking framework is designed to estimate the pose, leveraging the robustness of feature matching and the precision of direct alignment. Specifically, we develop an event-based 2D-2D alignment to construct the photometric constraint, and tightly integrate it with the event-based reprojection constraint. The mapping module recovers the dense and colorful depth of the scene through the image-guided event-based mapping method. Subsequently, the appearance, texture, and surface mesh of the 3D scene can be reconstructed by fusing the dense depth map from multiple viewpoints using truncated signed distance function (TSDF) fusion. To the best of our knowledge, this is the first non-learning work to realize event-based dense mapping. Numerical evaluations are performed on both publicly available and self-collected datasets, which qualitatively and quantitatively demonstrate the superior performance of our method. Our EVI-SAM effectively balances accuracy and robustness while maintaining computational efficiency, showcasing superior pose tracking and dense mapping performance in challenging scenarios. Video Demo: https://youtu.be/Nn40U4e5Si8.
Related papers
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and
Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z) - Cross-Modal Semi-Dense 6-DoF Tracking of an Event Camera in Challenging
Conditions [29.608665442108727]
Event-based cameras are bio-inspired visual sensors that perform well in HDR conditions and have high temporal resolution.
The present work demonstrates the feasibility of purely event-based tracking if an alternative sensor is permitted for mapping.
The method relies on geometric 3D-2D registration of semi-dense maps and events, and achieves highly reliable and accurate cross-modal tracking results.
arXiv Detail & Related papers (2024-01-16T01:48:45Z) - DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for
Monocular 3D Semantic Scene Completion [0.4662017507844857]
DepthSSC is an advanced method for semantic scene completion solely based on monocular cameras.
It mitigates spatial misalignment and distortion issues observed in prior methods.
It demonstrates its effectiveness in capturing intricate 3D structural details and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-28T01:47:51Z) - Rethinking Event-based Human Pose Estimation with 3D Event
Representations [26.592295349210787]
Event cameras offer a robust solution for navigating challenging contexts.
We introduce two 3D event representations: the Rasterized Event Point Cloud and the Decoupled Event Voxel.
Experiments on EV-3DPW demonstrate that the robustness of our proposed 3D representation methods compared to traditional RGB images and event frame techniques.
arXiv Detail & Related papers (2023-11-08T10:45:09Z) - FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z) - NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and
Pose Annotations [64.95582364215548]
NAVI is a new dataset of category-agnostic image collections with high-quality 3D scans and per-image 2D-3D alignments.
These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps.
arXiv Detail & Related papers (2023-06-15T13:11:30Z) - Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z) - DEVO: Depth-Event Camera Visual Odometry in Challenging Conditions [30.892930944644853]
We present a novel real-time visual odometry framework for a stereo setup of a depth and high-resolution event camera.
Our framework balances accuracy and robustness against computational efficiency towards strong performance in challenging scenarios.
arXiv Detail & Related papers (2022-02-05T13:46:47Z) - Attentive and Contrastive Learning for Joint Depth and Motion Field
Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z) - Event-based Stereo Visual Odometry [42.77238738150496]
We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig.
We seek to maximize thetemporal consistency of stereo event-based data while using a simple and efficient representation.
arXiv Detail & Related papers (2020-07-30T15:53:28Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.