Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
- URL: http://arxiv.org/abs/2407.08023v1
- Date: Wed, 10 Jul 2024 20:01:35 GMT
- Title: Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
- Authors: Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem
- Abstract summary: We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task.
The core idea is not only to run SfM on egocentric videos but also to perform 2D-3D matching between existing 3D scans and 2D video frames.
Our method achieves the best performance on the most important metric, the overall success rate.
- Score: 64.08563002366812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We build our pipeline, EgoLoc-v1, mainly inspired by EgoLoc. We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task, which previous work has shown to be essential. The core idea is not only to run SfM on egocentric videos but also to perform 2D-3D matching between existing 3D scans and 2D video frames. This yields a hybrid SfM and camera relocalization pipeline that recovers more camera poses, leading to a higher QwP (the ratio of queries with estimated camera poses) and a higher overall success rate. Our method achieves the best performance on the most important metric, the overall success rate, surpassing the previous state-of-the-art, the competitive EgoLoc, by $1.5\%$. The code is available at \url{https://github.com/Wayne-Mai/egoloc_v1}.
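Conceptually, the ensemble strategy reduces to a pose-merging step: keep the SfM pose wherever SfM succeeds, and fill the remaining frames with poses relocalized against the 3D scan. The sketch below illustrates only that idea; the function name and data layout are assumptions, not the released EgoLoc-v1 interface.

```python
def merge_camera_poses(sfm_poses, reloc_poses, num_frames):
    """Hybrid pose ensemble (illustrative sketch, not the EgoLoc-v1 API).

    sfm_poses, reloc_poses: dicts mapping frame index -> 4x4
    camera-to-world matrix; frames a method failed on are absent.
    Prefers SfM when both sources posed a frame, and falls back to
    2D-3D relocalization otherwise, so more frames end up posed.
    """
    merged = {}
    for i in range(num_frames):
        if i in sfm_poses:          # SfM succeeded on this frame
            merged[i] = sfm_poses[i]
        elif i in reloc_poses:      # fall back to scan relocalization
            merged[i] = reloc_poses[i]
    return merged
```

Every additional posed frame raises QwP, which is how the hybrid pipeline lifts the overall success rate.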
Related papers
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos [25.41337525728398]
We introduce UniHOI, a model that unifies the estimation of all variables necessary for dense 4D reconstruction.
UniHOI is the first approach to offer fast, dense, and general monocular egocentric HOI scene reconstruction in the presence of motion.
arXiv Detail & Related papers (2024-11-14T02:57:11Z)
- GFlow: Recovering 4D World from Monocular Video [58.63051670458107]
We introduce GFlow, a framework that lifts a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time.
GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process.
GFlow transcends the boundaries of mere 4D reconstruction.
arXiv Detail & Related papers (2024-05-28T17:59:22Z)
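As a rough illustration of the still/moving clustering step mentioned in the GFlow summary, one can threshold the motion of tracked 2D points. The threshold, the use of raw (not camera-compensated) motion, and all names below are simplifying assumptions, not GFlow's actual formulation.

```python
import numpy as np

def split_still_moving(tracks, thresh_px=0.5):
    """tracks: (N, T, 2) array of 2D point trajectories over T frames.
    Labels a point 'moving' if its mean per-step displacement exceeds
    thresh_px pixels; everything else is treated as the still scene."""
    steps = np.diff(tracks, axis=1)                 # (N, T-1, 2) displacements
    speed = np.linalg.norm(steps, axis=-1).mean(1)  # (N,) mean pixel speed
    moving = speed > thresh_px
    return tracks[moving], tracks[~moving]
```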
- SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction [77.15924044466976]
We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into 3D space (e.g., bird's eye view) to obtain a 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations.
arXiv Detail & Related papers (2023-11-21T17:59:14Z)
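The self-supervision signal in the SelfOcc summary amounts to re-rendering neighboring frames from the predicted 3D representation and penalizing the photometric error. A schematic of that loss follows; `render_view` stands in for a differentiable renderer and is a placeholder, not SelfOcc's code.

```python
import torch

def photometric_loss(scene_repr, neighbor_cam, neighbor_frame, render_view):
    """scene_repr: 3D representation predicted from the current frame.
    neighbor_cam: camera parameters of a previous/future frame.
    neighbor_frame: the real image at that time step, shape (3, H, W).
    render_view: hypothetical differentiable renderer (placeholder)."""
    rendered = render_view(scene_repr, neighbor_cam)    # re-render the neighbor
    return torch.abs(rendered - neighbor_frame).mean()  # L1 photometric error
```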
- EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023 [9.202585784962276]
We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization.
Our method leverages sparse camera pose reconstructions in a two-fold manner, video and scan independently, to estimate the camera pose of egocentric frames in 3D renders with high recall and precision.
arXiv Detail & Related papers (2023-06-29T00:17:23Z)
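EgoCOL reconstructs poses from the video and the scan independently, which raises the classic problem of expressing both in one coordinate frame. A standard tool for that is the Umeyama similarity alignment over corresponding 3D points, sketched below; its use here is illustrative and is not taken from EgoCOL's code.

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Least-squares similarity transform with dst ~= s * R @ src + t.
    src, dst: (N, 3) corresponding points from the two reconstructions."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                  # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # avoid a reflection solution
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)        # variance of source points
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```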
- EgoVSR: Towards High-Quality Egocentric Video Super-Resolution [23.50915512118989]
EgoVSR is a Video Super-Resolution framework specifically designed for egocentric videos.
We explicitly tackle motion blur in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) within the VSR framework.
An online motion blur synthesis model for common VSR training data is proposed to simulate the motion blur found in egocentric videos.
arXiv Detail & Related papers (2023-05-24T04:25:51Z)
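A naive stand-in for the online motion blur synthesis mentioned in the EgoVSR summary is to average a short window of consecutive sharp frames, mimicking the camera integrating light during fast head motion. EgoVSR's actual synthesis model is more elaborate; the sketch below only conveys the idea.

```python
import numpy as np

def synthesize_motion_blur(frames, center, window=3):
    """frames: sequence of sharp frames, each (H, W, 3) float in [0, 1].
    Returns a pseudo motion-blurred frame by averaging `window`
    consecutive frames centered at index `center`."""
    half = window // 2
    lo, hi = max(0, center - half), min(len(frames), center + half + 1)
    return np.mean(np.stack(frames[lo:hi]), axis=0)
```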
- EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries [68.75400888770793]
We formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos.
Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task.
arXiv Detail & Related papers (2022-12-14T01:28:12Z)
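Entangling 2D retrieval with multiview geometry, as in the EgoLoc summary, typically means lifting each 2D detection into 3D with the frame's depth, intrinsics, and camera pose, then aggregating over views. The pinhole back-projection below is standard; treating it as EgoLoc's exact procedure would be an assumption.

```python
import numpy as np

def backproject(uv, depth, K, cam_to_world):
    """uv: (u, v) pixel of a detected object center; depth: metric depth
    at that pixel; K: 3x3 intrinsics; cam_to_world: 4x4 camera pose.
    Returns the detection's 3D position in world coordinates."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    p_cam = depth * ray                                # camera-frame point
    return (cam_to_world @ np.append(p_cam, 1.0))[:3]  # to world frame

# Multiview aggregation: average per-view estimates of the same object.
# object_xyz = np.mean([backproject(*obs) for obs in observations], axis=0)
```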
- Estimating more camera poses for ego-centric videos is essential for VQ3D [70.78927854445615]
We develop a new pipeline for the challenging problem of egocentric video camera pose estimation.
We achieve a top-1 overall success rate of 25.8% on the VQ3D leaderboard, nearly three times the 8.7% reported by the baseline.
arXiv Detail & Related papers (2022-11-18T15:16:49Z)
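The overall success rate that several of these papers report is, at its core, the fraction of visual queries answered with an accepted 3D prediction; queries for which no camera pose could be estimated count as failures, which is why recovering more poses helps. A schematic computation follows, with the acceptance test left abstract since the official VQ3D evaluation defines it precisely.

```python
def overall_success_rate(predictions, query_ids, is_accepted):
    """predictions: dict query_id -> predicted 3D object location; queries
    with no camera pose (hence no prediction) are simply absent.
    is_accepted: callable implementing the benchmark's acceptance test
    (left abstract here). Returns successes / total queries."""
    hits = sum(1 for q in query_ids
               if q in predictions and is_accepted(q, predictions[q]))
    return hits / len(query_ids)
```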