Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
- URL: http://arxiv.org/abs/2407.08023v1
- Date: Wed, 10 Jul 2024 20:01:35 GMT
- Title: Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
- Authors: Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem
- Abstract summary: We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task.
The core idea is not only to do SfM for egocentric videos but also to do 2D-3D matching between existing 3D scans and 2D video frames.
Our method achieves the best performance regarding the most important metric, the overall success rate.
- Score: 64.08563002366812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We built our pipeline EgoLoc-v1, mainly inspired by EgoLoc. We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task, which has been shown to be essential in previous work. The core idea is not only to do SfM for egocentric videos but also to do 2D-3D matching between existing 3D scans and 2D video frames. In this way, we have a hybrid SfM and camera relocalization pipeline, which can provide us with more camera poses, leading to a higher QwP and overall success rate. Our method achieves the best performance on the most important metric, the overall success rate. We surpass the previous state of the art, the competitive EgoLoc, by $1.5\%$. The code is available at \url{https://github.com/Wayne-Mai/egoloc_v1}.
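To make the abstract's core idea concrete, here is a minimal, hypothetical sketch, not the released EgoLoc-v1 implementation (see the repository above for that): relocalize frames by PnP from 2D-3D matches against an existing scan, then ensemble those poses with SfM poses so each source fills the other's gaps. OpenCV's solvePnPRansac stands in for the full relocalization stack; all helper names and the pose format are assumptions.

```python
# Hypothetical sketch of the hybrid SfM + relocalization idea; not the
# EgoLoc-v1 code. Helper names and the pose format are illustrative.
import numpy as np
import cv2  # pip install opencv-python


def relocalize_frame(pts3d, pts2d, K, dist=None):
    """Recover one camera pose from 2D-3D matches via PnP + RANSAC.

    pts3d: (N, 3) scan points; pts2d: (N, 2) matched pixels; K: (3, 3) intrinsics.
    Returns (R, t) or None when RANSAC finds no consensus.
    """
    dist = np.zeros(4) if dist is None else dist
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, dist
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec


def ensemble_poses(sfm_poses, reloc_poses):
    """Merge per-frame poses: keep SfM where available, fill the gaps with
    relocalization. Covering more frames is what raises QwP."""
    merged = dict(sfm_poses)
    for frame_id, pose in reloc_poses.items():
        merged.setdefault(frame_id, pose)
    return merged


# Tiny synthetic check: project known 3D points, then recover the pose.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
pts3d = rng.uniform(-1.0, 1.0, (20, 3)) + np.array([0.0, 0.0, 5.0])
rvec_gt, tvec_gt = np.array([0.1, 0.2, 0.0]), np.array([0.1, -0.1, 0.5])
pts2d, _ = cv2.projectPoints(pts3d, rvec_gt, tvec_gt, K, np.zeros(4))
pose = relocalize_frame(pts3d, pts2d.reshape(-1, 2), K)
poses = ensemble_poses({"frame_000": "pose_from_sfm"}, {"frame_001": pose})
```

The setdefault merge encodes one possible ensembling rule, preferring SfM poses and falling back to relocalization; the actual EgoLoc-v1 model ensemble strategy may weight or validate candidate poses differently.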
Related papers
- CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation [117.16677556874278]
We introduce CamCo, which allows fine-grained Camera pose Control for image-to-video generation.
To enhance 3D consistency in the videos produced, we integrate an epipolar attention module in each attention block.
Our experiments show that CamCo significantly improves 3D consistency and camera control capabilities compared to previous models.
arXiv Detail & Related papers (2024-06-04T17:27:19Z)
- GFlow: Recovering 4D World from Monocular Video [58.63051670458107]
We introduce GFlow, a framework that lifts a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time.
GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process.
GFlow goes beyond 4D reconstruction alone.
arXiv Detail & Related papers (2024-05-28T17:59:22Z)
- Visual Geometry Grounded Deep Structure From Motion [20.203320509695306]
We propose a new deep pipeline VGGSfM, where each component is fully differentiable and can be trained in an end-to-end manner.
First, we build on recent advances in deep 2D point tracking to extract reliable pixel-accurate tracks, which eliminates the need for chaining pairwise matches.
We attain state-of-the-art performance on three popular datasets, CO3D, IMC Phototourism, and ETH3D.
arXiv Detail & Related papers (2023-12-07T18:59:52Z)
- SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction [77.15924044466976]
We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into 3D space (e.g., a bird's-eye view) to obtain a 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations.
arXiv Detail & Related papers (2023-11-21T17:59:14Z)
- EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023 [9.202585784962276]
We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization.
Our method leverages sparse camera pose reconstructions in a two-fold manner, video and scan independently, to estimate the camera pose of egocentric frames in 3D renders with high recall and precision.
arXiv Detail & Related papers (2023-06-29T00:17:23Z)
- EgoVSR: Towards High-Quality Egocentric Video Super-Resolution [23.50915512118989]
EgoVSR is a Video Super-Resolution framework specifically designed for egocentric videos.
We explicitly tackle motion blur in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) in the VSR framework.
An online motion blur synthesis model for common VSR training data is proposed to simulate the motion blur typical of egocentric videos.
arXiv Detail & Related papers (2023-05-24T04:25:51Z)
- EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries [68.75400888770793]
We formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos.
Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task.
arXiv Detail & Related papers (2022-12-14T01:28:12Z)
- Estimating more camera poses for ego-centric videos is essential for VQ3D [70.78927854445615]
We develop a new pipeline for the challenging problem of egocentric video camera pose estimation.
We achieve the top-1 overall success rate of 25.8% on the VQ3D leaderboard, two times better than the 8.7% reported by the baseline (see the metric sketch after this list).
arXiv Detail & Related papers (2022-11-18T15:16:49Z)
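The main abstract and several entries above (EgoLoc, EgoCOL, and the 2022 VQ3D pipeline) all report QwP and an overall success rate. As a hedged illustration only, assuming QwP denotes the fraction of queries whose query frame received a valid camera pose and that overall success additionally requires a correct 3D localization, the bookkeeping could look like this:

```python
# Assumed VQ3D-style metric tally; the field names and exact definitions
# are illustrative, not the official benchmark code.
def vq3d_metrics(results):
    """results: list of dicts with boolean 'has_pose' and 'localized' fields."""
    n = len(results)
    qwp = sum(r["has_pose"] for r in results) / n
    overall = sum(r["has_pose"] and r["localized"] for r in results) / n
    return {"QwP": qwp, "overall_success": overall}


print(vq3d_metrics([
    {"has_pose": True, "localized": True},
    {"has_pose": True, "localized": False},
    {"has_pose": False, "localized": False},
]))  # {'QwP': 0.667, 'overall_success': 0.333} (rounded)
```

Under this reading, a query without a pose can never count as a success, so registering more frames (a higher QwP) raises the ceiling on the overall success rate; that is the lever the hybrid SfM-plus-relocalization ensemble pulls.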
This list is automatically generated from the titles and abstracts of the papers on this site.