Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
- URL: http://arxiv.org/abs/2407.08023v1
- Date: Wed, 10 Jul 2024 20:01:35 GMT
- Title: Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization
- Authors: Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem
- Abstract summary: We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task.
The core idea is not only to run SfM on egocentric videos but also to perform 2D-3D matching between existing 3D scans and 2D video frames.
Our method achieves the best performance on the most important metric, the overall success rate.
- Score: 64.08563002366812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We build our pipeline, EgoLoc-v1, mainly inspired by EgoLoc. We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task, which previous work has shown to be essential. The core idea is not only to run SfM on egocentric videos but also to perform 2D-3D matching between existing 3D scans and 2D video frames. This yields a hybrid SfM and camera relocalization pipeline that recovers more camera poses, leading to a higher QwP (the ratio of queries with estimated camera poses) and a higher overall success rate. Our method achieves the best performance on the most important metric, the overall success rate, surpassing the previous state-of-the-art, the competitive EgoLoc, by $1.5\%$. The code is available at \url{https://github.com/Wayne-Mai/egoloc_v1}.
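Conceptually, the ensemble strategy reduces to a pose-merging step: keep the SfM pose wherever SfM succeeds, and fill the remaining frames with poses relocalized against the 3D scan. The sketch below illustrates only that idea; the function name and data layout are assumptions, not the released EgoLoc-v1 interface.

```python
def merge_camera_poses(sfm_poses, reloc_poses, num_frames):
    """Hybrid pose ensemble (illustrative sketch, not the EgoLoc-v1 API).

    sfm_poses, reloc_poses: dicts mapping frame index -> 4x4
    camera-to-world matrix; frames a method failed on are absent.
    Prefers SfM when both sources posed a frame, and falls back to
    2D-3D relocalization otherwise, so more frames end up posed.
    """
    merged = {}
    for i in range(num_frames):
        if i in sfm_poses:          # SfM succeeded on this frame
            merged[i] = sfm_poses[i]
        elif i in reloc_poses:      # fall back to scan relocalization
            merged[i] = reloc_poses[i]
    return merged
```

Every additional posed frame raises QwP, which is how the hybrid pipeline lifts the overall success rate.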
Related papers
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos [25.41337525728398]
We introduce UniHOI, a model that unifies the estimation of all variables necessary for dense 4D reconstruction.
UniHOI is the first approach to offer fast, dense, and general monocular egocentric HOI scene reconstruction in the presence of motion.
arXiv Detail & Related papers (2024-11-14T02:57:11Z)
- GFlow: Recovering 4D World from Monocular Video [58.63051670458107]
We introduce GFlow, a framework that lifts a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time.
GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process.
GFlow transcends the boundaries of mere 4D reconstruction.
arXiv Detail & Related papers (2024-05-28T17:59:22Z)
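As a rough illustration of the still/moving clustering step mentioned in the GFlow summary, one can threshold the motion of tracked 2D points. The threshold, the use of raw (not camera-compensated) motion, and all names below are simplifying assumptions, not GFlow's actual formulation.

```python
import numpy as np

def split_still_moving(tracks, thresh_px=0.5):
    """tracks: (N, T, 2) array of 2D point trajectories over T frames.
    Labels a point 'moving' if its mean per-step displacement exceeds
    thresh_px pixels; everything else is treated as the still scene."""
    steps = np.diff(tracks, axis=1)                 # (N, T-1, 2) displacements
    speed = np.linalg.norm(steps, axis=-1).mean(1)  # (N,) mean pixel speed
    moving = speed > thresh_px
    return tracks[moving], tracks[~moving]
```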
- SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction [77.15924044466976]
We propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences.
We first transform the images into 3D space (e.g., bird's eye view) to obtain a 3D representation of the scene.
We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations.
arXiv Detail & Related papers (2023-11-21T17:59:14Z)
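The self-supervision signal in the SelfOcc summary amounts to re-rendering neighboring frames from the predicted 3D representation and penalizing the photometric error. A schematic of that loss follows; `render_view` stands in for a differentiable renderer and is a placeholder, not SelfOcc's code.

```python
import torch

def photometric_loss(scene_repr, neighbor_cam, neighbor_frame, render_view):
    """scene_repr: 3D representation predicted from the current frame.
    neighbor_cam: camera parameters of a previous/future frame.
    neighbor_frame: the real image at that time step, shape (3, H, W).
    render_view: hypothetical differentiable renderer (placeholder)."""
    rendered = render_view(scene_repr, neighbor_cam)    # re-render the neighbor
    return torch.abs(rendered - neighbor_frame).mean()  # L1 photometric error
```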
- EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023 [9.202585784962276]
We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization.
Our method leverages sparse camera pose reconstructions in a two-fold manner, video and scan independently, to estimate the camera pose of egocentric frames in 3D renders with high recall and precision.
arXiv Detail & Related papers (2023-06-29T00:17:23Z)
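EgoCOL reconstructs poses from the video and the scan independently, which raises the classic problem of expressing both in one coordinate frame. A standard tool for that is the Umeyama similarity alignment over corresponding 3D points, sketched below; its use here is illustrative and is not taken from EgoCOL's code.

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Least-squares similarity transform with dst ~= s * R @ src + t.
    src, dst: (N, 3) corresponding points from the two reconstructions."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                  # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                          # avoid a reflection solution
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)        # variance of source points
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t
```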
- EgoVSR: Towards High-Quality Egocentric Video Super-Resolution [23.50915512118989]
EgoVSR is a Video Super-Resolution framework specifically designed for egocentric videos.
We explicitly tackle motion blur in egocentric videos using a Dual Branch Deblur Network (DB$^2$Net) within the VSR framework.
An online motion blur synthesis model for common VSR training data is proposed to simulate the motion blur found in egocentric videos.
arXiv Detail & Related papers (2023-05-24T04:25:51Z)
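A naive stand-in for the online motion blur synthesis mentioned in the EgoVSR summary is to average a short window of consecutive sharp frames, mimicking the camera integrating light during fast head motion. EgoVSR's actual synthesis model is more elaborate; the sketch below only conveys the idea.

```python
import numpy as np

def synthesize_motion_blur(frames, center, window=3):
    """frames: sequence of sharp frames, each (H, W, 3) float in [0, 1].
    Returns a pseudo motion-blurred frame by averaging `window`
    consecutive frames centered at index `center`."""
    half = window // 2
    lo, hi = max(0, center - half), min(len(frames), center + half + 1)
    return np.mean(np.stack(frames[lo:hi]), axis=0)
```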
- EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries [68.75400888770793]
We formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos.
Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task.
arXiv Detail & Related papers (2022-12-14T01:28:12Z)
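Entangling 2D retrieval with multiview geometry, as in the EgoLoc summary, typically means lifting each 2D detection into 3D with the frame's depth, intrinsics, and camera pose, then aggregating over views. The pinhole back-projection below is standard; treating it as EgoLoc's exact procedure would be an assumption.

```python
import numpy as np

def backproject(uv, depth, K, cam_to_world):
    """uv: (u, v) pixel of a detected object center; depth: metric depth
    at that pixel; K: 3x3 intrinsics; cam_to_world: 4x4 camera pose.
    Returns the detection's 3D position in world coordinates."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    p_cam = depth * ray                                # camera-frame point
    return (cam_to_world @ np.append(p_cam, 1.0))[:3]  # to world frame

# Multiview aggregation: average per-view estimates of the same object.
# object_xyz = np.mean([backproject(*obs) for obs in observations], axis=0)
```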
- Estimating more camera poses for ego-centric videos is essential for VQ3D [70.78927854445615]
We develop a new pipeline for the challenging problem of egocentric video camera pose estimation.
We achieve a top-1 overall success rate of 25.8% on the VQ3D leaderboard, nearly three times the 8.7% reported by the baseline.
arXiv Detail & Related papers (2022-11-18T15:16:49Z)
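The overall success rate that several of these papers report is, at its core, the fraction of visual queries answered with an accepted 3D prediction; queries for which no camera pose could be estimated count as failures, which is why recovering more poses helps. A schematic computation follows, with the acceptance test left abstract since the official VQ3D evaluation defines it precisely.

```python
def overall_success_rate(predictions, query_ids, is_accepted):
    """predictions: dict query_id -> predicted 3D object location; queries
    with no camera pose (hence no prediction) are simply absent.
    is_accepted: callable implementing the benchmark's acceptance test
    (left abstract here). Returns successes / total queries."""
    hits = sum(1 for q in query_ids
               if q in predictions and is_accepted(q, predictions[q]))
    return hits / len(query_ids)
```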