EgoCOL: Egocentric Camera pose estimation for Open-world 3D object
Localization @Ego4D challenge 2023
- URL: http://arxiv.org/abs/2306.16606v1
- Date: Thu, 29 Jun 2023 00:17:23 GMT
- Title: EgoCOL: Egocentric Camera pose estimation for Open-world 3D object
Localization @Ego4D challenge 2023
- Authors: Cristhian Forigua, Maria Escobar, Jordi Pont-Tuset, Kevis-Kokitsi
Maninis and Pablo Arbeláez
- Abstract summary: We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization.
Our method leverages sparse camera pose reconstructions in a two-fold manner, computing video and scan reconstructions independently, to estimate the camera poses of egocentric frames in 3D renders with high recall and precision.
- Score: 9.202585784962276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present EgoCOL, an egocentric camera pose estimation method for
open-world 3D object localization. Our method leverages sparse camera pose
reconstructions in a two-fold manner, computing video and scan reconstructions
independently, to estimate the camera poses of egocentric frames in 3D renders
with high recall and precision. We extensively evaluate our method on the
Visual Query (VQ) 3D object localization Ego4D benchmark. EgoCOL can estimate
62% and 59% more camera poses than the Ego4D baseline in the Ego4D Visual
Queries 3D Localization challenge at CVPR 2023 on the val and test sets,
respectively. Our code is publicly available at
https://github.com/BCV-Uniandes/EgoCOL
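The abstract leaves open how the independent video and scan reconstructions are brought into a common coordinate frame. A standard way to register an up-to-scale SfM reconstruction against a metric scan is a least-squares similarity (Sim(3)) fit over corresponding 3D points, e.g. Umeyama's method; the sketch below illustrates that assumption and is not taken from the EgoCOL code (all names are illustrative).

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Least-squares similarity transform (s, R, t) with dst ~ s * R @ src + t.
    src, dst: (N, 3) corresponding 3D points from the two reconstructions."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    x, y = src - mu_s, dst - mu_d
    U, D, Vt = np.linalg.svd(y.T @ x / len(src))   # SVD of cross-covariance
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = (D * np.diag(S)).sum() / ((x ** 2).sum() / len(src))
    t = mu_d - s * R @ mu_s
    return s, R, t

def map_camera(R_wc, C, s, R, t):
    """Carry a camera (world-from-camera rotation R_wc, center C) from the
    video SfM frame into the scan frame."""
    return R @ R_wc, s * (R @ C) + t
```

Correspondences could come from 3D points observed in frames that are localized in both reconstructions; how EgoCOL actually fuses the two is described in the paper, not here.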
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, with transformer-based 2D-to-3D pose lifting.
Our experiments demonstrate decreases of up to 45% in MPJPE compared to 3D poses obtained by triangulating the 2D poses.
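For context, the triangulation baseline that the 45% MPJPE reduction is measured against fits in a few lines; a minimal DLT sketch, assuming calibrated views with known 3x4 projection matrices:

```python
import numpy as np

def triangulate_joint(points_2d, proj_mats):
    """Linear (DLT) triangulation of a single joint.
    points_2d: list of (u, v) pixel coordinates, one per view.
    proj_mats: list of 3x4 projection matrices for the same views."""
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        rows.append(u * P[2] - P[0])    # each view adds two linear
        rows.append(v * P[2] - P[1])    # constraints on the 3D point
    _, _, Vt = np.linalg.svd(np.stack(rows))
    X = Vt[-1]                          # null vector = homogeneous 3D point
    return X[:3] / X[3]
```

A skeleton is triangulated joint by joint; the lifting network is what replaces this step in MPL.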
arXiv Detail & Related papers (2024-08-20T12:55:14Z)
- Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization [64.08563002366812]
We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task.
The core idea is not only to run SfM on egocentric videos but also to perform 2D-3D matching between existing 3D scans and 2D video frames.
Our method achieves the best performance on the most important metric, the overall success rate.
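Matches between scan points and frame keypoints, as described above, conventionally end in a RANSAC PnP solve; a hedged OpenCV sketch (the ensemble's actual matcher and solver settings are not specified in the summary):

```python
import cv2
import numpy as np

def relocalize_frame(pts3d, pts2d, K):
    """Recover a camera pose from 2D-3D correspondences with RANSAC PnP.
    pts3d: (N, 3) scan points matched to pts2d: (N, 2) frame keypoints;
    K: 3x3 intrinsics. Returns world-to-camera (R, t) or None on failure."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float64), pts2d.astype(np.float64), K, None,
        iterationsCount=1000, reprojectionError=8.0)
    if not ok or inliers is None or len(inliers) < 10:  # 10 is an arbitrary floor
        return None
    R, _ = cv2.Rodrigues(rvec)          # rotation vector -> 3x3 matrix
    return R, tvec
```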
arXiv Detail & Related papers (2024-07-10T20:01:35Z)
- SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation [2.929565541219051]
We present a new self-supervised approach, SelfPose3d, for estimating 3D poses of multiple persons from multiple camera views.
Unlike current state-of-the-art fully-supervised methods, our approach does not require any 2D or 3D ground-truth poses.
Our experiments and analysis on three public benchmark datasets, including Panoptic, Shelf, and Campus, show the effectiveness of our approach.
arXiv Detail & Related papers (2024-04-02T15:34:52Z)
- EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries [68.75400888770793]
We formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos.
Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task.
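In VQ3D, a retrieved 2D detection still has to be placed in the scan. The summary does not give EgoLoc's exact lifting step; a minimal sketch of the usual back-projection, assuming a metric depth map and a camera-to-world pose per frame (names are illustrative):

```python
import numpy as np

def lift_detection(box, depth, K, T_wc):
    """Back-project the center of a 2D box into world coordinates.
    box: (x1, y1, x2, y2) pixels; depth: (H, W) metric depth map;
    K: 3x3 intrinsics; T_wc: 4x4 camera-to-world pose."""
    u = int(round((box[0] + box[2]) / 2))
    v = int(round((box[1] + box[3]) / 2))
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # pixel -> camera ray
    p_cam = np.append(depth[v, u] * ray, 1.0)        # homogeneous camera point
    return (T_wc @ p_cam)[:3]                        # world-frame location
```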
arXiv Detail & Related papers (2022-12-14T01:28:12Z)
- Estimating more camera poses for ego-centric videos is essential for VQ3D [70.78927854445615]
We develop a new pipeline for the challenging egocentric video camera pose estimation problem in our work.
We get the top-1 overall success rate of 25.8% on VQ3D leaderboard, which is two times better than the 8.7% reported by the baseline.
arXiv Detail & Related papers (2022-11-18T15:16:49Z)
- Towards Generalization of 3D Human Pose Estimation In The Wild [73.19542580408971]
3DBodyTex.Pose is a dataset that addresses the task of 3D human pose estimation in-the-wild.
3DBodyTex.Pose offers high-quality and rich data containing 405 different real subjects in various clothing and poses, and 81k image samples with ground-truth 2D and 3D pose annotations.
arXiv Detail & Related papers (2020-04-21T13:31:58Z)
- VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution which operates in 3D space and therefore avoids making incorrect decisions in 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
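Operating in 3D space here means aggregating per-view 2D heatmaps into a voxel volume before any hard decisions are taken in 2D; a rough sketch of that projection step under common assumptions (shapes and names are illustrative, not VoxelPose's code):

```python
import numpy as np

def build_voxel_scores(heatmaps, proj_mats, grid):
    """Average per-view joint heatmaps over a 3D voxel grid.
    heatmaps: list of (H, W) score maps, one per calibrated view.
    proj_mats: list of 3x4 projection matrices for those views.
    grid: (V, 3) world coordinates of voxel centers. Returns (V,) scores."""
    hom = np.hstack([grid, np.ones((len(grid), 1))])
    scores = np.zeros(len(grid))
    for hm, P in zip(heatmaps, proj_mats):
        uvw = hom @ P.T                              # project voxel centers
        front = uvw[:, 2] > 1e-6                     # ignore voxels behind camera
        uv = uvw[front, :2] / uvw[front, 2:3]
        u = np.clip(uv[:, 0].round().astype(int), 0, hm.shape[1] - 1)
        v = np.clip(uv[:, 1].round().astype(int), 0, hm.shape[0] - 1)
        scores[front] += hm[v, u]                    # nearest-neighbor sampling
    return scores / len(heatmaps)
```

Peaks in the resulting volume then seed per-person proposals that a network such as the proposed PRN refines.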
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
- Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS [13.191601826570786]
We present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views.
It takes 2D poses in different camera coordinate systems as inputs and recovers accurate 3D poses in a global coordinate system.
We propose a new large-scale multi-human dataset with 12 to 28 camera views.
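Associating 2D poses across views before triangulation is the crux of such a pipeline; one common (and fast) affinity is the symmetric point-to-epipolar-line distance. A sketch under that assumption, since the summary does not state the paper's exact affinity:

```python
import numpy as np

def epipolar_affinity(pose_a, pose_b, F):
    """Mean symmetric distance between the joints of two 2D poses and the
    epipolar lines induced by the fundamental matrix F (view a -> view b).
    pose_a, pose_b: (J, 2) joint pixels. Lower values = more likely match."""
    ha = np.hstack([pose_a, np.ones((len(pose_a), 1))])
    hb = np.hstack([pose_b, np.ones((len(pose_b), 1))])
    lines_b = ha @ F.T                    # epipolar lines in view b
    lines_a = hb @ F                      # epipolar lines in view a
    d_b = np.abs((hb * lines_b).sum(1)) / np.linalg.norm(lines_b[:, :2], axis=1)
    d_a = np.abs((ha * lines_a).sum(1)) / np.linalg.norm(lines_a[:, :2], axis=1)
    return 0.5 * (d_a.mean() + d_b.mean())
```

Affinities like this are typically fed to greedy or Hungarian matching across views, after which matched poses are triangulated.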
arXiv Detail & Related papers (2020-03-09T08:54:00Z)