Estimating more camera poses for ego-centric videos is essential for VQ3D
- URL: http://arxiv.org/abs/2211.10284v1
- Date: Fri, 18 Nov 2022 15:16:49 GMT
- Title: Estimating more camera poses for ego-centric videos is essential for VQ3D
- Authors: Jinjie Mai, Chen Zhao, Abdullah Hamdi, Silvio Giancola, Bernard Ghanem
- Abstract summary: We develop a new pipeline for the challenging problem of egocentric video camera pose estimation.
We achieve the top-1 overall success rate of 25.8% on the VQ3D leaderboard, two times better than the 8.7% reported by the baseline.
- Score: 70.78927854445615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual queries 3D localization (VQ3D) is a task in the Ego4D Episodic Memory Benchmark. Given an egocentric video, the goal is to answer queries of the form "Where did I last see object X?", where the query object X is specified as a static image, and the answer should be a 3D displacement vector pointing to object X. However, current techniques use naive ways to estimate the camera poses of video frames, resulting in a low query with pose (QwP) ratio and thus a poor overall success rate. In our work, we design a new pipeline for the challenging egocentric video camera pose estimation problem. Moreover, we revisit the current VQ3D framework and optimize it in terms of performance and efficiency. As a result, we achieve the top-1 overall success rate of 25.8% on the VQ3D leaderboard, two times better than the 8.7% reported by the baseline.
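To make the task output concrete: given a query frame's estimated camera pose and an object's 3D position, the VQ3D answer is a displacement vector from the camera to the object. Below is a minimal sketch of that computation, assuming a world-to-camera pose convention and hypothetical variable names, not the authors' code:

    import numpy as np

    def displacement_vector(R_wc, t_wc, object_xyz_world):
        # World-to-camera convention: x_cam = R_wc @ x_world + t_wc,
        # so the camera center in world coordinates is C = -R_wc.T @ t_wc.
        camera_center_world = -R_wc.T @ t_wc
        delta_world = object_xyz_world - camera_center_world
        # Express the displacement in the query frame's camera coordinates;
        # this equals the object's position in the camera frame.
        return R_wc @ delta_world

Whether the benchmark expects the vector in camera or world coordinates is a convention detail; the sketch returns the camera-frame version.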
Related papers
- Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization [64.08563002366812]
We propose a model ensemble strategy to improve the camera pose estimation part of the VQ3D task.
The core idea is not only to run SfM on egocentric videos but also to perform 2D-3D matching between existing 3D scans and 2D video frames.
Our method achieves the best performance on the most important metric, the overall success rate.
arXiv Detail & Related papers (2024-07-10T20:01:35Z)
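A standard way to turn the 2D-3D matches described above into a camera pose is Perspective-n-Point with RANSAC. A minimal sketch using OpenCV follows; the threshold and iteration values are illustrative, not the authors' settings:

    import numpy as np
    import cv2

    def pose_from_2d3d_matches(points_3d, points_2d, K):
        # points_3d: (N, 3) scan points matched to the frame.
        # points_2d: (N, 2) pixel locations of those matches.
        # K: (3, 3) intrinsics; assumes an undistorted image.
        ok, rvec, tvec, inliers = cv2.solvePnPRansac(
            points_3d.astype(np.float32), points_2d.astype(np.float32),
            K.astype(np.float32), distCoeffs=None,
            reprojectionError=4.0, iterationsCount=1000)
        if not ok:
            return None
        R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 matrix
        return R, tvec.ravel(), inliers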
- 3D-Aware Visual Question Answering about Parts, Poses and Occlusions [20.83938624671415]
We introduce the task of 3D-aware VQA, which focuses on challenging questions that require compositional reasoning over the 3D structure of visual scenes.
We propose PO3D-VQA, a 3D-aware VQA model that marries two powerful ideas: probabilistic neural symbolic program execution for reasoning and deep neural networks with 3D generative representations of objects for robust visual recognition.
Our experimental results show that our model PO3D-VQA outperforms existing methods significantly, but we still observe a considerable performance gap compared to 2D VQA benchmarks.
arXiv Detail & Related papers (2023-10-27T06:15:30Z)
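To give a flavor of symbolic program execution over a 3D-aware scene representation, here is a deliberately toy sketch: detected objects carry category and pose attributes, and a question becomes a small program of filter and query steps. All names here are illustrative, not PO3D-VQA's API:

    # Toy scene: each object carries attributes a 3D-aware recognizer
    # might estimate (category, azimuth, occlusion score).
    scene = [
        {"category": "bus", "azimuth_deg": 90.0, "occluded": 0.1},
        {"category": "car", "azimuth_deg": 45.0, "occluded": 0.7},
    ]

    def run_program(objects, program):
        # Execute filter/query steps in order, as a symbolic executor would.
        for op, arg in program:
            if op == "filter_category":
                objects = [o for o in objects if o["category"] == arg]
            elif op == "query_pose":
                return [o["azimuth_deg"] for o in objects]
        return objects

    # "What is the pose of the bus?" -> [90.0]
    print(run_program(scene, [("filter_category", "bus"), ("query_pose", None)]))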
- EgoCOL: Egocentric Camera Pose Estimation for Open-world 3D Object Localization @Ego4D challenge 2023 [9.202585784962276]
We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization.
Our method leverages sparse camera pose reconstructions in a two-fold manner, over the video and the scan independently, to estimate the camera poses of egocentric frames in 3D renders with high recall and precision.
arXiv Detail & Related papers (2023-06-29T00:17:23Z)
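Using video and scan reconstructions independently implies registering one sparse reconstruction into the other's coordinate frame. A common tool for that is the Umeyama similarity alignment over corresponding points (e.g. camera centers present in both reconstructions); the sketch below is offered as a plausible building block, not EgoCOL's exact procedure:

    import numpy as np

    def umeyama_alignment(src, dst):
        # Least-squares similarity transform with dst ~= s * R @ src + t.
        # src, dst: (N, 3) corresponding points from the two reconstructions.
        mu_s, mu_d = src.mean(0), dst.mean(0)
        src_c, dst_c = src - mu_s, dst - mu_d
        U, S, Vt = np.linalg.svd(dst_c.T @ src_c / len(src))
        D = np.eye(3)
        if np.linalg.det(U @ Vt) < 0:  # avoid a reflection
            D[2, 2] = -1.0
        R = U @ D @ Vt
        var_src = (src_c ** 2).sum() / len(src)
        s = np.trace(np.diag(S) @ D) / var_src
        t = mu_d - s * R @ mu_s
        return s, R, t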
- EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries [68.75400888770793]
We formalize a pipeline that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos.
Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task.
arXiv Detail & Related papers (2022-12-14T01:28:12Z)
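One concrete way to entangle multiview geometry with 2D retrieval, as the summary above describes, is to back-project each 2D detection into world coordinates using the frame's pose and a depth estimate, then aggregate across views. A minimal sketch under those assumptions, not necessarily EgoLoc's exact formulation:

    import numpy as np

    def lift_detection(K, R_wc, t_wc, center_px, depth):
        # Back-project one detection center into world coordinates.
        # (R_wc, t_wc): world-to-camera pose; depth: metric depth at the pixel.
        u, v = center_px
        ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
        x_cam = ray_cam * depth          # z-depth scaling (ray_cam[2] == 1)
        return R_wc.T @ (x_cam - t_wc)   # camera -> world

    # Aggregate per-frame estimates into one object location, e.g. a mean
    # (a robust median would also work):
    # object_world = np.mean([lift_detection(...) for ...], axis=0)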
- Negative Frames Matter in Egocentric Visual Query 2D Localization [119.23191388798921]
The recently released Ego4D dataset and benchmark significantly scale and diversify first-person visual perception data.
The Visual Queries 2D localization task aims to retrieve objects that appeared in the past from recordings in the first-person view.
Our study is based on the three-stage baseline introduced in the Episodic Memory benchmark.
arXiv Detail & Related papers (2022-08-03T09:54:51Z)
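The finding that negative frames matter suggests training the 2D localizer on frames without the query object as well. A hedged sketch of one simple way to mix negatives into training; the ratio and sampling scheme here are illustrative, not the paper's recipe:

    import random

    def sample_training_frames(positive_frames, negative_frames, neg_ratio=3):
        # Draw up to neg_ratio negatives (query object absent) per positive,
        # so the localizer also learns when *not* to fire.
        k = min(len(negative_frames), neg_ratio * len(positive_frames))
        return positive_frames + random.sample(negative_frames, k)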
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features into 3D space and detects 3D objects there.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
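Exploiting ego-motion for depth typically means treating consecutive frames as a stereo pair with known relative pose, e.g. via a plane-sweep cost volume. The sketch below computes, for one depth hypothesis, where each current-frame pixel lands in the previous frame; warping features over a sweep of depths yields matching costs from which depth can be read off. This is a generic construction, not DfM's exact architecture:

    import numpy as np

    def plane_sweep_coords(K, R_rel, t_rel, depth, h, w):
        # (R_rel, t_rel): relative pose mapping current-frame coordinates
        # to the previous frame; depth: one hypothesized depth plane.
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).T
        x_cam = np.linalg.inv(K) @ pix * depth        # 3D points at this depth
        x_prev = K @ (R_rel @ x_cam + t_rel[:, None]) # project into prev frame
        return (x_prev[:2] / x_prev[2:]).T.reshape(h, w, 2)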
- Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z)
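The coupled refinement idea can be approximated, loosely, by alternating between solving the pose from current correspondences and pruning correspondences that reproject badly. The paper couples the two updates far more tightly (and learns them); the sketch below is only the classical alternation, with illustrative thresholds:

    import numpy as np
    import cv2

    def refine_pose(points_3d, points_2d, K, iters=5, thresh_px=3.0):
        idx = np.arange(len(points_3d))
        rvec = tvec = None
        for _ in range(iters):
            ok, rvec, tvec = cv2.solvePnP(
                points_3d[idx].astype(np.float32),
                points_2d[idx].astype(np.float32),
                K.astype(np.float32), None)
            if not ok:
                break
            proj, _ = cv2.projectPoints(
                points_3d[idx].astype(np.float32), rvec, tvec,
                K.astype(np.float32), None)
            err = np.linalg.norm(proj.reshape(-1, 2) - points_2d[idx], axis=1)
            keep = err < thresh_px
            if keep.all() or keep.sum() < 6:  # stop when stable or starved
                break
            idx = idx[keep]  # drop outliers and re-solve
        return rvec, tvec, idx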
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.