Related papers: Video based Object 6D Pose Estimation using Transformers

URL: http://arxiv.org/abs/2210.13540v1
Date: Mon, 24 Oct 2022 18:45:53 GMT
Title: Video based Object 6D Pose Estimation using Transformers
Authors: Apoorva Beedu, Huda Alamri, Irfan Essa
Abstract summary: VideoPose is an end-to-end attention based modelling architecture that attends to previous frames in order to estimate 6D Object Poses in videos. Our architecture is able to capture and reason from long-range dependencies efficiently, thus iteratively refining over video sequences. Our approach is on par with the state-of-the-art Transformer methods, and performs significantly better relative to CNN based approaches.
Score: 6.951360830202521
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a Transformer based 6D Object Pose Estimation framework VideoPose, comprising an end-to-end attention based modelling architecture, that attends to previous frames in order to estimate accurate 6D Object Poses in videos. Our approach leverages the temporal information from a video sequence for pose refinement, along with being computationally efficient and robust. Compared to existing methods, our architecture is able to capture and reason from long-range dependencies efficiently, thus iteratively refining over video sequences. Experimental evaluation on the YCB-Video dataset shows that our approach is on par with the state-of-the-art Transformer methods, and performs significantly better relative to CNN based approaches. Further, with a speed of 33 fps, it is also more efficient and therefore applicable to a variety of applications that require real-time object pose estimation. Training code and pretrained models are available at https://github.com/ApoorvaBeedu/VideoPose

Related papers

KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object to reconstruct and its representation through semantic keypoints. With a focus on vehicle scenes, KRONC is able to estimate the position of the views as a solution to a light optimization problem targeting the convergence of keypoints' back-projections to a singular point.
arXiv Detail & Related papers (2024-09-09T08:08:05Z)
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking. Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. We animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z)
YOLOPose V2: Understanding and Improving Transformer-based 6D Pose Estimation [36.067414358144816]
YOLOPose is a Transformer-based multi-object 6D pose estimation method. We employ a learnable orientation estimation module to predict the orientation from the keypoints. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-07-21T12:53:54Z)
MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training. First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
Patch-based Object-centric Transformers for Efficient Video Generation [71.55412580325743]
We present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture. We build upon prior work in video prediction via an autoregressive transformer over the discrete latent space of compressed videos. Due to better compressibility of object-centric representations, we can improve training efficiency by allowing the model to only access object information for longer horizon temporal information.
arXiv Detail & Related papers (2022-06-08T16:29:59Z)
FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction [37.81077373162092]
Reconstructing an accurate 3D object model from a few image observations remains a challenging problem in computer vision. We present FvOR, a learning-based object reconstruction method that predicts accurate 3D models given a few images with noisy input poses.
arXiv Detail & Related papers (2022-05-16T15:39:27Z)
VideoPose: Estimating 6D object pose from videos [14.210010379733017]
We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos. Our proposed network takes a pre-trained 2D object detector as input, and aggregates visual features through a recurrent neural network to make predictions at each frame. Experimental evaluation on the YCB-Video dataset shows that our approach is on par with the state-of-the-art algorithms.
arXiv Detail & Related papers (2021-11-20T20:57:45Z)
T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression [40.90172673391803]
T6D-Direct is a real-time single-stage direct method with a transformer-based architecture built on DETR to perform 6D multi-object pose direct estimation. Our method achieves the fastest inference time, and the pose estimation accuracy is comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-09-22T18:13:33Z)
Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation. We exploit temporal information in videos and propose a self-attention module. We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z)
PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D Pose Estimation [11.873744190924599]
We introduce a rotational primitive prediction based 6D object pose estimation using a single image as an input. We leverage a Variational AutoEncoder (VAE) to learn this underlying primitive and its associated keypoints. When evaluated over public datasets, our method yields a notable improvement over LINEMOD, Occlusion LINEMOD, and the Y-induced dataset.
arXiv Detail & Related papers (2020-06-14T03:55:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.