A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation
- URL: http://arxiv.org/abs/2501.01993v1
- Date: Tue, 31 Dec 2024 18:47:54 GMT
- Title: A Novel Convolution and Attention Mechanism-based Model for 6D Object Pose Estimation
- Authors: Alexander Du, Yingwu Zhu,
- Abstract summary: Esting 6D object poses from RGB images is challenging because the lack of depth information requires inferring a three dimensional structure from 2D projections.
Traditional methods often rely on deep learning with grid based data structures but struggle to capture complex dependencies among extracted features.
We introduce a graph based representation derived directly from images, where temporal features of each pixel serve as nodes, and relationships between them are defined through node connectivity and spatial interactions.
- Score: 49.1574468325115
- License:
- Abstract: Estimating 6D object poses from RGB images is challenging because the lack of depth information requires inferring a three dimensional structure from 2D projections. Traditional methods often rely on deep learning with grid based data structures but struggle to capture complex dependencies among extracted features. To overcome this, we introduce a graph based representation derived directly from images, where spatial temporal features of each pixel serve as nodes, and relationships between them are defined through node connectivity and spatial interactions. We also employ feature selection mechanisms that use spatial attention and self attention distillation, along with a Legendre convolution layer leveraging the orthogonality of Legendre polynomials for numerical stability. Experiments on the LINEMOD, Occluded LINEMOD, and YCB Video datasets demonstrate that our method outperforms nine existing approaches and achieves state of the art benchmark in object pose estimation.
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation [14.469317161361202]
We propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information.
We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly.
arXiv Detail & Related papers (2023-08-19T13:52:18Z) - Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation
of In-hand Objects [1.8263882169310044]
We introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data.
We also introduce a hierarchical message passing operation that flows the information within and across modalities to learn a graph-based object representation.
arXiv Detail & Related papers (2023-06-28T01:18:53Z) - DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z) - Learning Stereopsis from Geometric Synthesis for 6D Object Pose
Estimation [11.999630902627864]
Current monocular-based 6D object pose estimation methods generally achieve less competitive results than RGBD-based methods.
This paper proposes a 3D geometric volume based pose estimation method with a short baseline two-view setting.
Experiments show that our method outperforms state-of-the-art monocular-based methods, and is robust in different objects and scenes.
arXiv Detail & Related papers (2021-09-25T02:55:05Z) - Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z) - FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose
Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.