VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning
Decoupled Rotations on the Spherical Representations
- URL: http://arxiv.org/abs/2308.09916v1
- Date: Sat, 19 Aug 2023 05:47:53 GMT
- Title: VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning
Decoupled Rotations on the Spherical Representations
- Authors: Jiehong Lin and Zewei Wei and Yabin Zhang and Kui Jia
- Abstract summary: We propose a novel rotation estimation network, termed VI-Net, to make the task easier.
To process the spherical signals, a Spherical Feature Pyramid Network is constructed based on a novel design of SPAtial Spherical Convolution.
Experiments on benchmark datasets confirm the efficacy of our method, which outperforms existing ones by a large margin in the regime of high precision.
- Score: 55.25238503204253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-precision rotation estimation from an RGB-D object observation is a
major challenge in 6D object pose estimation, due to the difficulty of learning
in the non-linear space of SO(3). In this paper, we propose a novel rotation
estimation network, termed VI-Net, which makes the task easier by decoupling the
rotation into the combination of a viewpoint rotation and an in-plane rotation.
More specifically, VI-Net performs its feature learning on the sphere, with two
individual branches estimating the two factorized rotations: a
V-Branch is employed to learn the viewpoint rotation via binary classification
on the spherical signals, while another I-Branch is used to estimate the
in-plane rotation by transforming the signals to view from the zenith
direction. To process the spherical signals, a Spherical Feature Pyramid
Network is constructed based on a novel design of SPAtial Spherical Convolution
(SPA-SConv), which resolves the boundary problem of spherical signals via
feature padding and realizes viewpoint-equivariant feature extraction by
symmetric convolutional operations. We apply the proposed VI-Net to the
challenging task of category-level 6D object pose estimation for predicting the
poses of unknown objects without available CAD models; experiments on the
benchmark datasets confirm the efficacy of our method, which outperforms
existing ones by a large margin in the regime of high precision.
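
For intuition, the viewpoint/in-plane factorization admits a simple closed form. Below is a minimal numpy sketch (our illustration, not the authors' code): R_vp is taken as the minimal rotation tilting the zenith axis e_z onto the viewpoint direction R e_z, and the remainder R_ip = R_vp^T R is then a pure rotation about the zenith.

```python
# A minimal sketch of the decoupling R = R_vp @ R_ip (illustration only).
import numpy as np

def decouple_rotation(R: np.ndarray):
    """Factor R in SO(3) as R = R_vp @ R_ip, where R_vp tilts the zenith
    axis e_z onto the viewpoint direction R @ e_z and R_ip spins about e_z."""
    e_z = np.array([0.0, 0.0, 1.0])
    v = R @ e_z                        # viewpoint direction on the sphere
    axis = np.cross(e_z, v)
    s = np.linalg.norm(axis)           # sin of the viewpoint angle
    c = float(e_z @ v)                 # cos of the viewpoint angle
    if s < 1e-8:                       # v (anti-)parallel to e_z: no tilt
        R_vp = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:
        K = np.array([[0.0, -axis[2], axis[1]],
                      [axis[2], 0.0, -axis[0]],
                      [-axis[1], axis[0], 0.0]]) / s
        R_vp = np.eye(3) + s * K + (1.0 - c) * (K @ K)  # Rodrigues' formula
    R_ip = R_vp.T @ R                  # fixes e_z: an in-plane rotation
    theta = np.arctan2(R_ip[1, 0], R_ip[0, 0])          # in-plane angle
    return R_vp, theta
```

The boundary problem mentioned in the abstract arises because an equirectangular feature map cuts the sphere open along a seam and at the poles. One standard remedy (a sketch of the general technique under our assumptions; SPA-SConv's exact padding may differ) wraps the azimuth circularly and reflects rows across each pole with a 180-degree azimuthal shift, so an ordinary 2D convolution sees a seamless spherical signal.

```python
# Hedged sketch of spherical feature padding on an equirectangular grid;
# assumes rows sample latitude at cell centres and W is even.
import numpy as np

def spherical_pad(x: np.ndarray, p: int = 1) -> np.ndarray:
    """Pad an (H, W) feature map to (H + 2p, W + 2p) so that a standard
    convolution behaves consistently across the seam and the poles."""
    H, W = x.shape
    top = np.roll(x[:p][::-1], W // 2, axis=1)   # rows seen across the north pole
    bot = np.roll(x[-p:][::-1], W // 2, axis=1)  # rows seen across the south pole
    x = np.concatenate([top, x, bot], axis=0)
    return np.concatenate([x[:, -p:], x, x[:, :p]], axis=1)  # azimuth wrap
```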
Related papers
- 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction [50.07071392673984]
Existing methods learn 3D rotations parametrized in the spatial domain using angles or quaternions.
We propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression.
Our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+.
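For intuition about the frequency-domain parametrization: at degree l = 1, the real Wigner-D matrix is simply the rotation matrix expressed in a permuted basis, so a predicted degree-1 block determines the rotation directly. A hedged sketch (assuming the common real-harmonic ordering (y, z, x); conventions vary, and the paper also uses higher degrees):

```python
# Hedged sketch: recover a rotation from a predicted degree-1 Wigner-D block,
# projecting onto SO(3) via SVD to absorb prediction noise.
import numpy as np

# Permutation taking Cartesian (x, y, z) to the real-harmonic order (y, z, x).
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

def wigner_d1_to_rotation(D1: np.ndarray) -> np.ndarray:
    M = P.T @ D1 @ P                    # undo the basis permutation
    U, _, Vt = np.linalg.svd(M)         # nearest orthogonal matrix
    d = np.sign(np.linalg.det(U @ Vt))  # enforce det = +1
    return U @ np.diag([1.0, 1.0, d]) @ Vt
```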
arXiv Detail & Related papers (2024-11-01T12:50:38Z)
- Category-Level 6D Object Pose Estimation with Flexible Vector-Based Rotation Representation [51.67545893892129]
We propose a novel 3D graph convolution based pipeline for category-level 6D pose and size estimation from monocular RGB-D images.
We first design an orientation-aware autoencoder with 3D graph convolution for latent feature learning.
Then, to efficiently decode the rotation information from the latent feature, we design a novel flexible vector-based decomposable rotation representation.
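As an illustration of what a decomposable, vector-based rotation representation can look like, here is a hedged sketch assuming the common two-vector scheme (the paper's exact representation may differ): the network regresses two unconstrained 3D vectors, and a rotation matrix is recovered by Gram-Schmidt orthonormalization.

```python
# Hedged sketch: rotation matrix from two predicted 3D vectors (illustration).
import numpy as np

def vectors_to_rotation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    r1 = a / np.linalg.norm(a)
    r2 = b - (r1 @ b) * r1             # remove the component of b along r1
    r2 = r2 / np.linalg.norm(r2)
    r3 = np.cross(r1, r2)              # complete a right-handed basis
    return np.stack([r1, r2, r3], axis=1)  # columns form the rotation
```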
arXiv Detail & Related papers (2022-12-09T02:13:43Z)
- FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism [49.89268018642999]
We propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation.
The proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation.
arXiv Detail & Related papers (2021-03-12T03:07:24Z)
- 3D Point-to-Keypoint Voting Network for 6D Pose Estimation [8.801404171357916]
We propose a framework for 6D pose estimation from RGB-D data based on spatial structure characteristics of 3D keypoints.
The proposed method is verified on two benchmark datasets, LINEMOD and OCCLUSION LINEMOD.
arXiv Detail & Related papers (2020-12-22T11:43:15Z)
- Rotation-Invariant Local-to-Global Representation Learning for 3D Point Cloud [42.86112554931754]
We propose a local-to-global representation learning algorithm for 3D point cloud data.
Our model takes advantage of multi-level abstraction based on graph convolutional neural networks.
The proposed algorithm achieves state-of-the-art performance on rotation-augmented 3D object recognition and segmentation benchmarks.
arXiv Detail & Related papers (2020-10-07T10:30:20Z)
- A Smooth Representation of Belief over SO(3) for Deep Rotation Learning with Uncertainty [33.627068152037815]
We present a novel symmetric matrix representation of the 3D rotation group, SO(3), with two important properties that make it particularly suitable for learned models.
We empirically validate the benefits of our formulation by training deep neural rotation regressors on two data modalities.
This capability is key for safety-critical applications where detecting novel inputs can prevent catastrophic failure of learned models.
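Concretely, as we read the paper, the representation regresses the 10 free parameters of a 4x4 symmetric matrix A and takes the rotation to be the unit quaternion given by the eigenvector of A for its smallest eigenvalue; the eigenvalue gap then acts as an uncertainty signal. A minimal sketch under that reading:

```python
# Hedged sketch of the symmetric-matrix representation of SO(3).
import numpy as np

def theta_to_quaternion(theta: np.ndarray) -> np.ndarray:
    """theta: 10 regressed parameters filling a symmetric 4x4 matrix A.
    Returns the unit quaternion minimizing q^T A q."""
    A = np.zeros((4, 4))
    A[np.triu_indices(4)] = theta
    A = A + A.T - np.diag(np.diag(A))  # symmetrize
    w, V = np.linalg.eigh(A)           # eigenvalues in ascending order
    return V[:, 0]                     # eigenvector of the smallest eigenvalue
```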
arXiv Detail & Related papers (2020-06-01T15:57:45Z)
- Cylindrical Convolutional Networks for Joint Object Detection and Viewpoint Estimation [76.21696417873311]
We introduce a learnable module, cylindrical convolutional networks (CCNs), that exploits a cylindrical representation of a convolutional kernel defined in 3D space.
CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint.
Our experiments demonstrate the effectiveness of the cylindrical convolutional networks on joint object detection and viewpoint estimation.
arXiv Detail & Related papers (2020-03-25T10:24:58Z)
- Robust 6D Object Pose Estimation by Learning RGB-D Features [59.580366107770764]
We propose a novel discrete-continuous formulation for rotation regression to resolve the local-optimum problem.
We uniformly sample rotation anchors in SO(3), and predict a constrained deviation from each anchor to the target, as well as uncertainty scores for selecting the best prediction.
Experiments on two benchmarks, LINEMOD and YCB-Video, show that the proposed method outperforms state-of-the-art approaches.
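The discrete-continuous scheme fits in a few lines. A hedged sketch with hypothetical names (the paper's network heads and losses are more involved): classify over uniformly sampled rotation anchors, regress a small deviation per anchor, and compose the highest-scoring pair at inference.

```python
# Hedged sketch of anchor-based rotation selection (hypothetical names).
import numpy as np
from scipy.spatial.transform import Rotation

def select_rotation(anchors: np.ndarray, deltas: np.ndarray,
                    scores: np.ndarray) -> np.ndarray:
    """anchors: (N, 3, 3) rotations sampled uniformly in SO(3);
    deltas:  (N, 3) predicted axis-angle deviations, one per anchor;
    scores:  (N,) predicted confidence for each anchor."""
    best = int(np.argmax(scores))                      # best-scoring anchor
    R_delta = Rotation.from_rotvec(deltas[best]).as_matrix()
    return anchors[best] @ R_delta                     # compose with deviation
```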
arXiv Detail & Related papers (2020-02-29T06:24:55Z)