FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
- URL: http://arxiv.org/abs/2103.02242v1
- Date: Wed, 3 Mar 2021 08:07:29 GMT
- Title: FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
- Authors: Yisheng He and Haibin Huang and Haoqiang Fan and Qifeng Chen and Jian Sun
- Abstract summary: We present FFB6D, a Full Flow Bidirectional fusion network designed for 6D pose estimation from a single RGBD image.
We learn to combine appearance and geometry information for representation learning as well as output representation selection.
Our method outperforms the state-of-the-art by large margins on several benchmarks.
- Score: 54.666329929930455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present FFB6D, a Full Flow Bidirectional fusion network
designed for 6D pose estimation from a single RGBD image. Our key insight is
that appearance information in the RGB image and geometry information from the
depth image are two complementary data sources, yet how to fully leverage them
remains an open problem. Towards this end, we propose FFB6D, which learns to
combine appearance and geometry information for representation learning as well
as output representation selection. Specifically, at the representation
learning stage, we build bidirectional fusion modules in the full flow of the
two networks, where fusion is applied to each encoding and decoding layer. In
this way, the two networks can leverage local and global complementary
information from each other to obtain better representations. Moreover, at
the output representation stage, we design a simple but effective 3D
keypoint selection algorithm that considers the texture and geometry information
of objects, which simplifies keypoint localization for precise pose estimation.
Experimental results show that our method outperforms the state-of-the-art by
large margins on several benchmarks. Code and video are available at
https://github.com/ethnhe/FFB6D.git.
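To make the fusion idea concrete: at every encoding and decoding layer, each pixel pools features from its nearest 3D points and each point pools features from its nearest pixels, and the pooled features are fused back into the corresponding branch. Below is a minimal PyTorch sketch of such a per-layer fusion module; the names (`BidirectionalFusion`, `gather_features`) and the precomputed nearest-neighbor indices are illustrative assumptions rather than the authors' implementation, which is available in the repository linked above.

```python
# Illustrative sketch only -- not the official FFB6D code.
import torch
import torch.nn as nn


def gather_features(feats, idx):
    # feats: (B, C, M) source features (per pixel or per point)
    # idx:   (B, N, K) indices of the K nearest source elements for each of
    #        the N target elements (assumed precomputed from the depth/XYZ
    #        correspondence between the image and the point cloud)
    # returns (B, C, N, K)
    B, C, M = feats.shape
    _, N, K = idx.shape
    flat = idx.reshape(B, 1, N * K).expand(-1, C, -1)    # (B, C, N*K)
    gathered = torch.gather(feats, 2, flat)              # (B, C, N*K)
    return gathered.reshape(B, C, N, K)


class BidirectionalFusion(nn.Module):
    """One pixel<->point fusion layer (hypothetical simplified version)."""

    def __init__(self, c_pix, c_pnt):
        super().__init__()
        # Shared 1x1 convs that project pooled neighbor features and fuse them.
        self.pnt_to_pix = nn.Sequential(nn.Conv2d(c_pnt, c_pix, 1), nn.ReLU())
        self.pix_to_pnt = nn.Sequential(nn.Conv2d(c_pix, c_pnt, 1), nn.ReLU())
        self.fuse_pix = nn.Sequential(nn.Conv1d(2 * c_pix, c_pix, 1), nn.ReLU())
        self.fuse_pnt = nn.Sequential(nn.Conv1d(2 * c_pnt, c_pnt, 1), nn.ReLU())

    def forward(self, pix_feats, pnt_feats, pix2pnt_idx, pnt2pix_idx):
        # pix_feats:   (B, Cp, Np) flattened CNN feature map (Np = H*W)
        # pnt_feats:   (B, Cg, Ng) point-cloud branch features
        # pix2pnt_idx: (B, Ng, K)  nearest pixels of each point
        # pnt2pix_idx: (B, Np, K)  nearest points of each pixel

        # Point-to-pixel fusion: every pixel max-pools its K nearest points.
        p2i = gather_features(pnt_feats, pnt2pix_idx)            # (B, Cg, Np, K)
        p2i = self.pnt_to_pix(p2i).max(dim=-1).values            # (B, Cp, Np)
        new_pix = self.fuse_pix(torch.cat([pix_feats, p2i], 1))  # (B, Cp, Np)

        # Pixel-to-point fusion: every point max-pools its K nearest pixels.
        i2p = gather_features(pix_feats, pix2pnt_idx)            # (B, Cp, Ng, K)
        i2p = self.pix_to_pnt(i2p).max(dim=-1).values            # (B, Cg, Ng)
        new_pnt = self.fuse_pnt(torch.cat([pnt_feats, i2p], 1))  # (B, Cg, Ng)
        return new_pix, new_pnt


# Tiny usage example with random tensors and random neighbor indices.
if __name__ == "__main__":
    B, Cp, Cg, Np, Ng, K = 2, 64, 32, 1024, 512, 8
    fusion = BidirectionalFusion(Cp, Cg)
    pix = torch.randn(B, Cp, Np)
    pnt = torch.randn(B, Cg, Ng)
    pix2pnt = torch.randint(0, Np, (B, Ng, K))   # nearest pixels of each point
    pnt2pix = torch.randint(0, Ng, (B, Np, K))   # nearest points of each pixel
    new_pix, new_pnt = fusion(pix, pnt, pix2pnt, pnt2pix)
    print(new_pix.shape, new_pnt.shape)          # (2, 64, 1024) (2, 32, 512)
```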
Related papers
- MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images [57.71600854525037]
We propose a Fuse-Describe-Match strategy for 6D pose estimation from RGB-D images.
MatchU is a generic approach that fuses 2D texture and 3D geometric cues for 6D pose prediction of unseen objects.
arXiv Detail & Related papers (2024-03-03T14:01:03Z)
- Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation [14.469317161361202]
We propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information.
We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly.
arXiv Detail & Related papers (2023-08-19T13:52:18Z)
- Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images [11.100398985633754]
We propose an end-to-end framework for recovering dense meshes for both hands.
Our framework employs ResNet50 and PointNet++ to derive features from the RGB image and the point cloud, respectively.
We also introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales.
arXiv Detail & Related papers (2023-07-12T09:33:21Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation [21.424035166174352]
State-of-the-art approaches typically use different backbones to extract features for RGB and depth images.
We find that the essential reason for using two independent backbones is the "projection breakdown" problem.
We propose a simple yet effective method denoted as Uni6D that explicitly takes the extra UV data along with RGB-D images as input.
arXiv Detail & Related papers (2022-03-28T07:05:27Z)
- 6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning [0.0]
6D-ViT is a transformer-based instance representation learning network.
It is suitable for highly accurate category-level object pose estimation on RGB-D images.
arXiv Detail & Related papers (2021-10-10T13:34:16Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy in which we first learn the geometric and contextual similarities between the input point cloud and the point cloud back-projected from 2D pixels.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across varying levels of data integrity.
arXiv Detail & Related papers (2021-07-04T09:28:18Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.