Fusing Visual Appearance and Geometry for Multi-modality 6DoF Object
Tracking
- URL: http://arxiv.org/abs/2302.11458v1
- Date: Wed, 22 Feb 2023 15:53:00 GMT
- Title: Fusing Visual Appearance and Geometry for Multi-modality 6DoF Object
Tracking
- Authors: Manuel Stoiber, Mariam Elsayed, Anne E. Reichert, Florian Steidle,
Dongheui Lee, Rudolph Triebel
- Abstract summary: We develop a multi-modality tracker that fuses information from visual appearance and geometry to estimate object poses.
The algorithm extends our previous method ICG, which uses geometry, to additionally consider surface appearance.
- Score: 21.74515335906769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many applications of advanced robotic manipulation, six degrees of freedom
(6DoF) object pose estimates are continuously required. In this work, we
develop a multi-modality tracker that fuses information from visual appearance
and geometry to estimate object poses. The algorithm extends our previous
method ICG, which uses geometry, to additionally consider surface appearance.
In general, object surfaces contain local characteristics from text, graphics,
and patterns, as well as global differences from distinct materials and colors.
To incorporate this visual information, two modalities are developed. For local
characteristics, keypoint features are used to minimize distances between
points from keyframes and the current image. For global differences, a novel
region approach is developed that considers multiple regions on the object
surface. In addition, it allows the modeling of external geometries.
Experiments on the YCB-Video and OPT datasets demonstrate that our approach,
ICG+, performs best on both, outperforming conventional as well as deep
learning-based methods. At the same time, the algorithm is highly efficient and
runs at more than 300 Hz. The source code of our tracker is publicly available.
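To make the keypoint modality concrete, here is a minimal sketch using standard OpenCV primitives: descriptors stored with a keyframe are matched against the current image, and a robust PnP solve minimizes the distances between the keyframe's 3D points and their matched 2D observations. The keyframe record, function name, and ORB/RANSAC choices are assumptions for illustration; the released ICG+ tracker embeds such residuals in its own probabilistic optimization rather than calling a PnP solver.

```python
import cv2
import numpy as np

def pose_from_keyframe_matches(keyframe, frame_gray, K, dist_coeffs=None):
    """Sketch of the keypoint modality: estimate the object pose in the
    current frame from matches against a stored keyframe.

    keyframe: dict with
      'descriptors': (N, 32) uint8 ORB descriptors of keyframe keypoints
      'points_3d':   (N, 3) float32 object-surface points, one per descriptor
                     (assumed precomputed from the keyframe pose and geometry)
    K: (3, 3) camera intrinsic matrix.
    """
    orb = cv2.ORB_create(nfeatures=1000)
    kp_cur, desc_cur = orb.detectAndCompute(frame_gray, None)
    if desc_cur is None:
        return None

    # Hamming brute-force matching with cross-check to suppress outliers.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(keyframe["descriptors"], desc_cur)
    if len(matches) < 6:
        return None

    obj_pts = np.float32([keyframe["points_3d"][m.queryIdx] for m in matches])
    img_pts = np.float32([kp_cur[m.trainIdx].pt for m in matches])

    # Robust PnP minimizes reprojection distances between keyframe 3D points
    # and their matched 2D locations, yielding rotation and translation.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj_pts, img_pts, K, dist_coeffs)
    return (rvec, tvec) if ok else None
```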
Related papers
- GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection [23.872633359324098]
We propose a novel Global-Local Collaborative Optimization Network, called GLCONet.
Its collaborative optimization strategy simultaneously models local details and global long-range relationships.
Experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image.
arXiv Detail & Related papers (2024-09-15T02:26:17Z)
- GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions [22.077366472693395]
We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections.
Existing approaches that employ volumetric rendering with neural radiance fields inherit a key limitation: the generated geometry is noisy and unconstrained.
We propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner.
arXiv Detail & Related papers (2024-06-06T17:00:10Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with gains of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- SDFEst: Categorical Pose and Shape Estimation of Objects from RGB-D using Signed Distance Fields [5.71097144710995]
We present a modular pipeline for pose and shape estimation of objects from RGB-D images.
We integrate a generative shape model with a novel network to enable 6D pose and shape estimation from a single or multiple views.
We demonstrate the benefits of our approach over state-of-the-art methods in several experiments on both synthetic and real data.
arXiv Detail & Related papers (2022-07-11T13:53:50Z)
- PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition [55.38462937452363]
We propose a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student.
By pair-wise aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhaustive and complicated network modifications.
arXiv Detail & Related papers (2022-07-07T07:23:20Z)
- Depth Completion using Geometry-Aware Embedding [22.333381291860498]
This paper proposes an efficient method to learn geometry-aware embedding.
It encodes local and global geometric structure information from 3D points (e.g., scene layout, object sizes and shapes) to guide dense depth estimation.
arXiv Detail & Related papers (2022-03-21T12:06:27Z)
- Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects [25.448657318818764]
ICG is a novel probabilistic tracker that fuses region and depth information and only requires the object geometry.
Our method deploys correspondence lines and points to iteratively refine the pose.
Experiments on the YCB-Video, OPT, and Choi datasets demonstrate that, even for textured objects, our approach outperforms the current state of the art.
arXiv Detail & Related papers (2022-03-10T12:30:50Z)
- FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation [54.666329929930455]
We present FFB6D, a full flow bidirectional fusion network designed for 6D pose estimation from a single RGBD image.
We learn to combine appearance and geometry information for representation learning as well as output representation selection.
Our method outperforms the state-of-the-art by large margins on several benchmarks.
arXiv Detail & Related papers (2021-03-03T08:07:29Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce a symmetric gated fusion strategy to exploit the extracted multi-modal features effectively (a toy sketch of such gated fusion follows this list).
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
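As a toy illustration of the symmetric gated fusion referenced in the ACMNet summary above, the following PyTorch module lets each modality gate how much of the other it absorbs. The module name, 1x1-convolution gates, and residual form are assumptions for illustration, not ACMNet's published architecture.

```python
import torch
import torch.nn as nn

class SymmetricGatedFusion(nn.Module):
    """Each branch computes a sigmoid gate from both modalities and uses it
    to decide how much cross-modal feature to admit (symmetric in rgb/depth).
    """

    def __init__(self, channels: int):
        super().__init__()
        self.gate_rgb = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())
        self.gate_depth = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor):
        both = torch.cat([f_rgb, f_depth], dim=1)
        g_rgb = self.gate_rgb(both)      # how much depth to inject into rgb
        g_depth = self.gate_depth(both)  # how much rgb to inject into depth
        return f_rgb + g_rgb * f_depth, f_depth + g_depth * f_rgb

# Example: fuse two (batch, 64, 32, 32) feature maps.
fuse = SymmetricGatedFusion(64)
out_rgb, out_depth = fuse(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```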