How to track your dragon: A Multi-Attentional Framework for real-time
RGB-D 6-DOF Object Pose Tracking
- URL: http://arxiv.org/abs/2004.10335v3
- Date: Tue, 15 Sep 2020 11:33:55 GMT
- Title: How to track your dragon: A Multi-Attentional Framework for real-time
RGB-D 6-DOF Object Pose Tracking
- Authors: Isidoros Marougkas, Petros Koutras, Nikos Kardaris, Georgios Retsinas,
Georgia Chalvatzaki, and Petros Maragos
- Abstract summary: We present a novel multi-attentional convolutional architecture to tackle the problem of real-time RGB-D 6D object pose tracking.
We consider the special geometrical properties of both the object's 3D model and the pose space, and we use a more sophisticated approach for data augmentation during training.
- Score: 35.21561169636035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel multi-attentional convolutional architecture to tackle the
problem of real-time RGB-D 6D object pose tracking of single, known objects.
Such a problem poses multiple challenges originating both from the objects'
nature and their interaction with their environment, which previous approaches
have failed to fully address. The proposed framework encapsulates methods for
background clutter and occlusion handling by integrating multiple parallel soft
spatial attention modules into a multitask Convolutional Neural Network (CNN)
architecture. Moreover, we consider the special geometrical properties of both
the object's 3D model and the pose space, and we use a more sophisticated
approach for data augmentation during training. The provided experimental
results confirm the effectiveness of the proposed multi-attentional
architecture, as it improves the State-of-the-Art (SoA) tracking performance by
an average score of 34.03% for translation and 40.01% for rotation, when tested
on the most complete dataset designed to date for the problem of RGB-D
object tracking.
Related papers
- SEMPose: A Single End-to-end Network for Multi-object Pose Estimation [13.131534219937533]
SEMPose is an end-to-end multi-object pose estimation network.
It can perform inference at 32 FPS without requiring inputs other than the RGB image.
It can accurately estimate the poses of multiple objects in real time, with inference time unaffected by the number of target objects.
arXiv Detail & Related papers (2024-11-21T10:37:54Z)
- Gr-IoU: Ground-Intersection over Union for Robust Multi-Object Tracking with 3D Geometric Constraints [3.2855317710497625]
Gr-IoU transforms traditional bounding boxes from the image space to the ground plane using the vanishing point geometry.
The IoU calculated with these transformed bounding boxes is more sensitive to the front-to-back relationships of objects.
We evaluated our Gr-IoU method on the MOT17 and MOT20 datasets, which contain diverse tracking scenarios.
arXiv Detail & Related papers (2024-09-05T05:09:03Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework to fuse information of RGB images and LiDAR point clouds at the points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection and computation.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- 3D Multi-Object Tracking with Differentiable Pose Estimation [0.0]
We propose a novel approach for joint 3D multi-object tracking and reconstruction from RGB-D sequences in indoor environments.
We leverage those correspondences to inform a graph neural network to solve for the optimal, temporally-consistent 7-DoF pose trajectories of all objects.
Our method improves the accumulated MOTA score for all test sequences by 24.8% over existing state-of-the-art methods.
arXiv Detail & Related papers (2022-06-28T06:46:32Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Seeing Behind Objects for 3D Multi-Object Tracking in RGB-D Sequences [46.65702220573459]
We infer the complete geometry of rigidly moving objects and track them over time.
From a sequence of RGB-D frames, we detect objects in each frame and learn to predict their complete object geometry.
Experiments on both synthetic and real-world RGB-D data demonstrate that we achieve state-of-the-art performance on dynamic object tracking.
arXiv Detail & Related papers (2020-12-15T10:33:21Z)
- Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking from View Aggregation [8.854112907350624]
3D multi-object tracking plays a vital role in autonomous navigation.
Many approaches detect objects in 2D RGB sequences for tracking, which lacks reliability when localizing objects in 3D space.
We propose a novel convolutional operation, named RelationConv, to better exploit the correlation between each pair of objects in the adjacent frames.
arXiv Detail & Related papers (2020-11-25T16:14:40Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.