3D Multi-Object Tracking with Differentiable Pose Estimation
- URL: http://arxiv.org/abs/2206.13785v1
- Date: Tue, 28 Jun 2022 06:46:32 GMT
- Title: 3D Multi-Object Tracking with Differentiable Pose Estimation
- Authors: Dominik Schmauser, Zeju Qiu, Norman M\"uller, Matthias Nie{\ss}ner
- Abstract summary: We propose a novel approach for joint 3D multi-object tracking and reconstruction from RGB-D sequences in indoor environments.
We leverage those correspondences to inform a graph neural network to solve for the optimal, temporally-consistent 7-DoF pose trajectories of all objects.
Our method improves the accumulated MOTA score for all test sequences by 24.8% over existing state-of-the-art methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel approach for joint 3D multi-object tracking and
reconstruction from RGB-D sequences in indoor environments. To this end, we
detect and reconstruct objects in each frame while predicting dense
correspondences mappings into a normalized object space. We leverage those
correspondences to inform a graph neural network to solve for the optimal,
temporally-consistent 7-DoF pose trajectories of all objects. The novelty of
our method is two-fold: first, we propose a new graph-based approach for
differentiable pose estimation over time to learn optimal pose trajectories;
second, we present a joint formulation of reconstruction and pose estimation
along the time axis for robust and geometrically consistent multi-object
tracking. In order to validate our approach, we introduce a new synthetic
dataset comprising 2381 unique indoor sequences with a total of 60k rendered
RGB-D images for multi-object tracking with moving objects and camera positions
derived from the synthetic 3D-FRONT dataset. We demonstrate that our method
improves the accumulated MOTA score for all test sequences by 24.8% over
existing state-of-the-art methods. In several ablations on synthetic and
real-world sequences, we show that our graph-based, fully end-to-end-learnable
approach yields a significant boost in tracking performance.
Related papers
- Towards Scalable Multi-View Reconstruction of Geometry and Materials [27.660389147094715]
We propose a novel method for joint recovery of camera pose, object geometry and spatially-varying Bidirectional Reflectance Distribution Function (svBRDF) of 3D scenes.
The input are high-resolution RGBD images captured by a mobile, hand-held capture system with point lights for active illumination.
arXiv Detail & Related papers (2023-06-06T15:07:39Z) - Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z) - GOOD: General Optimization-based Fusion for 3D Object Detection via
LiDAR-Camera Object Candidates [10.534984939225014]
3D object detection serves as the core basis of the perception tasks in autonomous driving.
Good is a general optimization-based fusion framework that can achieve satisfying detection without training additional models.
Experiments on both nuScenes and KITTI datasets are carried out and the results show that GOOD outperforms by 9.1% on mAP score compared with PointPillars.
arXiv Detail & Related papers (2023-03-17T07:05:04Z) - Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D
Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - Next-best-view Regression using a 3D Convolutional Neural Network [0.9449650062296823]
We propose a data-driven approach to address the next-best-view problem.
The proposed approach trains a 3D convolutional neural network with previous reconstructions in order to regress the btxtposition of the next-best-view.
We have validated the proposed approach making use of two groups of experiments.
arXiv Detail & Related papers (2021-01-23T01:50:26Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - How to track your dragon: A Multi-Attentional Framework for real-time
RGB-D 6-DOF Object Pose Tracking [35.21561169636035]
We present a novel multi-attentional convolutional architecture to tackle the problem of real-time RGB-D 6D object pose tracking.
We consider the special geometrical properties of both the object's 3D model and the pose space, and we use a more sophisticated approach for data augmentation during training.
arXiv Detail & Related papers (2020-04-21T23:00:06Z) - Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking [34.40019455462043]
We propose a joint spatial-temporal optimization-based stereo 3D object tracking method.
From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box.
Dense object cues that associating to the object centroid are then predicted using a region-based network.
arXiv Detail & Related papers (2020-04-20T13:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.