GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with
Multi-Feature Learning
- URL: http://arxiv.org/abs/2006.07327v1
- Date: Fri, 12 Jun 2020 17:08:14 GMT
- Title: GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with
Multi-Feature Learning
- Authors: Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
- Abstract summary: 3D Multi-object tracking (MOT) is crucial to autonomous systems.
We propose two techniques to improve the discriminative feature learning for MOT.
Our proposed method achieves state-of-the-art performance on KITTI and nuScenes 3D MOT benchmarks.
- Score: 30.72094639797806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D Multi-object tracking (MOT) is crucial to autonomous systems. Recent work
uses a standard tracking-by-detection pipeline, where feature extraction is
first performed independently for each object in order to compute an affinity
matrix. Then the affinity matrix is passed to the Hungarian algorithm for data
association. A key process of this standard pipeline is to learn discriminative
features for different objects in order to reduce confusion during data
association. In this work, we propose two techniques to improve the
discriminative feature learning for MOT: (1) instead of obtaining features for
each object independently, we propose a novel feature interaction mechanism by
introducing the Graph Neural Network. As a result, the feature of each object is
informed by the features of the other objects, so that it can move towards
objects with similar features (i.e., objects that probably share the same ID)
and away from objects with dissimilar features (i.e., objects that probably
have different IDs), leading to a more discriminative feature for each object; (2)
instead of obtaining the feature from either 2D or 3D space in prior work, we
propose a novel joint feature extractor to learn appearance and motion features
from 2D and 3D space simultaneously. As features from different modalities
often have complementary information, the joint feature can be more
discriminative than the feature from each individual modality. To ensure that the
joint feature extractor does not heavily rely on one modality, we also propose
an ensemble training paradigm. Through extensive evaluation, our proposed
method achieves state-of-the-art performance on KITTI and nuScenes 3D MOT
benchmarks. Our code will be made available at
https://github.com/xinshuoweng/GNN3DMOT
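The pipeline the abstract describes (per-object features refined by GNN message passing, an affinity matrix, then Hungarian data association) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `gnn_feature_interaction` and `associate` are hypothetical names, and the similarity-weighted residual update stands in for the paper's learned GNN layers and joint 2D/3D feature extractor.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gnn_feature_interaction(feats, num_layers=2, step=0.3):
    """Toy message passing over a fully connected object graph: each feature
    takes a small step towards a similarity-weighted mix of the other
    features, so features of likely-same-ID objects move together."""
    for _ in range(num_layers):
        sim = feats @ feats.T                  # pairwise similarities (graph edges)
        np.fill_diagonal(sim, -np.inf)         # exclude self-edges
        w = np.exp(sim)
        w /= w.sum(axis=1, keepdims=True)      # softmax attention weights
        feats = feats + step * (w @ feats)     # residual message-passing update
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return feats

def associate(track_feats, det_feats):
    """Build an affinity matrix and solve data association with the
    Hungarian algorithm (SciPy's linear_sum_assignment)."""
    affinity = track_feats @ det_feats.T       # cosine affinity (unit features)
    rows, cols = linear_sum_assignment(-affinity)  # maximize total affinity
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

# Toy example: three detections are permuted, slightly noisy copies of three tracks.
rng = np.random.default_rng(0)
tracks = rng.normal(size=(3, 8))
tracks /= np.linalg.norm(tracks, axis=1, keepdims=True)
dets = tracks[[2, 0, 1]] + 0.05 * rng.normal(size=(3, 8))
dets /= np.linalg.norm(dets, axis=1, keepdims=True)
print(associate(tracks, gnn_feature_interaction(dets)))  # [(0, 1), (1, 2), (2, 0)]
```

In the paper the GNN operates jointly over track and detection features from both 2D and 3D modalities; here the interaction step only mixes detection features, just to show the mechanism of attraction between similar features before matching.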
Related papers
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose
Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D
Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- The Devil is in the Task: Exploiting Reciprocal Appearance-Localization
Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving.
We introduce a Dynamic Feature Reflecting Network, named DFR-Net.
We rank 1st among all the monocular 3D object detectors in the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
- Learning Feature Aggregation for Deep 3D Morphable Models [57.1266963015401]
We propose an attention-based module to learn mapping matrices for better feature aggregation across hierarchical levels.
Our experiments show that through the end-to-end training of the mapping matrices, we achieve state-of-the-art results on a variety of 3D shape datasets.
arXiv Detail & Related papers (2021-05-05T16:41:00Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object
Detection [39.64891219500416]
3D object detection methods exploit either voxel-based or point-based features to represent 3D objects in a scene.
We introduce in this paper a novel single-stage 3D detection method having the merit of both voxel-based and point-based features.
arXiv Detail & Related papers (2021-04-02T06:34:49Z)
- SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection [9.924083358178239]
We propose two variants of self-attention for contextual modeling in 3D object detection.
We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors.
Next, we propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations.
arXiv Detail & Related papers (2021-01-07T18:30:32Z)
- Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking
from View Aggregation [8.854112907350624]
3D multi-object tracking plays a vital role in autonomous navigation.
Many approaches detect objects in 2D RGB sequences for tracking, which lacks reliability when localizing objects in 3D space.
We propose a novel convolutional operation, named RelationConv, to better exploit the correlation between each pair of objects in the adjacent frames.
arXiv Detail & Related papers (2020-11-25T16:14:40Z)
- End-to-End 3D Multi-Object Tracking and Trajectory Forecasting [34.68114553744956]
We propose a unified solution for 3D MOT and trajectory forecasting.
We employ a feature interaction technique by introducing Graph Neural Networks.
We also use a diversity sampling function to improve the quality and diversity of our forecasted trajectories.
arXiv Detail & Related papers (2020-08-25T16:54:46Z)
- Graph Neural Networks for 3D Multi-Object Tracking [28.121708602059048]
3D Multi-object tracking (MOT) is crucial to autonomous systems.
Recent work often uses a tracking-by-detection pipeline.
We propose a novel feature interaction mechanism by introducing Graph Neural Networks.
arXiv Detail & Related papers (2020-08-20T17:55:41Z)
- D3Feat: Joint Learning of Dense Detection and Description of 3D Local
Features [51.04841465193678]
We leverage a 3D fully convolutional network for 3D point clouds.
We propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point.
Our method achieves state-of-the-art results in both indoor and outdoor scenarios.
arXiv Detail & Related papers (2020-03-06T12:51:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.