PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object
Tracking?
- URL: http://arxiv.org/abs/2208.01957v1
- Date: Wed, 3 Aug 2022 10:06:56 GMT
- Title: PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object
Tracking?
- Authors: Aleksandr Kim (1), Guillem Brasó (1), Aljoša Ošep (1), Laura
Leal-Taixé (1) ((1) Technical University of Munich)
- Abstract summary: We encode 3D detections as nodes in a graph, where spatial and temporal pairwise relations among objects are encoded via localized polar coordinates on graph edges.
This allows our graph neural network to learn to effectively encode temporal and spatial interactions.
We establish a new state-of-the-art on the nuScenes dataset and, more importantly, show that our method, PolarMOT, generalizes remarkably well across different locations.
- Score: 62.997667081978825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most (3D) multi-object tracking methods rely on appearance-based cues for
data association. By contrast, we investigate how far we can get by only
encoding geometric relationships between objects in 3D space as cues for
data-driven data association. We encode 3D detections as nodes in a graph,
where spatial and temporal pairwise relations among objects are encoded via
localized polar coordinates on graph edges. This representation makes our
geometric relations invariant to global transformations and smooth trajectory
changes, especially under non-holonomic motion. This allows our graph neural
network to learn to effectively encode temporal and spatial interactions and
fully leverage contextual and motion cues to obtain final scene interpretation
by posing data association as edge classification. We establish a new
state-of-the-art on the nuScenes dataset and, more importantly, show that our
method, PolarMOT, generalizes remarkably well across different locations
(Boston, Singapore, Karlsruhe) and datasets (nuScenes and KITTI).
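To make the idea of localized polar edge features concrete, here is a minimal sketch. It is not the authors' implementation: the function names (`wrap_angle`, `polar_edge_features`, `rigid_transform`), the reduction of each detection to a ground-plane pose `(x, y, yaw)`, and the choice of three edge features are all assumptions made for illustration. The sketch expresses one detection relative to another in the source detection's own frame, which is why the features do not change when the whole scene is rotated and translated.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def polar_edge_features(src, dst):
    """Describe dst relative to src in src's local polar frame.

    src, dst: hypothetical (x, y, yaw) poses on the ground plane.
    Returns (distance, bearing, heading_diff), where bearing is the
    direction towards dst measured from src's heading.
    """
    dx = dst[0] - src[0]
    dy = dst[1] - src[1]
    distance = math.hypot(dx, dy)
    bearing = wrap_angle(math.atan2(dy, dx) - src[2])
    heading_diff = wrap_angle(dst[2] - src[2])
    return distance, bearing, heading_diff


def rigid_transform(pose, theta, tx, ty):
    """Apply a global rotation by theta and a translation (tx, ty) to a pose."""
    x, y, yaw = pose
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, wrap_angle(yaw + theta))


# Two illustrative detections (made-up values, not taken from any dataset).
det_a = (1.0, 2.0, 0.3)
det_b = (4.0, 6.0, 0.8)

features = polar_edge_features(det_a, det_b)

# The same pair after moving the whole scene: the edge features should agree
# up to floating-point error.
moved = polar_edge_features(
    rigid_transform(det_a, 1.1, 10.0, -5.0),
    rigid_transform(det_b, 1.1, 10.0, -5.0),
)
print(features)
print(moved)
```

In a graph-based tracker, features of this kind would be attached to edges between detections, and a learned classifier would decide which edges link detections of the same object. That classifier is outside the scope of this sketch.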
Related papers
- Oriented-grid Encoder for 3D Implicit Representations [10.02138130221506]
This paper is the first to exploit 3D characteristics in 3D geometric encoders explicitly.
Our method gets state-of-the-art results when compared to the prior techniques.
arXiv Detail & Related papers (2024-02-09T19:28:13Z)
- STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning [4.676784872259775]
We propose a large-scale video dataset for understanding spatial relationships derived from prepositions of the English language.
The dataset contains 150K visual depictions (videos and images), consisting of 30 distinct spatial prepositional senses.
In addition to spatial relations, we also propose 50K visual depictions across 10 temporal relations, consisting of videos depicting event/time-point interactions.
arXiv Detail & Related papers (2023-09-13T02:35:59Z)
- 3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding [58.924180772480504]
3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.
We propose a relation-aware one-stage framework, named 3D Relative Position-aware Network (3DRP-Net).
arXiv Detail & Related papers (2023-07-25T09:33:25Z)
- SGAligner: 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- Zero-shot point cloud segmentation by transferring geometric primitives [68.18710039217336]
We investigate zero-shot point cloud semantic segmentation, where the network is trained on seen objects and able to segment unseen objects.
We propose a novel framework to learn the geometric primitives shared in seen and unseen categories' objects and employ a fine-grained alignment between language and the learned geometric primitives.
arXiv Detail & Related papers (2022-10-18T15:06:54Z)
- Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D [71.11034329713058]
Existing datasets lack large-scale, high-quality 3D ground truth information.
Rel3D is the first large-scale, human-annotated dataset for grounding spatial relations in 3D.
We propose minimally contrastive data collection -- a novel crowdsourcing method for reducing dataset bias.
arXiv Detail & Related papers (2020-12-03T01:51:56Z)
- Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking [34.40019455462043]
We propose a joint spatial-temporal optimization-based stereo 3D object tracking method.
From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box.
Dense object cues that are associated with the object centroid are then predicted using a region-based network.
arXiv Detail & Related papers (2020-04-20T13:59:46Z)