Segment as Points for Efficient Online Multi-Object Tracking and Segmentation
- URL: http://arxiv.org/abs/2007.01550v1
- Date: Fri, 3 Jul 2020 08:29:35 GMT
- Title: Segment as Points for Efficient Online Multi-Object Tracking and Segmentation
- Authors: Zhenbo Xu, Wei Zhang, Xiao Tan, Wei Yang, Huan Huang, Shilei Wen,
Errui Ding, Liusheng Huang
- Abstract summary: We propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation into an unordered 2D point cloud representation.
Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images.
The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods by large margins.
- Score: 66.03023110058464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current multi-object tracking and segmentation (MOTS) methods follow the
tracking-by-detection paradigm and adopt convolutions for feature extraction.
However, constrained by the inherent receptive field, convolution-based
feature extraction inevitably mixes foreground and background features,
resulting in ambiguities in the subsequent instance association. In
this paper, we propose a highly effective method for learning instance
embeddings based on segments by converting the compact image representation
into an unordered 2D point cloud representation. Our method generates a new
tracking-by-points paradigm where discriminative instance embeddings are
learned from randomly selected points rather than images. Furthermore, multiple
informative data modalities are converted into point-wise representations to
enrich point-wise features. The resulting online MOTS framework, named
PointTrack, surpasses all the state-of-the-art methods including 3D tracking
methods by large margins (5.4% higher MOTSA and 18 times faster than
MOTSFusion) at near real-time speed (22 FPS). Evaluations across three
datasets demonstrate both the effectiveness and efficiency of our method.
Moreover, based on the observation that current MOTS datasets lack crowded
scenes, we build a more challenging MOTS dataset named APOLLO MOTS with higher
instance density. Both APOLLO MOTS and our codes are publicly available at
https://github.com/detectRecog/PointTrack.
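The mask-to-point-cloud conversion at the heart of the abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the function name, the choice of point-wise features (normalized offsets plus RGB color), and the number of sampled points are all assumptions.

```python
import numpy as np

def mask_to_point_cloud(image, instance_mask, n_points=256, seed=0):
    """Convert one instance segment into an unordered 2D point cloud.

    Randomly samples pixel locations inside the instance mask and attaches
    point-wise features (normalized offset from the segment center plus RGB
    color), so an embedding network can consume the segment as an unordered
    point set instead of a dense image crop.
    """
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(instance_mask)  # foreground pixel coordinates
    idx = rng.choice(len(xs), size=n_points, replace=len(xs) < n_points)
    xs, ys = xs[idx], ys[idx]

    # Positional feature: offset from the segment center, normalized by
    # the segment's spatial extent.
    cx, cy = xs.mean(), ys.mean()
    scale = max(np.ptp(xs), np.ptp(ys), 1)
    offsets = np.stack([(xs - cx) / scale, (ys - cy) / scale], axis=1)

    # Appearance feature: RGB color at each sampled pixel.
    colors = image[ys, xs].astype(np.float32) / 255.0

    return np.concatenate([offsets, colors], axis=1)  # (n_points, 5)

# Toy usage: a 16x16 gray image with a square instance in the middle.
image = np.full((16, 16, 3), 128, dtype=np.uint8)
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True
points = mask_to_point_cloud(image, mask, n_points=256)
```

In the paper's setting, several such point-wise modalities are concatenated before the embedding network; here only position and color are shown.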
Related papers
- Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework [44.44329455757931]
In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, providing reliable geometric information.
Traditional preprocessing-stage sampling methods often ignore semantic features, leading to detail loss and interference from ground points.
We propose a multi-branch two-stage 3D object detection framework using a Semantic-aware Multi-branch Sampling (SMS) module and multi-view constraints.
arXiv Detail & Related papers (2024-07-08T09:25:45Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework for fusing information from RGB images and LiDAR point clouds at points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features through computation-friendly projection operations.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Dynamic Clustering Transformer Network for Point Cloud Segmentation [23.149220817575195]
We propose a novel 3D point cloud representation network, called Dynamic Clustering Transformer Network (DCTNet).
It has an encoder-decoder architecture, allowing for both local and global feature learning.
Our method was evaluated on an object-based dataset (ShapeNet), an urban navigation dataset (Toronto-3D), and a multispectral LiDAR dataset.
arXiv Detail & Related papers (2023-05-30T01:11:05Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information, employing multiple frames to detect objects and track them within a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm that retains more important foreground points during down-sampling.
In practice, SASA proves effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
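The semantics-guided sampling idea described above can be sketched as score-weighted down-sampling, where each point's estimated foreground score biases its chance of being kept. This is a generic illustration under assumed names, not SASA's exact algorithm:

```python
import numpy as np

def semantics_guided_sampling(points, foreground_scores, n_keep, seed=0):
    """Down-sample a point cloud, favoring points with high foreground scores.

    Instead of uniform random sampling, each point's keep-probability is
    proportional to its estimated foreground score, so points on objects
    survive down-sampling more often than background or ground points.
    """
    rng = np.random.default_rng(seed)
    probs = foreground_scores / foreground_scores.sum()
    idx = rng.choice(len(points), size=n_keep, replace=False, p=probs)
    return points[idx]

# Toy usage: 100 background points (low score) + 20 object points (high score).
pts = np.arange(120, dtype=np.float32).reshape(-1, 1)
scores = np.concatenate([np.full(100, 0.05), np.full(20, 0.9)])
kept = semantics_guided_sampling(pts, scores, n_keep=30)
# Fraction of kept points that came from the high-score object region;
# it should be well above the 20/120 ratio uniform sampling would give.
object_ratio = np.mean(kept[:, 0] >= 100)
```

Uniform random sampling would keep object points at roughly their population ratio; weighting by foreground score concentrates the survivors on the objects.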
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches.
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
arXiv Detail & Related papers (2021-07-06T05:08:34Z)
- M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers [78.48081972698888]
We present M3DeTR, which combines different point cloud representations with different feature scales based on multi-scale feature pyramids.
M3DeTR is the first approach that unifies multiple point cloud representations and feature scales while simultaneously modeling mutual relationships between point clouds using transformers.
arXiv Detail & Related papers (2021-04-24T06:48:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.