Multi-Correlation Siamese Transformer Network with Dense Connection for
3D Single Object Tracking
- URL: http://arxiv.org/abs/2312.11051v1
- Date: Mon, 18 Dec 2023 09:33:49 GMT
- Authors: Shihao Feng, Pengpeng Liang, Jin Gao, Erkang Cheng
- Abstract summary: Point cloud-based 3D object tracking is an important task in autonomous driving.
It remains challenging to learn the correlation between the template and search branches effectively with the sparse LIDAR point cloud data.
We present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage.
- Score: 14.47355191520578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point cloud-based 3D object tracking is an important task in autonomous
driving. Though great advances regarding Siamese-based 3D tracking have been
made recently, it remains challenging to learn the correlation between the
template and search branches effectively with the sparse LIDAR point cloud
data. Instead of performing correlation of the two branches at just one point
in the network, in this paper, we present a multi-correlation Siamese
Transformer network that has multiple stages and carries out feature
correlation at the end of each stage based on sparse pillars. More
specifically, in each stage, self-attention is first applied to each branch
separately to capture the non-local context information. Then, cross-attention
is used to inject the template information into the search area. This strategy
allows the feature learning of the search area to be aware of the template
while keeping the individual characteristics of the template intact. To enable
the network to easily preserve the information learned at different stages and
ease the optimization, for the search area, we densely connect the initial
input sparse pillars and the output of each stage to all subsequent stages and
the target localization network, which converts pillars to bird's eye view
(BEV) feature maps and predicts the state of the target with a small densely
connected convolution network. Deep supervision is added to each stage to
further boost the performance as well. The proposed algorithm is evaluated on
the popular KITTI, nuScenes, and Waymo datasets, and the experimental results
show that our method achieves promising performance compared with the
state-of-the-art. An ablation study showing the effectiveness of each component
is provided as well. Code is available at
https://github.com/liangp/MCSTN-3DSOT.
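The per-stage correlation described in the abstract (self-attention on each branch, then cross-attention injecting the template into the search area) can be sketched as follows. This is a minimal, framework-free illustration in plain Python: single-head scaled dot-product attention over lists of feature vectors, with no sparse pillars, learned projections, or dense connections — all simplifications, not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Output is a convex combination of the value vectors.
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def correlation_stage(template, search):
    """One stage of the multi-correlation scheme (simplified sketch).

    1) Self-attention is applied to each branch separately to capture
       non-local context.
    2) Cross-attention lets the search features attend to the template,
       injecting template information into the search area while the
       template branch itself is left unchanged.
    """
    template_sa = attention(template, template, template)
    search_sa = attention(search, search, search)
    search_out = attention(search_sa, template_sa, template_sa)
    return template_sa, search_out
```

In the paper, several such stages are stacked, with the search-branch outputs of every stage densely connected to all subsequent stages and to the BEV-based localization head; the sketch above shows only a single stage.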
Related papers
- Clustering based Point Cloud Representation Learning for 3D Analysis [80.88995099442374]
We propose a clustering-based supervised learning scheme for point cloud analysis.
Unlike the current de facto scene-wise training paradigm, our algorithm conducts within-class clustering in the point embedding space.
Our algorithm shows notable improvements on well-known point cloud segmentation benchmarks.
arXiv Detail & Related papers (2023-07-27T03:42:12Z) - Correlation Pyramid Network for 3D Single Object Tracking [16.694809791177263]
We propose a novel Correlation Pyramid Network (CorpNet) with a unified encoder and a motion-factorized decoder.
CorpNet achieves state-of-the-art results while running in real-time.
arXiv Detail & Related papers (2023-05-16T06:07:20Z) - Unleash the Potential of Image Branch for Cross-modal 3D Object
Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D
Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - OST: Efficient One-stream Network for 3D Single Object Tracking in Point Clouds [6.661881950861012]
We propose a novel one-stream network with the strength of instance-level encoding, which avoids the correlation operations used in previous Siamese networks.
The proposed method achieves strong performance not only for class-specific tracking but also for class-agnostic tracking, with less computation and higher efficiency.
arXiv Detail & Related papers (2022-10-16T12:31:59Z) - CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point
Cloud Learning [81.85951026033787]
We employ transformers in this work and incorporate them into a hierarchical framework for shape classification as well as part and scene segmentation.
We also compute efficient and dynamic global cross-attention by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z) - 3D Siamese Transformer Network for Single Object Tracking on Point
Clouds [22.48888264770609]
Siamese network based trackers formulate 3D single object tracking as cross-correlation learning between point features of a template and a search area.
We explicitly use a Transformer to form a 3D Siamese Transformer network for learning robust cross-correlation between the template and the search area.
Our method achieves state-of-the-art performance on the 3D single object tracking task.
arXiv Detail & Related papers (2022-07-25T09:08:30Z) - DFC: Deep Feature Consistency for Robust Point Cloud Registration [0.4724825031148411]
We present a novel learning-based alignment network for complex alignment scenes.
We validate our approach on the 3DMatch dataset and the KITTI odometry dataset.
arXiv Detail & Related papers (2021-11-15T08:27:21Z) - M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object
Detection with Transformers [78.48081972698888]
We present M3DeTR, which combines different point cloud representations with different feature scales based on multi-scale feature pyramids.
M3DeTR is the first approach that unifies multiple point cloud representations and feature scales, and simultaneously models mutual relationships between point clouds using transformers.
arXiv Detail & Related papers (2021-04-24T06:48:23Z) - Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical
Understanding of Outdoor Scene [76.4183572058063]
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks.
The dataset has been point-wisely annotated with both hierarchical and instance-based labels.
We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
arXiv Detail & Related papers (2020-08-11T19:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.