Correlation Pyramid Network for 3D Single Object Tracking
- URL: http://arxiv.org/abs/2305.09195v1
- Date: Tue, 16 May 2023 06:07:20 GMT
- Title: Correlation Pyramid Network for 3D Single Object Tracking
- Authors: Mengmeng Wang, Teli Ma, Xingxing Zuo, Jiajun Lv, Yong Liu
- Abstract summary: We propose a novel Correlation Pyramid Network (CorpNet) with a unified encoder and a motion-factorized decoder.
CorpNet achieves state-of-the-art results while running in real-time.
- Score: 16.694809791177263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D LiDAR-based single object tracking (SOT) has gained increasing attention
as it plays a crucial role in 3D applications such as autonomous driving. The
central problem is how to learn a target-aware representation from the sparse
and incomplete point clouds. In this paper, we propose a novel Correlation
Pyramid Network (CorpNet) with a unified encoder and a motion-factorized
decoder. Specifically, the encoder introduces multi-level self attentions and
cross attentions in its main branch to enrich the template and search region
features and realize their fusion and interaction, respectively. Additionally,
considering the sparsity characteristics of the point clouds, we design a
lateral correlation pyramid structure for the encoder to keep as many points as
possible by integrating hierarchical correlated features. The output features
of the search region from the encoder can be directly fed into the decoder for
predicting target locations without any extra matcher. Moreover, in the decoder
of CorpNet, we design a motion-factorized head to explicitly learn the
different movement patterns of the up axis and the x-y plane together.
Extensive experiments on two commonly-used datasets show our CorpNet achieves
state-of-the-art results while running in real-time.
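The motion-factorized decoder described above can be illustrated with a minimal sketch: the search-region feature is decoded by two separate branches, one for x-y plane motion plus yaw and one for the up (z) axis. This is a hedged numpy toy, not the paper's implementation; the layer sizes, parameter names, and the simple affine layers are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, w, b):
    # Simple affine layer: x @ w + b
    return x @ w + b

def motion_factorized_head(feat, params):
    """Illustrative sketch of a motion-factorized head: two separate
    branches decode x-y plane motion (dx, dy, yaw) and up-axis motion
    (dz), reflecting their different movement patterns. All shapes and
    parameter names here are hypothetical, not from the paper."""
    xy_out = linear(feat, params["w_xy"], params["b_xy"])  # (N, 3): dx, dy, yaw
    z_out = linear(feat, params["w_z"], params["b_z"])     # (N, 1): dz
    # Recombine into a (N, 4) pose offset: dx, dy, dz, yaw.
    return np.concatenate([xy_out[:, :2], z_out, xy_out[:, 2:]], axis=1)

C = 64  # assumed feature channel count
params = {
    "w_xy": rng.normal(size=(C, 3)) * 0.01, "b_xy": np.zeros(3),
    "w_z":  rng.normal(size=(C, 1)) * 0.01, "b_z":  np.zeros(1),
}
feat = rng.normal(size=(8, C))  # 8 search-region feature vectors
pred = motion_factorized_head(feat, params)
print(pred.shape)  # (8, 4)
```

The point of the factorization is that vertical motion of road agents is far more constrained than planar motion, so giving each its own branch lets the two regressors specialize.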
Related papers
- FASTC: A Fast Attentional Framework for Semantic Traversability Classification Using Point Cloud [7.711666704468952]
We address the problem of traversability assessment using point clouds.
We propose a pillar feature extraction module that utilizes PointNet to capture features from point clouds organized in vertical volumes.
We then propose a new temporal attention module to fuse multi-frame information, which properly handles the varying density of LiDAR point clouds.
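The pillar idea above can be sketched in a few lines: points are binned into vertical x-y pillars (the z axis is collapsed) and per-point features are aggregated with a symmetric max, as in PointNet. The grid size and the plain max-pool are illustrative assumptions, not FASTC's actual module.

```python
import numpy as np

def pillar_max_pool(points, feats, grid=0.5):
    """Hypothetical sketch of pillar feature extraction: bin points
    into vertical x-y pillars and max-pool their features per pillar.
    Grid size and pooling choice are assumptions for illustration."""
    # Assign each point to an x-y pillar index (z is collapsed).
    ij = np.floor(points[:, :2] / grid).astype(int)
    keys, inverse = np.unique(ij, axis=0, return_inverse=True)
    pooled = np.full((len(keys), feats.shape[1]), -np.inf)
    for p in range(len(points)):  # symmetric (order-invariant) max aggregation
        k = inverse[p]
        pooled[k] = np.maximum(pooled[k], feats[p])
    return keys, pooled  # pillar grid coords, pooled pillar features

pts = np.array([[0.1, 0.1, 0.0], [0.2, 0.3, 2.0], [1.4, 0.1, 0.5]])
f = np.array([[1.0, 0.0], [0.5, 2.0], [3.0, 1.0]])
coords, pooled = pillar_max_pool(pts, f)
print(len(coords))  # 2 pillars: the first two points share one pillar
print(pooled[0])    # element-wise max of their features: [1. 2.]
```

Because the max is order-invariant, the pooled pillar feature does not depend on how the raw points are sorted, which is what makes this kind of aggregation suitable for unordered point clouds.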
arXiv Detail & Related papers (2024-06-24T12:01:55Z) - CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer [42.68740105997167]
We introduce two frameworks for 3D object detection with minimal hand-crafted design.
Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal.
Secondly, we present an enhanced network called CT3D++, which incorporates geometric and semantic fusion-based embedding to extract more valuable and comprehensive proposal-aware information.
arXiv Detail & Related papers (2024-06-12T12:40:28Z) - Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking [14.47355191520578]
Point cloud-based 3D object tracking is an important task in autonomous driving.
It remains challenging to learn the correlation between the template and search branches effectively from sparse LiDAR point cloud data.
We present a multi-correlation Siamese Transformer network that has multiple stages and carries out feature correlation at the end of each stage.
arXiv Detail & Related papers (2023-12-18T09:33:49Z) - DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [68.94868475824575]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z) - HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection in Point Clouds [19.1921315424192]
3D object detection in point clouds is important for autonomous driving systems.
A primary challenge in 3D object detection stems from the sparse distribution of points within the 3D scene.
We propose HEDNet, a hierarchical encoder-decoder network for 3D object detection.
arXiv Detail & Related papers (2023-10-31T07:32:08Z) - CXTrack: Improving 3D Point Cloud Tracking with Contextual Information [59.55870742072618]
3D single object tracking plays an essential role in many applications, such as autonomous driving.
We propose CXTrack, a novel transformer-based network for 3D object tracking.
We show that CXTrack achieves state-of-the-art tracking performance while running at 29 FPS.
arXiv Detail & Related papers (2022-11-12T11:29:01Z) - 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z) - 3D Siamese Transformer Network for Single Object Tracking on Point Clouds [22.48888264770609]
Siamese network based trackers formulate 3D single object tracking as cross-correlation learning between point features of a template and a search area.
We explicitly use a Transformer to form a 3D Siamese Transformer network that learns robust cross-correlation between the template and the search area.
Our method achieves state-of-the-art performance on the 3D single object tracking task.
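The cross-correlation idea in this entry can be sketched as single-head attention: each search-area point feature queries the template features, enriching the search representation with target information. This toy omits learned projections, multiple heads, and normalization, all of which a real Transformer would have.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_correlation(search, template):
    """Simplified sketch of Siamese cross-correlation via attention:
    search features attend over template features and are fused back
    with a residual connection. Single-head, no learned projections;
    a deliberate simplification of the paper's Transformer."""
    scores = search @ template.T / np.sqrt(search.shape[1])  # (Ns, Nt)
    attn = softmax(scores, axis=1)   # each search point attends over template
    return search + attn @ template  # residual fusion of target information

rng = np.random.default_rng(1)
s = rng.normal(size=(16, 32))  # 16 search-area point features
t = rng.normal(size=(8, 32))   # 8 template point features
out = cross_attention_correlation(s, t)
print(out.shape)  # (16, 32)
```

The output keeps the search area's shape, so it can be fed directly into a localization head, which is the same property CorpNet's abstract highlights when it drops the extra matcher.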
arXiv Detail & Related papers (2022-07-25T09:08:30Z) - Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z) - Self-Supervised Point Cloud Representation Learning with Occlusion Auto-Encoder [63.77257588569852]
We present 3D Occlusion Auto-Encoder (3D-OAE) for learning representations for point clouds.
Our key idea is to randomly occlude some local patches of the input point cloud and establish the supervision via recovering the occluded patches.
In contrast with previous methods, our 3D-OAE can remove a large proportion of patches and predict them only with a small number of visible patches.
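The occlusion scheme above can be sketched as follows: split the cloud into local patches around seed points and hide most patches, keeping only a few visible; the hidden patches become the reconstruction target. Random seed selection and nearest-seed assignment are simplifying assumptions here (the paper's patching may differ).

```python
import numpy as np

def occlude_patches(points, num_patches=8, keep=2, seed=0):
    """Hypothetical sketch of occlusion-style masking: partition the
    cloud into local patches around seed points (random seeds here,
    for brevity) and occlude all but a few. Returns the visible points
    and the occluded points that would serve as supervision."""
    rng = np.random.default_rng(seed)
    seeds = points[rng.choice(len(points), num_patches, replace=False)]
    # Assign each point to its nearest seed -> local patches.
    d = ((points[:, None, :] - seeds[None, :, :]) ** 2).sum(-1)
    patch_id = d.argmin(1)
    visible_ids = rng.choice(num_patches, keep, replace=False)
    mask = np.isin(patch_id, visible_ids)
    return points[mask], points[~mask]

pts = np.random.default_rng(2).normal(size=(256, 3))
vis, occ = occlude_patches(pts)
print(len(vis) + len(occ))  # 256: every point is either visible or occluded
```

Keeping only a small visible fraction makes the pretext task hard enough that the encoder must learn global shape structure rather than local interpolation.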
arXiv Detail & Related papers (2022-03-26T14:06:29Z) - Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.