PTT: Point-Track-Transformer Module for 3D Single Object Tracking in
Point Clouds
- URL: http://arxiv.org/abs/2108.06455v1
- Date: Sat, 14 Aug 2021 03:24:10 GMT
- Title: PTT: Point-Track-Transformer Module for 3D Single Object Tracking in
Point Clouds
- Authors: Jiayao Shan, Sifan Zhou, Zheng Fang, Yubo Cui
- Abstract summary: Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking.
PTT module contains three blocks for feature embedding, position encoding, and self-attention feature computation.
Our PTT-Net surpasses the state-of-the-art by a noticeable margin (~10%).
- Score: 7.482036504835097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D single object tracking is a key issue for robotics. In this paper, we
propose a transformer module called Point-Track-Transformer (PTT) for point
cloud-based 3D single object tracking. PTT module contains three blocks for
feature embedding, position encoding, and self-attention feature computation.
Feature embedding aims to place features closer in the embedding space if they
have similar semantic information. Position encoding is used to encode
coordinates of point clouds into high dimension distinguishable features.
Self-attention generates refined attention features by computing attention
weights. Besides, we embed the PTT module into the open-source state-of-the-art
method P2B to construct PTT-Net. Experiments on the KITTI dataset reveal that
our PTT-Net surpasses the state-of-the-art by a noticeable margin (~10%).
Additionally, PTT-Net could achieve real-time performance (~40 FPS) on an NVIDIA
1080Ti GPU. Our code is open-sourced for the robotics community at
https://github.com/shanjiayao/PTT.
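The three blocks named in the abstract (feature embedding, position encoding, self-attention) can be sketched as a minimal NumPy forward pass. This is an illustrative assumption built only from the abstract's description, not the authors' implementation: all layer shapes, the single-layer MLPs, and the random weight initialization are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w, b):
    # One linear layer with ReLU; stands in for the embedding/encoding MLPs.
    return np.maximum(x @ w + b, 0.0)

def ptt_block(points, feats, d=32):
    """Hedged sketch of a PTT-style block: embed features, encode xyz
    positions into the same space, then apply self-attention."""
    n, c = feats.shape
    # 1) Feature embedding: map input features into a d-dim space so that
    #    semantically similar points land close together.
    w_e, b_e = rng.normal(size=(c, d)), np.zeros(d)
    emb = mlp(feats, w_e, b_e)                       # (n, d)
    # 2) Position encoding: lift raw 3D coordinates to d distinguishable dims.
    w_p, b_p = rng.normal(size=(3, d)), np.zeros(d)
    pos = mlp(points, w_p, b_p)                      # (n, d)
    x = emb + pos
    # 3) Self-attention: refined features via softmax(QK^T / sqrt(d)) V.
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(d)
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v                                  # (n, d) refined features

pts = rng.normal(size=(128, 3))    # seed point coordinates (hypothetical count)
fts = rng.normal(size=(128, 16))   # per-point input features
out = ptt_block(pts, fts)
print(out.shape)  # (128, 32)
```

In PTT-Net itself this kind of block is inserted into P2B's pipeline; the sketch above only shows the attention refinement step in isolation.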
Related papers
- PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds [5.524413892353708]
LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving.
We propose PillarTrack, a pillar-based 3D single object tracking framework.
PillarTrack achieves state-of-the-art performance on the KITTI and nuScenes datasets and enables real-time tracking speed.
arXiv Detail & Related papers (2024-04-11T06:06:56Z)
- EasyTrack: Efficient and Compact One-stream 3D Point Clouds Tracker [35.74677036815288]
We propose a neat and compact one-stream transformer 3D SOT paradigm, termed EasyTrack.
A 3D point clouds tracking feature pre-training module is developed to exploit the masked autoencoding for learning 3D point clouds tracking representations.
A target location network in the dense bird's eye view (BEV) feature space is constructed for target classification and regression.
arXiv Detail & Related papers (2024-04-09T02:47:52Z) - Real-time 3D Single Object Tracking with Transformer [5.000768859809606]
We propose a Point-Track-Transformer (PTT) module for the point cloud-based 3D single object tracking task.
The PTT module generates fine-tuned attention features by computing attention weights.
In PTT-Net, we embed PTT into the voting stage and proposal generation stage.
arXiv Detail & Related papers (2022-09-02T07:36:20Z)
- Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer [62.68401838976208]
3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.
Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results.
arXiv Detail & Related papers (2022-08-10T08:36:46Z)
- Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds [94.21415132135951]
We propose to detect 3D objects by exploiting temporal information in multiple frames.
We implement our algorithm based on prevalent anchor-based and anchor-free detectors.
arXiv Detail & Related papers (2022-07-26T05:16:28Z)
- PiFeNet: Pillar-Feature Network for Real-Time 3D Pedestrian Detection from Point Cloud [64.12626752721766]
We present PiFeNet, an efficient real-time 3D detector for pedestrian detection from point clouds.
We address two challenges that 3D object detection frameworks encounter when detecting pedestrians: the low expressiveness of pillar features and the small occupation areas of pedestrians in point clouds.
Our approach is ranked 1st on the KITTI pedestrian BEV and 3D leaderboards while running at 26 frames per second (FPS), and achieves state-of-the-art performance on the nuScenes detection benchmark.
arXiv Detail & Related papers (2021-12-31T13:41:37Z)
- PTTR: Relational 3D Point Cloud Object Tracking with Transformer [37.06516957454285]
In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud.
We propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations.
arXiv Detail & Related papers (2021-12-06T08:28:05Z)
- Trident Pyramid Networks: The importance of processing at the feature pyramid level for better object detection [50.008529403150206]
We present a new core architecture called the Trident Pyramid Network (TPN).
TPN allows for a deeper design and for a better balance between communication-based processing and self-processing.
We show consistent improvements when using our TPN core on the object detection benchmark, outperforming the popular BiFPN baseline by 1.5 AP.
arXiv Detail & Related papers (2021-10-08T09:59:59Z)
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection [76.30585706811993]
We present a novel, high-performance 3D object detection framework named PointVoxel-RCNN (PV-RCNN).
Our proposed method deeply integrates both a 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction.
It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks.
arXiv Detail & Related papers (2019-12-31T06:34:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.