PTT: Point-Track-Transformer Module for 3D Single Object Tracking in
Point Clouds
- URL: http://arxiv.org/abs/2108.06455v1
- Date: Sat, 14 Aug 2021 03:24:10 GMT
- Title: PTT: Point-Track-Transformer Module for 3D Single Object Tracking in
Point Clouds
- Authors: Jiayao Shan, Sifan Zhou, Zheng Fang, Yubo Cui
- Abstract summary: Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking.
PTT module contains three blocks for feature embedding, position encoding, and self-attention feature computation.
Our PTT-Net surpasses the state-of-the-art by a noticeable margin (~10%).
- Score: 7.482036504835097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D single object tracking is a key issue for robotics. In this paper, we
propose a transformer module called Point-Track-Transformer (PTT) for point
cloud-based 3D single object tracking. PTT module contains three blocks for
feature embedding, position encoding, and self-attention feature computation.
Feature embedding aims to place features closer in the embedding space if they
have similar semantic information. Position encoding is used to encode
coordinates of point clouds into high dimension distinguishable features.
Self-attention generates refined attention features by computing attention
weights. Besides, we embed the PTT module into the open-source state-of-the-art
method P2B to construct PTT-Net. Experiments on the KITTI dataset reveal that
our PTT-Net surpasses the state-of-the-art by a noticeable margin (~10%).
Additionally, PTT-Net could achieve real-time performance (~40 FPS) on an NVIDIA
1080Ti GPU. Our code is open-sourced for the robotics community at
https://github.com/shanjiayao/PTT.
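The three blocks named in the abstract (feature embedding, position encoding, self-attention) can be sketched as a minimal NumPy forward pass. This is an illustrative assumption built only from the abstract's description, not the authors' implementation: all layer shapes, the single-layer MLPs, and the random weight initialization are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w, b):
    # One linear layer with ReLU; stands in for the embedding/encoding MLPs.
    return np.maximum(x @ w + b, 0.0)

def ptt_block(points, feats, d=32):
    """Hedged sketch of a PTT-style block: embed features, encode xyz
    positions into the same space, then apply self-attention."""
    n, c = feats.shape
    # 1) Feature embedding: map input features into a d-dim space so that
    #    semantically similar points land close together.
    w_e, b_e = rng.normal(size=(c, d)), np.zeros(d)
    emb = mlp(feats, w_e, b_e)                       # (n, d)
    # 2) Position encoding: lift raw 3D coordinates to d distinguishable dims.
    w_p, b_p = rng.normal(size=(3, d)), np.zeros(d)
    pos = mlp(points, w_p, b_p)                      # (n, d)
    x = emb + pos
    # 3) Self-attention: refined features via softmax(QK^T / sqrt(d)) V.
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(d)
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v                                  # (n, d) refined features

pts = rng.normal(size=(128, 3))    # seed point coordinates (hypothetical count)
fts = rng.normal(size=(128, 16))   # per-point input features
out = ptt_block(pts, fts)
print(out.shape)  # (128, 32)
```

In PTT-Net itself this kind of block is inserted into P2B's pipeline; the sketch above only shows the attention refinement step in isolation.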
Related papers
- PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds [5.524413892353708]
LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving.
We propose PillarTrack, a pillar-based 3D single object tracking framework.
PillarTrack achieves state-of-the-art performance on the KITTI and nuScenes datasets and enables real-time tracking speed.
arXiv Detail & Related papers (2024-04-11T06:06:56Z)
- EasyTrack: Efficient and Compact One-stream 3D Point Clouds Tracker [35.74677036815288]
We propose a neat and compact one-stream transformer 3D SOT paradigm, termed EasyTrack.
A 3D point clouds tracking feature pre-training module is developed to exploit the masked autoencoding for learning 3D point clouds tracking representations.
A target location network in the dense bird's eye view (BEV) feature space is constructed for target classification and regression.
arXiv Detail & Related papers (2024-04-09T02:47:52Z) - Real-time 3D Single Object Tracking with Transformer [5.000768859809606]
We propose a Point-Track-Transformer (PTT) module for the point cloud-based 3D single object tracking task.
The PTT module generates fine-tuned attention features by computing attention weights.
In PTT-Net, we embed PTT into the voting stage and proposal generation stage.
arXiv Detail & Related papers (2022-09-02T07:36:20Z)
- Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer [62.68401838976208]
3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.
Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results.
arXiv Detail & Related papers (2022-08-10T08:36:46Z)
- Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds [94.21415132135951]
We propose to detect 3D objects by exploiting temporal information in multiple frames.
We implement our algorithm based on prevalent anchor-based and anchor-free detectors.
arXiv Detail & Related papers (2022-07-26T05:16:28Z)
- PiFeNet: Pillar-Feature Network for Real-Time 3D Pedestrian Detection from Point Cloud [64.12626752721766]
We present PiFeNet, an efficient real-time 3D detector for pedestrian detection from point clouds.
We address two challenges that 3D object detection frameworks encounter when detecting pedestrians: the low expressiveness of pillar features and the small occupation areas of pedestrians in point clouds.
Our approach is ranked 1st on the KITTI pedestrian BEV and 3D leaderboards while running at 26 frames per second (FPS), and achieves state-of-the-art performance on the nuScenes detection benchmark.
arXiv Detail & Related papers (2021-12-31T13:41:37Z)
- PTTR: Relational 3D Point Cloud Object Tracking with Transformer [37.06516957454285]
In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud.
We propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations.
arXiv Detail & Related papers (2021-12-06T08:28:05Z)
- Trident Pyramid Networks: The importance of processing at the feature pyramid level for better object detection [50.008529403150206]
We present a new core architecture called the Trident Pyramid Network (TPN).
TPN allows for a deeper design and for a better balance between communication-based processing and self-processing.
We show consistent improvements when using our TPN core on the object detection benchmark, outperforming the popular BiFPN baseline by 1.5 AP.
arXiv Detail & Related papers (2021-10-08T09:59:59Z)
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection [76.30585706811993]
We present a novel, high-performance 3D object detection framework named PointVoxel-RCNN (PV-RCNN).
Our proposed method deeply integrates both a 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction.
It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks.
arXiv Detail & Related papers (2019-12-31T06:34:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.