Real-time 3D Single Object Tracking with Transformer
- URL: http://arxiv.org/abs/2209.00860v1
- Date: Fri, 2 Sep 2022 07:36:20 GMT
- Title: Real-time 3D Single Object Tracking with Transformer
- Authors: Jiayao Shan, Sifan Zhou, Yubo Cui, Zheng Fang
- Abstract summary: Point-Track-Transformer (PTT) module for the point cloud-based 3D single object tracking task.
The PTT module generates fine-tuned attention features by computing attention weights.
In PTT-Net, we embed PTT into both the voting stage and the proposal generation stage.
- Score: 5.000768859809606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LiDAR-based 3D single object tracking is a challenging problem in robotics and
autonomous driving. Existing approaches usually suffer from the fact that objects at
long distance often have very sparse or partially occluded point clouds, which makes
the features extracted by the model ambiguous. Ambiguous features make it hard to
locate the target object and ultimately lead to poor tracking results. To solve this
problem, we utilize the powerful Transformer architecture and propose a
Point-Track-Transformer (PTT) module for the point cloud-based 3D single object
tracking task. Specifically, the PTT module generates fine-tuned attention features
by computing attention weights, guiding the tracker to focus on the important
features of the target and improving its tracking ability in complex scenarios. To
evaluate our PTT module, we embed PTT into the dominant method and construct a novel
3D SOT tracker named PTT-Net. In PTT-Net, we embed PTT into the voting stage and the
proposal generation stage, respectively. The PTT module in the voting stage models
the interactions among point patches, learning context-dependent features.
Meanwhile, the PTT module in the proposal generation stage captures the contextual
information between object and background. We evaluate PTT-Net on the KITTI and
nuScenes datasets. Experimental results demonstrate the effectiveness of the PTT
module and the superiority of PTT-Net, which surpasses the baseline by a noticeable
margin, ~10% in the Car category. Our method also shows a significant performance
improvement in sparse scenarios. In general, the combination of transformer and
tracking pipeline enables PTT-Net to achieve state-of-the-art performance on both
datasets. Additionally, PTT-Net runs in real time at 40 FPS on an NVIDIA 1080Ti GPU.
Our code is open-sourced for the research community at
https://github.com/shanjiayao/PTT.
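The abstract describes the PTT module as refining per-point features by computing attention weights, with (per the original PTT paper summary below) three blocks: feature embedding, position encoding, and self-attention. As a rough illustration of that idea only (this is not the authors' implementation; `ptt_style_self_attention` is a hypothetical name and all weight matrices are random stand-ins for learned parameters), a single-head NumPy sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ptt_style_self_attention(feats, coords, d_model=64, seed=0):
    """Sketch of attention-based refinement of point features.

    feats:  (N, C) per-point features from a tracking backbone
    coords: (N, 3) point coordinates, used for a position encoding
    Returns refined (N, d_model) features.
    """
    rng = np.random.default_rng(seed)
    N, C = feats.shape

    # Feature embedding: linear projection into the model dimension.
    W_embed = rng.standard_normal((C, d_model)) / np.sqrt(C)
    x = feats @ W_embed

    # Position encoding derived from raw xyz coordinates.
    W_pos = rng.standard_normal((3, d_model)) / np.sqrt(3)
    x = x + coords @ W_pos

    # Single-head self-attention: queries, keys, values.
    Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # Attention weights guide which points contribute to each feature.
    attn = softmax(q @ k.T / np.sqrt(d_model), axis=-1)
    out = attn @ v

    # Residual connection keeps the original embedded features.
    return out + x
```

In PTT-Net such a block would be applied to the seed-point features of the voting and proposal generation stages, so that each point's refined feature aggregates context from the other points.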
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds [5.524413892353708]
LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving.
We propose PillarTrack, a pillar-based 3D single object tracking framework.
PillarTrack achieves state-of-the-art performance on the KITTI and nuScenes datasets and enables real-time tracking speed.
arXiv Detail & Related papers (2024-04-11T06:06:56Z)
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z)
- Position-guided Text Prompt for Vision-Language Pre-training [121.15494549650548]
We propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with Vision-Language Pre-Training.
PTP reformulates the visual grounding task into a fill-in-the-blank problem given a PTP by encouraging the model to predict the objects in the given blocks or regress the blocks of a given object.
PTP achieves comparable results to object-detector-based methods with much faster inference, since PTP discards its object detector at inference time while the latter cannot.
arXiv Detail & Related papers (2022-12-19T18:55:43Z)
- Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking [53.64390261936975]
We present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves the object detection and tracking problems.
Inspired by region-based CNN (R-CNN), we propose to track motion as a second stage of the object detector R-CNN.
We show in large-scale experiments that the overall performance gain of our method is due to four factors.
arXiv Detail & Related papers (2022-08-22T04:47:40Z)
- Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer [62.68401838976208]
3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.
Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results.
arXiv Detail & Related papers (2022-08-10T08:36:46Z)
- PiFeNet: Pillar-Feature Network for Real-Time 3D Pedestrian Detection from Point Cloud [64.12626752721766]
We present PiFeNet, an efficient real-time 3D detector for pedestrian detection from point clouds.
We address two challenges that 3D object detection frameworks encounter when detecting pedestrians: the sparsity of pillar features and the small occupation areas of pedestrians in point clouds.
Our approach is ranked 1st on the KITTI pedestrian BEV and 3D leaderboards while running at 26 frames per second (FPS), and achieves state-of-the-art performance on the nuScenes detection benchmark.
arXiv Detail & Related papers (2021-12-31T13:41:37Z)
- PTTR: Relational 3D Point Cloud Object Tracking with Transformer [37.06516957454285]
In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud.
We propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations.
arXiv Detail & Related papers (2021-12-06T08:28:05Z)
- PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds [7.482036504835097]
We propose Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking.
The PTT module contains three blocks: feature embedding, position encoding, and self-attention.
Our PTT-Net surpasses the state-of-the-art by a noticeable margin (10%)
arXiv Detail & Related papers (2021-08-14T03:24:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.