Real-time 3D Single Object Tracking with Transformer
- URL: http://arxiv.org/abs/2209.00860v1
- Date: Fri, 2 Sep 2022 07:36:20 GMT
- Title: Real-time 3D Single Object Tracking with Transformer
- Authors: Jiayao Shan, Sifan Zhou, Yubo Cui, Zheng Fang
- Abstract summary: Point-Track-Transformer (PTT) module for the point cloud-based 3D single object tracking task.
The PTT module generates fine-tuned attention features by computing attention weights.
In PTT-Net, we embed PTT into both the voting stage and the proposal generation stage.
- Score: 5.000768859809606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LiDAR-based 3D single object tracking is a challenging problem in robotics and
autonomous driving. Existing approaches usually suffer from the fact that objects at
long distance often have very sparse or partially occluded point clouds, which makes
the features extracted by the model ambiguous. Ambiguous features make it hard to
locate the target object and ultimately lead to poor tracking results. To solve this
problem, we utilize the powerful Transformer architecture and propose a
Point-Track-Transformer (PTT) module for the point cloud-based 3D single object
tracking task. Specifically, the PTT module generates fine-tuned attention features
by computing attention weights, guiding the tracker to focus on the important
features of the target and improving its tracking ability in complex scenarios. To
evaluate our PTT module, we embed PTT into the dominant method and construct a novel
3D SOT tracker named PTT-Net. In PTT-Net, we embed PTT into the voting stage and the
proposal generation stage, respectively. The PTT module in the voting stage models
the interactions among point patches, learning context-dependent features.
Meanwhile, the PTT module in the proposal generation stage captures the contextual
information between object and background. We evaluate PTT-Net on the KITTI and
nuScenes datasets. Experimental results demonstrate the effectiveness of the PTT
module and the superiority of PTT-Net, which surpasses the baseline by a noticeable
margin, ~10% in the Car category. Our method also shows a significant performance
improvement in sparse scenarios. In general, the combination of transformer and
tracking pipeline enables PTT-Net to achieve state-of-the-art performance on both
datasets. Additionally, PTT-Net runs in real time at 40 FPS on an NVIDIA 1080Ti GPU.
Our code is open-sourced for the research community at
https://github.com/shanjiayao/PTT.
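The abstract describes the PTT module as refining per-point features by computing attention weights, with (per the original PTT paper summary below) three blocks: feature embedding, position encoding, and self-attention. As a rough illustration of that idea only (this is not the authors' implementation; `ptt_style_self_attention` is a hypothetical name and all weight matrices are random stand-ins for learned parameters), a single-head NumPy sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ptt_style_self_attention(feats, coords, d_model=64, seed=0):
    """Sketch of attention-based refinement of point features.

    feats:  (N, C) per-point features from a tracking backbone
    coords: (N, 3) point coordinates, used for a position encoding
    Returns refined (N, d_model) features.
    """
    rng = np.random.default_rng(seed)
    N, C = feats.shape

    # Feature embedding: linear projection into the model dimension.
    W_embed = rng.standard_normal((C, d_model)) / np.sqrt(C)
    x = feats @ W_embed

    # Position encoding derived from raw xyz coordinates.
    W_pos = rng.standard_normal((3, d_model)) / np.sqrt(3)
    x = x + coords @ W_pos

    # Single-head self-attention: queries, keys, values.
    Wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    Wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    # Attention weights guide which points contribute to each feature.
    attn = softmax(q @ k.T / np.sqrt(d_model), axis=-1)
    out = attn @ v

    # Residual connection keeps the original embedded features.
    return out + x
```

In PTT-Net such a block would be applied to the seed-point features of the voting and proposal generation stages, so that each point's refined feature aggregates context from the other points.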
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z)
- PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds [5.524413892353708]
LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving.
We propose PillarTrack, a pillar-based 3D single object tracking framework.
PillarTrack achieves state-of-the-art performance on the KITTI and nuScenes datasets and enables real-time tracking speed.
arXiv Detail & Related papers (2024-04-11T06:06:56Z)
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z)
- Position-guided Text Prompt for Vision-Language Pre-training [121.15494549650548]
We propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with Vision-Language Pre-Training.
PTP reformulates the visual grounding task into a fill-in-the-blank problem given a PTP by encouraging the model to predict the objects in the given blocks or regress the blocks of a given object.
PTP achieves comparable results to object-detector-based methods with much faster inference, since PTP discards its object detector at inference time while the latter cannot.
arXiv Detail & Related papers (2022-12-19T18:55:43Z)
- Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking [53.64390261936975]
We present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves the object detection and tracking problems.
Inspired by region-based CNN (R-CNN), we propose to track motion as a second stage of the object detector R-CNN.
We show in large-scale experiments that the overall performance gain of our method is due to four factors.
arXiv Detail & Related papers (2022-08-22T04:47:40Z)
- Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer [62.68401838976208]
3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.
Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results.
arXiv Detail & Related papers (2022-08-10T08:36:46Z)
- PiFeNet: Pillar-Feature Network for Real-Time 3D Pedestrian Detection from Point Cloud [64.12626752721766]
We present PiFeNet, an efficient real-time 3D detector for pedestrian detection from point clouds.
We address two challenges that 3D object detection frameworks encounter when detecting pedestrians: the sparsity of pillar features and the small occupation areas of pedestrians in point clouds.
Our approach is ranked 1st on the KITTI pedestrian BEV and 3D leaderboards while running at 26 frames per second (FPS), and achieves state-of-the-art performance on the nuScenes detection benchmark.
arXiv Detail & Related papers (2021-12-31T13:41:37Z)
- PTTR: Relational 3D Point Cloud Object Tracking with Transformer [37.06516957454285]
In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud.
We propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results in a coarse-to-fine manner with the help of transformer operations.
arXiv Detail & Related papers (2021-12-06T08:28:05Z)
- PTT: Point-Track-Transformer Module for 3D Single Object Tracking in Point Clouds [7.482036504835097]
We propose Point-Track-Transformer (PTT) for point cloud-based 3D single object tracking.
The PTT module contains three blocks: feature embedding, position encoding, and self-attention.
Our PTT-Net surpasses the state-of-the-art by a noticeable margin (10%)
arXiv Detail & Related papers (2021-08-14T03:24:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.