PillarTrack:Boosting Pillar Representation for Transformer-based 3D Single Object Tracking on Point Clouds
- URL: http://arxiv.org/abs/2404.07495v2
- Date: Sun, 18 May 2025 13:40:12 GMT
- Title: PillarTrack:Boosting Pillar Representation for Transformer-based 3D Single Object Tracking on Point Clouds
- Authors: Weisheng Xu, Sifan Zhou, Jiaqi Xiong, Ziyu Zhao, Zhihang Yuan,
- Abstract summary: LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving.<n>We propose PillarTrack, a novel pillar-based 3D SOT framework.<n>Our method achieves comparable performance on the KITTI and NuScenes datasets.
- Score: 6.478734561409898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving. Existing 3D SOT methods typically adhere to a point-based processing pipeline, wherein the re-sampling operation invariably leads to either redundant or missing information, thereby impacting performance. To address these issues, we propose PillarTrack, a novel pillar-based 3D SOT framework. First, we transform sparse point clouds into dense pillars to preserve the local and global geometrics. Second, we propose a Pyramid-Encoded Pillar Feature Encoder (PE-PFE) design to enhance the robustness of pillar feature for translation/rotation/scale. Third, we present an efficient Transformer-based backbone from the perspective of modality differences. Finally, we construct our PillarTrack based on above designs. Extensive experiments show that our method achieves comparable performance on the KITTI and NuScenes datasets, significantly enhancing the performance of the baseline.
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z) - FastPillars: A Deployment-friendly Pillar-based 3D Detector [63.0697065653061]
Existing BEV-based (i.e., Bird Eye View) detectors favor sparse convolutions (known as SPConv) to speed up training and inference.
FastPillars delivers state-of-the-art accuracy on Open dataset with 1.8X speed up and 3.8 mAPH/L2 improvement over CenterPoint (SPConv-based)
arXiv Detail & Related papers (2023-02-05T12:13:27Z) - Exploiting More Information in Sparse Point Cloud for 3D Single Object
Tracking [9.693724357115762]
3D single object tracking is a key task in 3D computer vision.
The sparsity of point clouds makes it difficult to compute the similarity and locate the object.
We propose a sparse-to-dense and transformer-based framework for 3D single object tracking.
arXiv Detail & Related papers (2022-10-02T13:38:30Z) - Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with
Transformer [62.68401838976208]
3D object tracking aims to predict the location and orientation of an object in consecutive frames given an object template.
Motivated by the success of transformers, we propose Point Tracking TRansformer (PTTR), which efficiently predicts high-quality 3D tracking results.
arXiv Detail & Related papers (2022-08-10T08:36:46Z) - VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images [90.60881721134656]
We propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT)
Experiments on KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values.
arXiv Detail & Related papers (2022-06-06T14:02:06Z) - A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds [50.54083964183614]
It is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete.
We propose DMT, a Detector-free Motion prediction based 3D Tracking network that totally removes the usage of complicated 3D detectors.
arXiv Detail & Related papers (2022-03-08T17:49:07Z) - Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z) - Improved Pillar with Fine-grained Feature for 3D Object Detection [23.348710029787068]
3D object detection with LiDAR point clouds plays an important role in autonomous driving perception module.
Existing point-based methods are challenging to reach the speed requirements because of too many raw points.
The 2D grid-based methods, such as PointPillar, can easily achieve a stable and efficient speed based on simple 2D convolution.
arXiv Detail & Related papers (2021-10-12T14:53:14Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
In this paper, we propose a novel and lightweight approach, dubbed em Progressive Coordinate Transforms (PCT) to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - 3D-SiamRPN: An End-to-End Learning Method for Real-Time 3D Single Object
Tracking Using Raw Point Cloud [9.513194898261787]
We propose a 3D tracking method called 3D-SiamRPN Network to track a single target object by using raw 3D point cloud data.
Experimental results on KITTI dataset show that our method has a competitive performance in both Success and Precision.
arXiv Detail & Related papers (2021-08-12T09:52:28Z) - DV-Det: Efficient 3D Point Cloud Object Detection with Dynamic
Voxelization [0.0]
We propose a novel two-stage framework for the efficient 3D point cloud object detection.
We parse the raw point cloud data directly in the 3D space yet achieve impressive efficiency and accuracy.
We highlight our KITTI 3D object detection dataset with 75 FPS and on Open dataset with 25 FPS inference speed with satisfactory accuracy.
arXiv Detail & Related papers (2021-07-27T10:07:39Z) - Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single
Object Tracking [12.644452175343059]
A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates.
Instead of relying on 3D proposals, we produce 2D region proposals which are then extruded into 3D viewing frustums.
We perform an online accuracy validation on the 3D frustum to generate refined point cloud searching space.
arXiv Detail & Related papers (2020-10-22T08:01:17Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.