Exploiting More Information in Sparse Point Cloud for 3D Single Object Tracking
- URL: http://arxiv.org/abs/2210.00519v1
- Date: Sun, 2 Oct 2022 13:38:30 GMT
- Title: Exploiting More Information in Sparse Point Cloud for 3D Single Object Tracking
- Authors: Yubo Cui, Jiayao Shan, Zuoxu Gu, Zhiheng Li, Zheng Fang
- Abstract summary: 3D single object tracking is a key task in 3D computer vision.
The sparsity of point clouds makes it difficult to compute similarity and locate the object.
We propose a sparse-to-dense and transformer-based framework for 3D single object tracking.
- Score: 9.693724357115762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D single object tracking is a key task in 3D computer vision. However, the sparsity of point clouds makes it difficult to compute similarity and locate the object, posing significant challenges for 3D trackers. Previous works have improved tracking performance in common scenarios, but they usually fail in extremely sparse scenarios, such as tracking objects that are far away or partially occluded. To address these problems, in this letter we propose a sparse-to-dense, transformer-based framework for 3D single object tracking. First, we transform the sparse 3D points into 3D pillars and then compress them into 2D BEV features to obtain a dense representation. Then, we propose an attention-based encoder that computes global similarity between the template and search branches, which alleviates the influence of sparsity. Meanwhile, the encoder applies attention to multi-scale features to compensate for the information lost to the sparsity of the point cloud and the single scale of the features. Finally, we track the object by set prediction through a two-stage decoder that also uses attention. Extensive experiments show that our method achieves promising results on the KITTI and nuScenes datasets.
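To make the pipeline concrete, below is a minimal PyTorch-style sketch (not the authors' code) of the sparse-to-dense idea: sparse points are scattered into a dense BEV grid, and an attention encoder computes global similarity between template and search features. The grid size, feature dimensions, and the stand-in pillar encoder are illustrative assumptions.

```python
# Sketch: sparse points -> dense BEV grid -> template/search cross-attention.
import torch
import torch.nn as nn

def points_to_bev(points, grid=32, extent=16.0, feat_dim=32):
    """Scatter (N, 3) points into a dense (feat_dim, grid, grid) BEV map."""
    # Map x/y coordinates in [-extent, extent] to integer cell indices.
    ij = ((points[:, :2] + extent) / (2 * extent) * grid).long().clamp(0, grid - 1)
    cell = ij[:, 0] * grid + ij[:, 1]
    # Stand-in for a learned pillar encoder: lift each point to feat_dim.
    point_feat = nn.Linear(3, feat_dim)(points)
    bev = torch.zeros(grid * grid, feat_dim)
    # Max-pool the point features falling into each pillar/cell.
    bev.index_reduce_(0, cell, point_feat, reduce="amax", include_self=False)
    return bev.T.reshape(feat_dim, grid, grid)

class CrossAttnEncoder(nn.Module):
    """Global similarity between template and search BEV tokens."""
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, search_bev, template_bev):
        q = search_bev.flatten(1).T.unsqueeze(0)     # (1, H*W, C) search tokens
        kv = template_bev.flatten(1).T.unsqueeze(0)  # (1, H*W, C) template tokens
        fused, _ = self.attn(q, kv, kv)              # every search cell sees the whole template
        return fused

search = points_to_bev(torch.randn(100, 3) * 5)
template = points_to_bev(torch.randn(40, 3) * 5)
out = CrossAttnEncoder()(search, template)
print(out.shape)  # torch.Size([1, 1024, 32])
```

In the paper, the encoder additionally attends over multi-scale BEV features and feeds a two-stage, attention-based decoder; the sketch keeps only the single-scale core.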
Related papers
- PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds [5.524413892353708]
LiDAR-based 3D single object tracking (3D SOT) is a critical issue in robotics and autonomous driving.
We propose PillarTrack, a pillar-based 3D single object tracking framework.
PillarTrack achieves state-of-the-art performance on the KITTI and nuScenes datasets and enables real-time tracking speed.
arXiv Detail & Related papers (2024-04-11T06:06:56Z)
- EasyTrack: Efficient and Compact One-stream 3D Point Clouds Tracker [35.74677036815288]
We propose a neat and compact one-stream transformer 3D SOT paradigm, termed EasyTrack.
A 3D point cloud tracking feature pre-training module is developed that exploits masked autoencoding to learn 3D point cloud tracking representations.
A target location network in the dense bird's eye view (BEV) feature space is constructed for target classification and regression (a one-stream sketch follows below).
arXiv Detail & Related papers (2024-04-09T02:47:52Z)
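As referenced above, a hedged sketch of the one-stream idea: template and search tokens are concatenated into a single sequence so a shared transformer performs feature learning and matching jointly. Token counts and dimensions are assumptions, not EasyTrack's actual configuration.

```python
# Sketch: one-stream tracking, no separate Siamese matching stage.
import torch
import torch.nn as nn

dim = 32
template = torch.randn(1, 64, dim)             # 64 template point tokens (assumed)
search = torch.randn(1, 256, dim)              # 256 search-region point tokens (assumed)
joint = torch.cat([template, search], dim=1)   # one stream: a single joint sequence

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
fused = encoder(joint)                         # self-attention mixes both token sets
search_feat = fused[:, 64:]                    # search tokens now carry template context
print(search_feat.shape)                       # torch.Size([1, 256, 32])
```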
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2s to directly process a whole building of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
- Sparse2Dense: Learning to Densify 3D Features for 3D Object Detection [85.08249413137558]
LiDAR-produced point clouds are the major source for most state-of-the-art 3D object detectors.
Small, distant, and incomplete objects with sparse or few points are often hard to detect.
We present Sparse2Dense, a new framework that efficiently boosts 3D detection performance by learning to densify point clouds in latent space (see the sketch below).
arXiv Detail & Related papers (2022-11-23T16:01:06Z)
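A minimal sketch of what "densifying in latent space" can look like, assuming a distillation setup in which a frozen teacher sees densified points and a student sees sparse points; the actual Sparse2Dense modules and losses may differ.

```python
# Sketch: train a sparse-input student to match a dense-input teacher's features.
import torch
import torch.nn as nn

backbone = lambda: nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
teacher, student = backbone(), backbone()
teacher.requires_grad_(False)              # teacher pretrained on dense inputs, frozen

sparse_bev = torch.randn(1, 32, 64, 64)    # BEV features from sparse points (assumed)
dense_bev = torch.randn(1, 32, 64, 64)     # BEV features from densified points (assumed)

distill_loss = nn.functional.mse_loss(student(sparse_bev), teacher(dense_bev))
distill_loss.backward()                    # student learns to synthesize dense-like features
```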
- CXTrack: Improving 3D Point Cloud Tracking with Contextual Information [59.55870742072618]
3D single object tracking plays an essential role in many applications, such as autonomous driving.
We propose CXTrack, a novel transformer-based network for 3D object tracking.
We show that CXTrack achieves state-of-the-art tracking performance while running at 29 FPS.
arXiv Detail & Related papers (2022-11-12T11:29:01Z)
- Scatter Points in Space: 3D Detection from Multi-view Monocular Images [8.71944437852952]
3D object detection from monocular image(s) is a challenging and long-standing problem in computer vision.
Recent methods tend to aggregate multi-view features by densely sampling regular 3D grids in space.
We propose a learnable keypoint sampling method that scatters pseudo surface points in 3D space to preserve data sparsity.
arXiv Detail & Related papers (2022-08-31T09:38:05Z)
- Embracing Single Stride 3D Object Detector with Sparse Transformer [63.179720817019096]
In LiDAR-based 3D object detection for autonomous driving, the ratio of the object size to input scene size is significantly smaller compared to 2D detection cases.
Many 3D detectors directly follow the common practice of 2D detectors, which downsample the feature maps even after quantizing the point clouds.
We propose Single-stride Sparse Transformer (SST) to maintain the original resolution from the beginning to the end of the network (a window-attention sketch follows below).
arXiv Detail & Related papers (2021-12-13T02:12:02Z)
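As referenced above, a rough sketch of single-stride sparse window attention: non-empty voxel tokens are grouped into BEV windows and self-attention runs within each window, with no downsampling at any stage. Grid size, window size, and feature dimensions are assumptions rather than SST's exact settings.

```python
# Sketch: windowed self-attention over sparse voxel tokens at full resolution.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
coords = torch.randint(0, 128, (500, 2))  # BEV cells of 500 non-empty voxels (assumed)
feats = torch.randn(500, 32)              # one feature token per voxel
windows = coords // 8                     # assign tokens to 8x8 BEV windows
for w in windows.unique(dim=0):           # attend within each window only
    idx = (windows == w).all(dim=1)
    tokens = feats[idx].unsqueeze(0)      # (1, n_tokens_in_window, 32)
    out, _ = attn(tokens, tokens, tokens)
    feats[idx] = out.squeeze(0)           # updated in place; resolution never shrinks
```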
- Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud [79.39041453836793]
We develop a novel single-stage 3D detector for point clouds in an anchor-free manner.
We convert the voxel-based sparse 3D feature volumes into sparse 2D feature maps.
We propose an IoU-based detection confidence re-calibration scheme to improve the correlation between the detection confidence score and the accuracy of the bounding box regression (see the sketch below).
arXiv Detail & Related papers (2021-08-08T13:42:13Z)
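A minimal sketch of IoU-based confidence re-calibration, assuming a geometric-mean blend of the classification score with a predicted IoU; the paper's exact re-calibration formula may differ.

```python
# Sketch: down-weight confident boxes whose predicted localization IoU is poor.
import torch

cls_score = torch.tensor([0.9, 0.8, 0.6])  # classification confidence
pred_iou = torch.tensor([0.5, 0.9, 0.7])   # predicted localization IoU
alpha = 0.5                                # blending weight (assumed)
calibrated = cls_score ** (1 - alpha) * pred_iou ** alpha
print(calibrated)                          # final confidence reflects box quality too
```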
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single Object Tracking [12.644452175343059]
A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates.
Instead of relying on 3D proposals, we produce 2D region proposals which are then extruded into 3D viewing frustums.
We perform online accuracy validation on the 3D frustum to generate a refined point cloud search space.
arXiv Detail & Related papers (2020-10-22T08:01:17Z)
- Center-based 3D Object Detection and Tracking [8.72305226979945]
Three-dimensional objects are commonly represented as 3D boxes in a point-cloud.
This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges.
In this paper, we propose to represent, detect, and track 3D objects as points.
Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity.
The resulting detection and tracking algorithm is simple, efficient, and effective.
arXiv Detail & Related papers (2020-06-19T17:59:39Z)
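A simplified sketch of the center-based idea from CenterPoint above: a keypoint head predicts a BEV heatmap of object centers, and heatmap peaks index into regression maps for the remaining box attributes. Head shapes, channel layout, and the peak threshold are illustrative assumptions.

```python
# Sketch: detect object centers as heatmap peaks, then read off box attributes.
import torch
import torch.nn as nn
import torch.nn.functional as F

bev = torch.randn(1, 64, 128, 128)             # backbone BEV features (assumed)
heatmap_head = nn.Conv2d(64, 1, 3, padding=1)  # center heatmap (single class)
reg_head = nn.Conv2d(64, 10, 3, padding=1)     # assumed layout: offset(2), z, size(3),
                                               # yaw sin/cos, velocity(2)

heat = heatmap_head(bev).sigmoid()
# Peaks are local maxima: cells unchanged by 3x3 max-pooling, above a threshold.
peaks = (heat == F.max_pool2d(heat, 3, stride=1, padding=1)) & (heat > 0.3)
ys, xs = torch.nonzero(peaks[0, 0], as_tuple=True)
attrs = reg_head(bev)[0, :, ys, xs].T          # (#centers, 10) attribute vectors
print(len(ys), attrs.shape)
```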