SSF: Sparse Long-Range Scene Flow for Autonomous Driving
- URL: http://arxiv.org/abs/2501.17821v1
- Date: Wed, 29 Jan 2025 18:14:16 GMT
- Title: SSF: Sparse Long-Range Scene Flow for Autonomous Driving
- Authors: Ajinkya Khoche, Qingwen Zhang, Laura Pereira Sanchez, Aron Asefaw, Sina Sharif Mansouri, Patric Jensfelt
- Abstract summary: We propose a general pipeline for long-range scene flow, adopting a sparse convolution based backbone for feature extraction.
Our method, SSF, achieves state-of-the-art results on the Argoverse2 dataset, demonstrating strong performance in long-range scene flow estimation.
- Score: 4.685658373164552
- Abstract: Scene flow enables an understanding of the motion characteristics of the environment in the 3D world. It gains particular significance at long range, where object-based perception methods might fail due to sparse observations far away. Although significant advancements have been made in scene flow pipelines to handle large-scale point clouds, a gap remains in scalability with respect to range. We attribute this limitation to the common design choice of using dense feature grids, which scale quadratically with range. In this paper, we propose Sparse Scene Flow (SSF), a general pipeline for long-range scene flow, adopting a sparse-convolution-based backbone for feature extraction. This approach introduces a new challenge: a mismatch in the size and ordering of sparse feature maps between time-sequential point scans. To address this, we propose a sparse feature fusion scheme that augments the feature maps with virtual voxels at missing locations. Additionally, we propose a range-wise metric that implicitly gives greater importance to faraway points. Our method, SSF, achieves state-of-the-art results on the Argoverse2 dataset, demonstrating strong performance in long-range scene flow estimation. Our code will be released at https://github.com/KTH-RPL/SSF.git.
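As a concrete reading of the fusion scheme, here is a minimal sketch (assumptions, not the released SSF code: the function name `fuse_sparse` and zero-initialised virtual features are invented for illustration) that aligns two sparse voxel feature maps on the union of their coordinates:

```python
import torch

def fuse_sparse(coords_t0, feats_t0, coords_t1, feats_t1):
    """coords_*: (N_i, 3) integer voxel coordinates; feats_*: (N_i, C) features."""
    # Union of occupied voxels across both scans; `inverse` maps every
    # input row to its position in the union.
    union, inverse = torch.unique(
        torch.cat([coords_t0, coords_t1], dim=0), dim=0, return_inverse=True)
    n0, c = feats_t0.shape
    # Zero feature vectors act as "virtual voxels" wherever one scan has
    # no observation, so both maps end up with identical size and ordering.
    f0 = feats_t0.new_zeros(union.shape[0], c)
    f1 = feats_t1.new_zeros(union.shape[0], c)
    f0[inverse[:n0]] = feats_t0
    f1[inverse[n0:]] = feats_t1
    # The aligned maps can now be fused, e.g. by channel-wise concatenation.
    return union, torch.cat([f0, f1], dim=-1)
```

Under the same caveat, the range-wise metric can be pictured as grouping points into distance bands and averaging the per-band end-point error, so that sparse faraway points are not drowned out by the dense near field (the band edges below are illustrative, not taken from the paper):

```python
import torch

def range_wise_epe(pred_flow, gt_flow, points,
                   bins=(0.0, 35.0, 50.0, 75.0, 100.0, float("inf"))):
    """Average of per-band mean EPE; equal band weights up-weight far points."""
    dist = points.norm(dim=-1)                 # range of each point from the ego
    epe = (pred_flow - gt_flow).norm(dim=-1)   # per-point end-point error
    band_means = [epe[(dist >= lo) & (dist < hi)].mean()
                  for lo, hi in zip(bins[:-1], bins[1:])
                  if ((dist >= lo) & (dist < hi)).any()]
    return torch.stack(band_means).mean()
```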
Related papers
- DeFlow: Decoder of Scene Flow Network in Autonomous Driving [19.486167661795797]
Scene flow estimation determines a scene's 3D motion field by predicting the motion of points in the scene.
Many networks with large-scale point clouds as input use voxelization to create a pseudo-image for real-time running.
Our paper introduces DeFlow, which enables a transition from voxel-based features to point features using Gated Recurrent Unit (GRU) refinement.
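As a rough illustration of this voxel-to-point transition (a hedged sketch, not DeFlow's implementation; the module name and the fixed iteration count are assumptions), each point can gather its voxel's feature and refine a per-point state with a GRU cell:

```python
import torch

class VoxelToPointGRU(torch.nn.Module):
    """Refine per-point features from the voxel context they fall into."""

    def __init__(self, point_dim, voxel_dim, iters=2):
        super().__init__()
        self.iters = iters
        self.gru = torch.nn.GRUCell(input_size=voxel_dim, hidden_size=point_dim)

    def forward(self, point_feats, voxel_feats, point_to_voxel):
        # point_feats: (N, point_dim); voxel_feats: (V, voxel_dim);
        # point_to_voxel: (N,) index of the voxel each point falls into.
        x = voxel_feats[point_to_voxel]   # gather voxel context per point
        h = point_feats                   # per-point state to be refined
        for _ in range(self.iters):
            h = self.gru(x, h)            # GRU update of the point state
        return h
```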
arXiv Detail & Related papers (2024-01-29T12:47:55Z)
- PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection [66.94819989912823]
We propose a point-trajectory transformer with long short-term memory for efficient temporal 3D object detection.
We use point clouds of current-frame objects and their historical trajectories as input to minimize the memory bank storage requirement.
We conduct extensive experiments on a large-scale dataset to demonstrate that our approach performs well against state-of-the-art methods.
arXiv Detail & Related papers (2023-12-13T18:59:13Z)
- GMSF: Global Matching Scene Flow [17.077134204089536]
We tackle the task of scene flow estimation from point clouds.
Given a source and a target point cloud, the objective is to estimate a translation from each point in the source point cloud to the target.
We propose a significantly simpler single-scale one-shot global matching to address the problem.
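A minimal sketch of such single-scale global matching (hedged; the names and temperature value are assumptions rather than the GMSF code): each source point attends over all target points, and the flow is the soft correspondence minus the source position.

```python
import torch

def global_matching_flow(src_xyz, src_feat, tgt_xyz, tgt_feat, temperature=0.1):
    # src_xyz: (N, 3), src_feat: (N, C); tgt_xyz: (M, 3), tgt_feat: (M, C)
    sim = src_feat @ tgt_feat.T / temperature   # (N, M) feature similarity
    attn = torch.softmax(sim, dim=-1)           # soft match distribution
    soft_corr = attn @ tgt_xyz                  # expected corresponding position
    return soft_corr - src_xyz                  # per-point translation (flow)
```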
arXiv Detail & Related papers (2023-05-27T10:04:21Z)
- Super Sparse 3D Object Detection [48.684300007948906]
LiDAR-based 3D object detection contributes ever-increasingly to the long-range perception in autonomous driving.
To enable efficient long-range detection, we first propose a fully sparse object detector termed FSD.
FSD++ generates residual points, which indicate the point changes between consecutive frames.
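One hedged way to picture residual points (a sketch, not the FSD++ implementation; the voxel size and hash grid are assumptions, and only newly occupied voxels are kept here) is to retain the current-frame points whose voxels were empty in the previous frame:

```python
import torch

def residual_points(points_prev, points_curr, voxel_size=0.2, grid=4096):
    """Keep current points whose voxel was unoccupied in the previous frame."""
    def voxel_hash(pts):
        # Quantize to voxels and pack the 3 indices into one integer key.
        # Assumes all voxel indices fit inside a grid x grid x grid volume.
        v = torch.floor(pts / voxel_size).long() + grid // 2
        return (v[:, 0] * grid + v[:, 1]) * grid + v[:, 2]
    prev_occupied = torch.unique(voxel_hash(points_prev))
    new_mask = ~torch.isin(voxel_hash(points_curr), prev_occupied)
    return points_curr[new_mask]
```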
arXiv Detail & Related papers (2023-01-05T17:03:56Z)
- Temporal Action Localization with Multi-temporal Scales [54.69057924183867]
We propose to predict actions on a feature space of multi-temporal scales.
Specifically, we use refined feature pyramids of different scales to pass semantics from high-level scales to low-level scales.
The proposed method achieves improvements of 12.6%, 17.4%, and 2.2% on its three evaluation benchmarks, respectively.
arXiv Detail & Related papers (2022-08-16T01:48:23Z)
- Stratified Transformer for 3D Point Cloud Segmentation [89.9698499437732]
Stratified Transformer is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
To combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information.
Experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets.
arXiv Detail & Related papers (2022-03-28T05:35:16Z)
- POCO: Point Convolution for Surface Reconstruction [92.22371813519003]
Implicit neural networks have been successfully used for surface reconstruction from point clouds.
Many of them face scalability issues as they encode the isosurface function of a whole object or scene into a single latent vector.
We propose to use point cloud convolutions and compute latent vectors at each input point.
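To illustrate the idea (a sketch under assumptions: POCO learns its interpolation weights, whereas this uses plain inverse-distance weighting over the k nearest point latents, and `decoder` is a hypothetical module mapping latents to occupancy), decoding a query location from per-point latents might look like:

```python
import torch

def decode_occupancy(query_xyz, point_xyz, point_latents, decoder, k=8):
    # query_xyz: (Q, 3); point_xyz: (N, 3); point_latents: (N, C)
    dists = torch.cdist(query_xyz, point_xyz)        # (Q, N) pairwise distances
    knn_d, knn_i = dists.topk(k, largest=False)      # k nearest input points
    w = 1.0 / (knn_d + 1e-8)
    w = w / w.sum(dim=-1, keepdim=True)              # inverse-distance weights
    z = (w.unsqueeze(-1) * point_latents[knn_i]).sum(dim=1)  # (Q, C) latent
    return decoder(z)                                # per-query occupancy logit
```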
arXiv Detail & Related papers (2022-01-05T21:26:18Z)
- SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation [71.2856098776959]
Estimating 3D motions for point clouds is challenging, since a point cloud is unordered and its density is significantly non-uniform.
We propose a novel architecture named Sparse Convolution-Transformer Network (SCTN) that combines sparse convolution with a transformer.
We show that the learned relation-based contextual information is rich and helpful for matching corresponding points, benefiting scene flow estimation.
arXiv Detail & Related papers (2021-05-10T15:16:14Z)
- FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds [28.899804787744202]
Scene flow depicts the dynamics of a 3D scene, which is critical for various applications such as autonomous driving, robot navigation, AR/VR, etc.
It remains challenging to extract scene flow from point clouds due to sparsity and irregularity in typical point cloud sampling patterns.
A novel Spatial Abstraction with Attention (SA2) layer is proposed to alleviate the unstable abstraction problem.
A Temporal Abstraction with Attention (TA2) layer is proposed to rectify attention in the temporal domain, benefiting the estimation of motions over a larger range.
arXiv Detail & Related papers (2021-04-01T23:04:04Z)