PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
- URL: http://arxiv.org/abs/2405.15214v2
- Date: Tue, 3 Sep 2024 06:40:37 GMT
- Title: PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
- Authors: Qingdong He, Jiangning Zhang, Jinlong Peng, Haoyang He, Xiangtai Li, Yabiao Wang, Chengjie Wang
- Abstract summary: We present PointRWKV, a model of linear complexity derived from the RWKV model in the NLP field.
We first propose to explore the global processing capabilities within PointRWKV blocks using modified multi-headed matrix-valued states.
To extract local geometric features simultaneously, we design a parallel branch to encode the point cloud efficiently in a fixed radius near-neighbors graph with a graph stabilizer.
- Score: 56.14518823931901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have revolutionized point cloud learning, but their quadratic complexity hinders extension to long sequences and burdens limited computational resources. The recent advent of RWKV, a fresh breed of deep sequence models, has shown immense potential for sequence modeling in NLP tasks. In this paper, we present PointRWKV, a model of linear complexity derived from the RWKV model in the NLP field with necessary modifications for point cloud learning tasks. Specifically, taking the embedded point patches as input, we first propose to explore the global processing capabilities within PointRWKV blocks using modified multi-headed matrix-valued states and a dynamic attention recurrence mechanism. To extract local geometric features simultaneously, we design a parallel branch to encode the point cloud efficiently in a fixed-radius near-neighbors graph with a graph stabilizer. Furthermore, we design PointRWKV as a multi-scale framework for hierarchical feature learning of 3D point clouds, facilitating various downstream tasks. Extensive experiments on different point cloud learning tasks show that the proposed PointRWKV outperforms its Transformer- and Mamba-based counterparts while saving about 42% of FLOPs, demonstrating its potential as an option for constructing foundational 3D models.
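For intuition only, the sketch below illustrates the two ingredients the abstract names: a linear-complexity RWKV-style recurrence over embedded point patches (global branch) and a parallel fixed-radius near-neighbor branch for local geometry. It is a minimal PyTorch approximation, not the authors' implementation: it uses a simplified per-channel decay rather than the paper's multi-headed matrix-valued states and dynamic attention recurrence, and all module and parameter names (LinearRecurrenceBranch, FixedRadiusGraphBranch, radius) are hypothetical.

```python
# Hedged sketch of the two branches named in the abstract; not the authors' code.
import torch
import torch.nn as nn

class LinearRecurrenceBranch(nn.Module):
    """Simplified (per-channel decay) RWKV-style token mixing: O(N) in sequence length."""
    def __init__(self, dim):
        super().__init__()
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.receptance = nn.Linear(dim, dim)
        self.decay = nn.Parameter(torch.zeros(dim))  # learned per-channel decay

    def forward(self, x):                      # x: (B, N, C) embedded point patches
        k, v, r = self.key(x), self.value(x), torch.sigmoid(self.receptance(x))
        w = torch.exp(-torch.exp(self.decay))  # decay factor in (0, 1)
        num = torch.zeros_like(x[:, 0])        # running weighted sum of values
        den = torch.zeros_like(x[:, 0])        # running sum of weights
        outs = []
        for t in range(x.shape[1]):            # single pass over tokens -> linear complexity
            kt = torch.exp(k[:, t])
            num = w * num + kt * v[:, t]
            den = w * den + kt
            outs.append(r[:, t] * num / (den + 1e-6))
        return torch.stack(outs, dim=1)

class FixedRadiusGraphBranch(nn.Module):
    """Local geometry: average the features of neighbors within a fixed radius."""
    def __init__(self, dim, radius=0.2):       # radius is an illustrative value
        super().__init__()
        self.radius = radius
        self.proj = nn.Linear(dim, dim)

    def forward(self, xyz, x):                 # xyz: (B, N, 3), x: (B, N, C)
        dist = torch.cdist(xyz, xyz)           # (B, N, N) pairwise distances
        mask = (dist < self.radius).float()    # fixed-radius neighborhood graph
        agg = mask @ x / mask.sum(-1, keepdim=True).clamp(min=1.0)
        return self.proj(agg)

# Toy usage: fuse the global and local branches by summation.
B, N, C = 2, 128, 64
xyz, feats = torch.rand(B, N, 3), torch.randn(B, N, C)
fused = LinearRecurrenceBranch(C)(feats) + FixedRadiusGraphBranch(C)(xyz, feats)
print(fused.shape)  # torch.Size([2, 128, 64])
```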
Related papers
- Dynamic 3D Point Cloud Sequences as 2D Videos [81.46246338686478]
3D point cloud sequences serve as one of the most common and practical representation modalities of real-world environments.
We propose a novel generic representation called Structured Point Cloud Videos (SPCVs).
SPCVs re-organize a point cloud sequence as a 2D video with spatial smoothness and temporal consistency, where pixel values correspond to the 3D coordinates of the points.
arXiv Detail & Related papers (2024-03-02T08:18:57Z) - Adaptive Point Transformer [88.28498667506165]
Adaptive Point Cloud Transformer (AdaPT) is a standard PT model augmented by an adaptive token selection mechanism.
AdaPT dynamically reduces the number of tokens during inference, enabling efficient processing of large point clouds (a minimal token-selection sketch appears after this list).
arXiv Detail & Related papers (2024-01-26T13:24:45Z) - PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations [27.45001809414096]
PosDiffNet is a model for point cloud registration in 3D computer vision.
We leverage a graph neural partial differential equation (PDE) based on Beltrami flow to obtain high-dimensional features.
We employ multi-level correspondences derived from high feature-similarity scores to facilitate alignment between point clouds.
We evaluate PosDiffNet on several 3D point cloud datasets, verifying that it achieves state-of-the-art (SOTA) performance for point cloud registration in large fields of view with perturbations.
arXiv Detail & Related papers (2024-01-06T08:58:15Z) - Point Cloud Pre-training with Diffusion Models [62.12279263217138]
We propose a novel pre-training method called Point cloud Diffusion pre-training (PointDif).
PointDif achieves substantial improvement across various real-world datasets for diverse downstream tasks such as classification, segmentation and detection.
arXiv Detail & Related papers (2023-11-25T08:10:05Z) - AdaPoinTr: Diverse Point Cloud Completion with Adaptive Geometry-Aware Transformers [94.11915008006483]
We present a new method that reformulates point cloud completion as a set-to-set translation problem.
We design a new model, called PoinTr, which adopts a Transformer encoder-decoder architecture for point cloud completion.
Our method attains 6.53 CD on PCN, 0.81 CD on ShapeNet-55 and 0.392 MMD on real-world KITTI.
arXiv Detail & Related papers (2023-01-11T16:14:12Z) - CpT: Convolutional Point Transformer for 3D Point Cloud Processing [10.389972581905]
We present CpT: Convolutional point Transformer - a novel deep learning architecture for dealing with the unstructured nature of 3D point cloud data.
CpT is an improvement over existing attention-based convolutional neural networks as well as previous 3D point cloud processing Transformers.
Our model can serve as an effective backbone for various point cloud processing tasks when compared to the existing state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-21T17:45:55Z) - Planning with Learned Dynamic Model for Unsupervised Point Cloud Registration [25.096635750142227]
We develop a latent dynamic model of point clouds, consisting of a transformation network and evaluation network.
We employ the cross-entropy method (CEM) to iteratively update the planning policy by maximizing the rewards in the point cloud registration process (a bare-bones CEM sketch appears after this list).
Experimental results on the ModelNet40 and 7Scenes benchmark datasets demonstrate that our method can yield good registration performance in an unsupervised manner.
arXiv Detail & Related papers (2021-08-05T13:47:11Z) - DV-ConvNet: Fully Convolutional Deep Learning on Point Clouds with Dynamic Voxelization and 3D Group Convolution [0.7340017786387767]
3D point cloud interpretation is a challenging task due to the randomness and sparsity of the component points.
In this work, we draw attention back to the standard 3D convolutions towards an efficient 3D point cloud interpretation.
Our network is able to run and converge at a considerably fast speed, while yielding on-par or even better performance compared with state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2020-09-07T07:45:05Z) - Pseudo-LiDAR Point Cloud Interpolation Based on 3D Motion Representation and Spatial Supervision [68.35777836993212]
We propose a Pseudo-LiDAR point cloud network to generate temporally and spatially high-quality point cloud sequences.
By exploiting the scene flow between point clouds, the proposed network is able to learn a more accurate representation of the 3D spatial motion relationship.
arXiv Detail & Related papers (2020-06-20T03:11:04Z)
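The Adaptive Point Transformer entry above mentions dynamically reducing the number of tokens at inference. The snippet below is a hedged sketch of that general idea, not AdaPT's actual mechanism: score tokens with a small learned head and keep only a fixed fraction of the highest-scoring ones. All names (TokenSelector, keep_ratio) are illustrative.

```python
# Illustrative token pruning by learned importance score; not AdaPT's mechanism.
import torch
import torch.nn as nn

class TokenSelector(nn.Module):
    def __init__(self, dim, keep_ratio=0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-token importance score
        self.keep_ratio = keep_ratio

    def forward(self, tokens):           # tokens: (B, N, C)
        s = self.score(tokens).squeeze(-1)                 # (B, N) scores
        k = max(1, int(tokens.shape[1] * self.keep_ratio)) # token budget
        idx = s.topk(k, dim=1).indices                     # indices of kept tokens
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)                       # (B, k, C)

tokens = torch.randn(2, 256, 64)
print(TokenSelector(64)(tokens).shape)  # torch.Size([2, 128, 64])
```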
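The unsupervised registration entry above mentions using the cross-entropy method (CEM) to iteratively update a planning policy by maximizing registration rewards. The sketch below shows a bare-bones CEM loop under simplifying assumptions (translations only, a nearest-neighbor distance as the cost); it is purely illustrative and does not reproduce the paper's learned dynamic model.

```python
# Hedged CEM sketch: search for a translation aligning src to dst.
import torch

def nn_distance(src, dst):               # mean nearest-neighbor distance, src -> dst
    return torch.cdist(src, dst).min(dim=1).values.mean()

def cem_translation(src, dst, iters=20, pop=64, elites=8):
    mu, sigma = torch.zeros(3), torch.ones(3) * 0.5
    for _ in range(iters):
        cand = mu + sigma * torch.randn(pop, 3)             # sample candidate translations
        scores = torch.stack([nn_distance(src + t, dst) for t in cand])
        best = cand[scores.topk(elites, largest=False).indices]
        mu, sigma = best.mean(0), best.std(0) + 1e-4        # refit distribution to elites
    return mu

src = torch.rand(200, 3)
dst = src + torch.tensor([0.3, -0.2, 0.1])                   # ground-truth shift
print(cem_translation(src, dst))                             # approx. the shift
```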