Efficient Point Cloud Processing with High-Dimensional Positional Encoding and Non-Local MLPs
- URL: http://arxiv.org/abs/2603.04099v1
- Date: Wed, 04 Mar 2026 14:12:13 GMT
- Title: Efficient Point Cloud Processing with High-Dimensional Positional Encoding and Non-Local MLPs
- Authors: Yanmei Zou, Hongshan Yu, Yaonan Wang, Zhengeng Yang, Xieyuanli Chen, Kailun Yang, Naveed Akhtar
- Abstract summary: We develop a two-stage abstraction and refinement (ABS-REF) view for modular feature extraction in point cloud processing. We propose a High-dimensional Positional Encoding (HPE) module to explicitly utilize positional information. Within our ABS-REF view, we rethink local aggregation in MLP-based methods and propose replacing the time-consuming local MLP operations with non-local MLPs.
- Score: 68.55902504866422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Layer Perceptron (MLP) models are the foundation of contemporary point cloud processing. However, their complex network architectures obscure the source of their strength and limit the application of these models. In this article, we develop a two-stage abstraction and refinement (ABS-REF) view for modular feature extraction in point cloud processing. This view elucidates that whereas the early models focused on ABS stages, the more recent techniques devise sophisticated REF stages to attain performance advantages. Then, we propose a High-dimensional Positional Encoding (HPE) module to explicitly utilize intrinsic positional information, extending the "positional encoding" concept from Transformer literature. HPE can be readily deployed in MLP-based architectures and is compatible with transformer-based methods. Within our ABS-REF view, we rethink local aggregation in MLP-based methods and propose replacing time-consuming local MLP operations, which are used to capture local relationships among neighbors. Instead, we use non-local MLPs for efficient non-local information updates, combined with the proposed HPE for effective local information representation. We leverage our modules to develop HPENets, a suite of MLP networks that follow the ABS-REF paradigm, incorporating a scalable HPE-based REF stage. Extensive experiments on seven public datasets across four different tasks show that HPENets deliver a strong balance between efficiency and effectiveness. Notably, HPENet surpasses PointNeXt, a strong MLP-based counterpart, by 1.1% mAcc, 4.0% mIoU, 1.8% mIoU, and 0.2% Cls. mIoU, with only 50.0%, 21.5%, 23.1%, 44.4% of FLOPs on ScanObjectNN, S3DIS, ScanNet, and ShapeNetPart, respectively. Source code is available at https://github.com/zouyanmei/HPENet_v2.git.
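The abstract describes HPE as extending Transformer-style positional encoding to point positions, without giving its exact form. As a rough illustration only, the sketch below applies the classic sinusoidal encoding to the relative coordinates of each point's nearest neighbors and max-pools the result. The function names `knn_relative_coords` and `sinusoidal_encoding`, the parameters `num_freqs` and `base`, and the pooling step are all assumptions made for this example; the actual HPE module in HPENet may differ.

```python
import numpy as np

def knn_relative_coords(points, k):
    """Return (N, k, 3) offsets from each point to its k nearest neighbors.

    Hypothetical helper: brute-force search, the point itself is neighbor 0.
    """
    points = np.asarray(points, dtype=np.float64)
    diff = points[None, :, :] - points[:, None, :]   # diff[i, j] = p_j - p_i
    dist2 = (diff ** 2).sum(axis=-1)                 # (N, N) squared distances
    idx = np.argsort(dist2, axis=1)[:, :k]           # (N, k) neighbor indices
    rows = np.arange(points.shape[0])[:, None]       # (N, 1) row selector
    return diff[rows, idx]                           # (N, k, 3)

def sinusoidal_encoding(rel_xyz, num_freqs=4, base=100.0):
    """Lift (..., 3) coordinates to (..., 3 * 2 * num_freqs) sin/cos features.

    Assumption: a Transformer-style geometric frequency ladder, not the
    paper's exact formulation.
    """
    rel_xyz = np.asarray(rel_xyz, dtype=np.float64)
    freqs = base ** (-np.arange(num_freqs) / num_freqs)   # (num_freqs,)
    angles = rel_xyz[..., None] * freqs                   # (..., 3, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*rel_xyz.shape[:-1], -1)

if __name__ == "__main__":
    pts = np.random.default_rng(0).normal(size=(16, 3))
    rel = knn_relative_coords(pts, k=4)        # (16, 4, 3)
    enc = sinusoidal_encoding(rel)             # (16, 4, 24)
    pooled = enc.max(axis=1)                   # (16, 24) per-point local feature
    print(rel.shape, enc.shape, pooled.shape)
```

In this sketch the local geometry is carried entirely by the encoded offsets, so no per-neighbor MLP is needed before pooling, which is the kind of cost saving the abstract attributes to combining HPE with non-local MLPs.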
Related papers
- Strip-MLP: Efficient Token Interaction for Vision MLP [31.02197585697145]
We introduce Strip-MLP to enrich the token interaction power in three ways.
Strip-MLP significantly improves the performance of spatial-based MLP models on small datasets.
Our models achieve higher average Top-1 accuracy than existing MLP-based models by +2.44% on Caltech-101 and +2.16% on CIFAR-100.
arXiv Detail & Related papers (2023-07-21T09:40:42Z)
- MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models [33.86069537521178]
Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications. General approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning. We propose one-shot compression techniques specifically designed for fine-tuning.
arXiv Detail & Related papers (2023-07-18T03:12:51Z)
- Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework [55.40001810884942]
We introduce a pure residual network, called PointMLP, which integrates no sophisticated local geometrical extractors but still performs very competitively.
On the real-world ScanObjectNN dataset, our method even surpasses the prior best method by 3.3% accuracy.
Compared to the most recent CurveNet, PointMLP trains 2x faster, tests 7x faster, and is more accurate on the ModelNet40 benchmark.
arXiv Detail & Related papers (2022-02-15T01:39:07Z)
- RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer.
RepMLPNet is the first MLP that seamlessly transfers to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z)
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [7.901786481399378]
Mixture-of-Experts (MoE) with sparse conditional computation has proven to be an effective architecture for scaling attention-based models to more parameters at comparable computation cost.
We propose Sparse-MLP, which scales the recent MLP-Mixer model with MoE to achieve a more efficient architecture.
arXiv Detail & Related papers (2021-09-05T06:43:08Z)
- Hire-MLP: Vision MLP via Hierarchical Rearrangement [58.33383667626998]
Hire-MLP is a simple yet competitive vision MLP architecture via hierarchical rearrangement.
The proposed Hire-MLP architecture is built with simple channel-mixing operations and thus enjoys high flexibility and inference speed.
Experiments show that our Hire-MLP achieves state-of-the-art performance on the ImageNet-1K benchmark.
arXiv Detail & Related papers (2021-08-30T16:11:04Z)
- AS-MLP: An Axial Shifted MLP Architecture for Vision [50.11765148947432]
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper.
By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different directions.
With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset.
arXiv Detail & Related papers (2021-07-18T08:56:34Z)
- RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [123.59890802196797]
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition.
We construct convolutional layers inside a RepMLP during training and merge them into the FC layer for inference.
By inserting RepMLP into traditional CNNs, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% for face recognition, and 2.3% mIoU on Cityscapes with lower FLOPs.
arXiv Detail & Related papers (2021-05-05T06:17:40Z)