Related papers: LitePT: Lighter Yet Stronger Point Transformer

LitePT: Lighter Yet Stronger Point Transformer

URL: http://arxiv.org/abs/2512.13689v1
Date: Mon, 15 Dec 2025 18:59:57 GMT
Title: LitePT: Lighter Yet Stronger Point Transformer
Authors: Yuanwen Yue, Damien Robert, Jianyuan Wang, Sunghwan Hong, Jan Dirk Wegner, Christian Rupprecht, Konrad Schindler,
Abstract summary: We analyse the role of different computational blocks in 3D point cloud networks.<n>We propose a new, improved 3D point cloud backbone that employs convolutions in early stages and switches to attention for deeper layers.<n>The resulting LitePT model has $3.6times$ fewer parameters, runs $2times$ faster, and uses $2times$ less memory than the state-of-the-art Point Transformer V3.
Score: 50.6430530112838
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern neural architectures for 3D point cloud processing contain both convolutional layers and attention blocks, but the best way to assemble them remains unclear. We analyse the role of different computational blocks in 3D point cloud networks and find an intuitive behaviour: convolution is adequate to extract low-level geometry at high-resolution in early layers, where attention is expensive without bringing any benefits; attention captures high-level semantics and context in low-resolution, deep layers more efficiently. Guided by this design principle, we propose a new, improved 3D point cloud backbone that employs convolutions in early stages and switches to attention for deeper layers. To avoid the loss of spatial layout information when discarding redundant convolution layers, we introduce a novel, training-free 3D positional encoding, PointROPE. The resulting LitePT model has $3.6\times$ fewer parameters, runs $2\times$ faster, and uses $2\times$ less memory than the state-of-the-art Point Transformer V3, but nonetheless matches or even outperforms it on a range of tasks and datasets. Code and models are available at: https://github.com/prs-eth/LitePT.

Related papers

PointCNN++: Performant Convolution on Native Points [25.82514121801553]
Existing convolutional learning methods for 3D point cloud data are divided into two paradigms.<n> point-based methods preserve geometric precision but often face performance challenges.<n>voxel-based methods achieve high efficiency through quantization at the cost of geometric fidelity.<n>We propose PointCNN++, a novel architectural design that fundamentally mitigates this precision-performance trade-off.
arXiv Detail & Related papers (2025-11-28T14:35:35Z)
PFGS: High Fidelity Point Cloud Rendering via Feature Splatting [5.866747029417274]
We propose a novel framework to render high-quality images from sparse points. This method first attempts to bridge the 3D Gaussian Splatting and point cloud rendering. Experiments on different benchmarks show the superiority of our method in terms of rendering qualities and the necessities of our main components.
arXiv Detail & Related papers (2024-07-04T11:42:54Z)
DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection. It determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence. This framework aims to minimize detection distortion of network output to maximally maintain detection precision.
arXiv Detail & Related papers (2024-07-02T09:33:32Z)
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions [41.55908366474901]
We introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion. We evaluate our method, DeCoTR, on established depth completion benchmarks.
arXiv Detail & Related papers (2024-03-18T19:22:55Z)
ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding [105.98609765389895]
Transformers have been recently explored for 3D point cloud understanding. A large number of points, over 0.1 million, make the global self-attention infeasible for point cloud data. In this paper, we develop a new transformer block, named ConDaFormer.
arXiv Detail & Related papers (2023-12-18T11:19:45Z)
Using a Waffle Iron for Automotive Point Cloud Semantic Segmentation [66.6890991207065]
Sparse 3D convolutions have become the de-facto tools to construct deep neural networks. We propose an alternative method that reaches the level of state-of-the-art methods without requiring sparse convolutions. We show that such level of performance is achievable by relying on tools a priori unfit for large scale and high-performing 3D perception.
arXiv Detail & Related papers (2023-01-24T16:10:08Z)
Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology. Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
Deep Point Cloud Reconstruction [74.694733918351]
Point cloud obtained from 3D scanning is often sparse, noisy, and irregular. To cope with these issues, recent studies have been separately conducted to densify, denoise, and complete inaccurate point cloud. We propose a deep point cloud reconstruction network consisting of two stages: 1) a 3D sparse stacked-hourglass network as for the initial densification and denoising, 2) a refinement via transformers converting the discrete voxels into 3D points.
arXiv Detail & Related papers (2021-11-23T07:53:28Z)
FatNet: A Feature-attentive Network for 3D Point Cloud Processing [1.502579291513768]
We introduce a novel feature-attentive neural network layer, a FAT layer, that combines both global point-based features and local edge-based features in order to generate better embeddings. Our architecture achieves state-of-the-art results on the task of point cloud classification, as demonstrated on the ModelNet40 dataset.
arXiv Detail & Related papers (2021-04-07T23:13:56Z)
GRNet: Gridding Residual Network for Dense Point Cloud Completion [54.43648460932248]
Estimating the complete 3D point cloud from an incomplete one is a key problem in many vision and robotics applications. We propose a novel Gridding Residual Network (GRNet) for point cloud completion. Experimental results indicate that the proposed GRNet performs favorably against state-of-the-art methods on the ShapeNet, Completion3D, and KITTI benchmarks.
arXiv Detail & Related papers (2020-06-06T02:46:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.