FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing
- URL: http://arxiv.org/abs/2511.07665v1
- Date: Wed, 12 Nov 2025 01:10:05 GMT
- Title: FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing
- Authors: Yuzhe Fu, Changchun Zhou, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo, Hai "Helen" Li, Yiran Chen
- Abstract summary: Three-dimensional (3D) point clouds are increasingly used in applications such as autonomous driving, robotics, and virtual reality (VR). Point-based neural networks (PNNs) have demonstrated strong performance in point cloud analysis, originally targeting small-scale inputs. FractalCloud is a fractal-inspired hardware architecture for efficient large-scale 3D point cloud processing.
- Score: 13.217596969807062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Three-dimensional (3D) point clouds are increasingly used in applications such as autonomous driving, robotics, and virtual reality (VR). Point-based neural networks (PNNs) have demonstrated strong performance in point cloud analysis, originally targeting small-scale inputs. However, as PNNs evolve to process large-scale point clouds with hundreds of thousands of points, all-to-all computation and global memory access in point cloud processing introduce substantial overhead, causing $O(n^2)$ computational complexity and memory traffic, where $n$ is the number of points. Existing accelerators, primarily optimized for small-scale workloads, overlook this challenge and scale poorly due to inefficient partitioning and non-parallel architectures. To address these issues, we propose FractalCloud, a fractal-inspired hardware architecture for efficient large-scale 3D point cloud processing. FractalCloud introduces two key optimizations: (1) a co-designed Fractal method for shape-aware and hardware-friendly partitioning, and (2) block-parallel point operations that decompose and parallelize all point operations. A dedicated hardware design with on-chip fractal and flexible parallelism further enables fully parallel processing within limited memory resources. Implemented in 28 nm technology as a chip layout with a core area of 1.5 mm$^2$, FractalCloud achieves 21.7x speedup and 27x energy reduction over state-of-the-art accelerators while maintaining network accuracy, demonstrating its scalability and efficiency for PNN inference.
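The $O(n^2)$ term comes from all-to-all point operations such as neighbor search, where every point is compared against every other. As a rough illustration only (not the paper's Fractal method or its hardware mapping), the NumPy sketch below contrasts brute-force kNN with a simple recursive median split that confines the quadratic work to small blocks; the function names, the kd-tree-style split, and the block sizes are my assumptions.

```python
import numpy as np

def knn_all_to_all(points, k):
    """Brute-force kNN: O(n^2) distances and O(n^2) memory traffic."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (n, n)
    return np.argsort(d2, axis=1)[:, 1:k + 1]                      # drop self

def partition_median(points, idx, max_leaf, leaves):
    """Recursive median split along the widest axis (kd-tree style).
    A stand-in for shape-aware partitioning; the paper's co-designed
    Fractal method is hardware-friendly in ways this is not."""
    if len(idx) <= max_leaf:
        leaves.append(idx)
        return
    p = points[idx]
    axis = int(np.argmax(p.max(0) - p.min(0)))
    order = np.argsort(p[:, axis])
    mid = len(idx) // 2
    partition_median(points, idx[order[:mid]], max_leaf, leaves)
    partition_median(points, idx[order[mid:]], max_leaf, leaves)

points = np.random.rand(4096, 3).astype(np.float32)
leaves = []
partition_median(points, np.arange(len(points)), max_leaf=256, leaves=leaves)

# kNN restricted to each block: total cost falls from n^2 to sum(block^2).
neighbors = {}
for blk in leaves:
    local = knn_all_to_all(points[blk], k=16)
    for row, i in enumerate(blk):
        neighbors[int(i)] = blk[local[row]]  # global indices of i's neighbors
```

With 4096 points in 256-point blocks, the partitioned search performs 16 × 256² ≈ 1M distance computations instead of 4096² ≈ 16.8M, at the cost of missing neighbors that straddle block boundaries, which is exactly the quality/efficiency trade-off a shape-aware partition is meant to manage.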
Related papers
- LitePT: Lighter Yet Stronger Point Transformer [50.6430530112838]
We analyse the role of different computational blocks in 3D point cloud networks. We propose a new, improved 3D point cloud backbone that employs convolutions in early stages and switches to attention for deeper layers. The resulting LitePT model has $3.6\times$ fewer parameters, runs $2\times$ faster, and uses $2\times$ less memory than the state-of-the-art Point Transformer V3.
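A minimal PyTorch sketch of the conv-early / attention-late layout the summary describes; the module structure, feature widths, and the stride-4 downsample are illustrative assumptions, not LitePT's actual blocks.

```python
import torch
import torch.nn as nn

class HybridPointBackbone(nn.Module):
    """Cheap convolutions while the point set is large; attention only
    after downsampling, when the quadratic cost is affordable."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # Early stages: 1D convs over an (already serialized) point sequence.
        self.conv_stage = nn.Sequential(
            nn.Conv1d(3, dim, 1), nn.ReLU(),
            nn.Conv1d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        # Deep stage: self-attention on the downsampled set.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, xyz):                        # xyz: (B, N, 3)
        f = self.conv_stage(xyz.transpose(1, 2))   # (B, C, N)
        f = f[:, :, ::4].transpose(1, 2)           # stride-4 downsample
        out, _ = self.attn(f, f, f)                # (B, N/4, C)
        return out

feat = HybridPointBackbone()(torch.rand(2, 1024, 3))  # -> (2, 256, 64)
```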
arXiv Detail & Related papers (2025-12-15T18:59:57Z) - FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs [0.0]
Transformer neural networks (TNNs) are being applied across a widening range of application domains, including natural language processing (NLP), machine translation, and computer vision (CV). This paper proposes FAMOUS, a flexible hardware accelerator for dense multi-head attention computation of TNNs on field-programmable gate arrays (FPGAs). It is optimized for high utilization of processing elements and on-chip memories to improve parallelism and reduce latency.
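For reference, the dense multi-head attention such an accelerator implements is three GEMMs plus a row-wise softmax; the NumPy version below only states the math, since FAMOUS's contribution is how these matmuls are tiled across processing elements and on-chip memories, not the operation itself.

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, heads):
    """Dense MHA as a hardware target: QKV projections (GEMMs),
    per-head scaled dot-product scores, softmax, weighted sum."""
    n, d = x.shape
    hd = d // heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    out = np.empty_like(x)
    for h in range(heads):
        s = slice(h * hd, (h + 1) * hd)
        scores = q[:, s] @ k[:, s].T / np.sqrt(hd)
        scores -= scores.max(axis=1, keepdims=True)   # stable softmax
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)
        out[:, s] = p @ v[:, s]
    return out

x = np.random.rand(128, 64).astype(np.float32)
Wq, Wk, Wv = (np.random.rand(64, 64).astype(np.float32) for _ in range(3))
y = multi_head_attention(x, Wq, Wk, Wv, heads=4)      # (128, 64)
```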
arXiv Detail & Related papers (2024-09-21T05:25:46Z) - Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity [12.663030430488922]
We propose Flash-LLM for enabling low-cost and highly-efficient large generative model inference on high-performance GPU Tensor Cores.
At the SpMM kernel level, Flash-LLM significantly outperforms the state-of-the-art libraries Sputnik and SparTA by an average of 2.9x and 1.5x, respectively.
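A functional sketch of the kernel in question: sparse-weight times dense-activation SpMM. The SciPy call is only a correctness stand-in for a tuned GPU kernel, and the tile-wise "fetch compressed, expand, run dense GEMM" loop is my paraphrase of a load-sparse/compute-dense strategy, not Flash-LLM's CUDA implementation.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def spmm_load_sparse_compute_dense(W_csr, X, tile=256):
    """Fetch compressed rows, expand one tile to dense, then run a
    dense GEMM on it -- trading redundant zero-multiplies for the
    throughput of dense compute units."""
    n = W_csr.shape[0]
    Y = np.zeros((n, X.shape[1]), dtype=X.dtype)
    for r0 in range(0, n, tile):
        dense_tile = W_csr[r0:r0 + tile].toarray()   # decompress one tile
        Y[r0:r0 + tile] = dense_tile @ X             # dense compute
    return Y

W = sparse_random(1024, 1024, density=0.2, format="csr", dtype=np.float32)
X = np.random.rand(1024, 64).astype(np.float32)
assert np.allclose(spmm_load_sparse_compute_dense(W, X), W @ X, atol=1e-3)
```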
arXiv Detail & Related papers (2023-09-19T03:20:02Z) - DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets [95.84755169585492]
We present Dynamic Sparse Voxel Transformer (DSVT), a single-stride window-based voxel Transformer backbone for outdoor 3D perception.
Our model achieves state-of-the-art performance across a broad range of 3D perception tasks.
arXiv Detail & Related papers (2023-01-15T09:31:58Z) - CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We adopt transformers in this work and incorporate them into a hierarchical framework for shape classification and for part and scene segmentation. We also compute efficient and dynamic global cross-attention by leveraging sampling and grouping at each iteration. The proposed hierarchical model achieves state-of-the-art mean accuracy on shape classification and yields results on par with previous segmentation methods.
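"Sampling and grouping" generally means selecting anchor points and gathering local neighborhoods so attention never runs all-to-all over the full cloud. Below is a NumPy sketch using farthest point sampling and ball grouping, which are common choices assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def farthest_point_sampling(pts, m):
    """Pick m well-spread anchor indices by greedily maximizing the
    distance to the already-chosen set."""
    chosen = [0]
    d = ((pts - pts[0]) ** 2).sum(1)
    for _ in range(m - 1):
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = np.minimum(d, ((pts - pts[nxt]) ** 2).sum(1))
    return np.array(chosen)

def group_by_radius(pts, center_idx, radius, cap):
    """Ball query: indices of points within `radius` of each anchor."""
    groups = []
    for c in center_idx:
        near = np.flatnonzero(((pts - pts[c]) ** 2).sum(1) < radius ** 2)
        groups.append(near[:cap])
    return groups

pts = np.random.rand(2048, 3).astype(np.float32)
anchors = farthest_point_sampling(pts, 64)
groups = group_by_radius(pts, anchors, radius=0.2, cap=32)
# Attention then runs within groups / against the 64 anchors instead of
# all-to-all over 2048 points.
```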
arXiv Detail & Related papers (2022-07-31T21:39:15Z) - POEM: 1-bit Point-wise Operations based on Expectation-Maximization for Efficient Point Cloud Processing [53.74076015905961]
We introduce point-wise operations based on Expectation-Maximization into BNNs for efficient point cloud processing.
Our POEM surpasses the existing state-of-the-art binary point cloud networks by a significant margin of up to 6.7%.
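The 1-bit idea in isolation: binarize features and weights to ±1 so a point-wise (1×1) layer reduces to a matmul that hardware can execute as XNOR plus popcount, rescaled by a real-valued factor. The sketch below uses plain sign binarization; POEM's Expectation-Maximization scheme for learning better binarization is not reproduced here.

```python
import numpy as np

def binarize(x):
    """Deterministic sign binarization to {-1, +1}."""
    return np.where(x >= 0, 1.0, -1.0).astype(np.float32)

def binary_pointwise_linear(feats, W, alpha):
    """1-bit point-wise layer: binarized GEMM + real scaling factor.
    On hardware the {-1, +1} matmul becomes XNOR + popcount."""
    return (binarize(feats) @ binarize(W)) * alpha

feats = np.random.randn(1024, 64).astype(np.float32)  # per-point features
W = np.random.randn(64, 128).astype(np.float32)
alpha = np.abs(W).mean()                              # simple scale estimate
out = binary_pointwise_linear(feats, W, alpha)        # (1024, 128)
```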
arXiv Detail & Related papers (2021-11-26T09:45:01Z) - Phantom: A High-Performance Computational Core for Sparse Convolutional Neural Networks [3.198144010381572]
Sparse convolutional neural networks (CNNs) have gained significant traction over the past few years.
They can drastically decrease the model size and computations, if exploited befittingly, as compared to their dense counterparts.
Recently proposed sparse accelerators like SCNN, Eyeriss v2, and SparTen, actively exploit the two-sided or full sparsity, that is, sparsity in both weights and activations, for performance gains.
These accelerators either have an inefficient micro-architecture that limits their performance, lack support for non-unit-stride convolutions and fully-connected layers, or suffer …
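Two-sided (full) sparsity means a multiply-accumulate fires only where both the weight and the activation are nonzero. The sketch below counts exactly those MACs for a single dot product, a software stand-in for the index-matching logic that this class of accelerators implements in hardware.

```python
import numpy as np

def two_sided_sparse_dot(a, w):
    """MAC only where BOTH operands are nonzero; returns the result
    and the number of MACs actually performed."""
    nz = np.flatnonzero((a != 0) & (w != 0))
    return float(a[nz] @ w[nz]), len(nz)

rng = np.random.default_rng(0)
a = rng.standard_normal(512).astype(np.float32)
w = rng.standard_normal(512).astype(np.float32)
a[rng.random(512) < 0.6] = 0          # ~60% activation sparsity
w[rng.random(512) < 0.8] = 0          # ~80% weight sparsity
y, macs = two_sided_sparse_dot(a, w)
print(macs, "of 512 MACs performed")  # expect roughly 0.4 * 0.2 * 512 = ~41
```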
arXiv Detail & Related papers (2021-11-09T08:43:03Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
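A hedged PyTorch sketch of the dynamic-convolution idea: a small controller predicts per-instance 1×1 filter weights from a gathered instance descriptor, and convolving every point feature with those filters yields per-instance masks without proposals. The controller shape and all dimensions are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class DynamicPointHead(nn.Module):
    """Controller predicts the weights and bias of a per-instance 1x1
    filter; applying all K filters to all N point features gives an
    (N, K) matrix of instance-mask logits."""
    def __init__(self, dim=32):
        super().__init__()
        self.controller = nn.Linear(dim, dim + 1)  # dim weights + 1 bias

    def forward(self, point_feats, inst_desc):     # (N, C), (K, C)
        params = self.controller(inst_desc)        # (K, C + 1)
        w, b = params[:, :-1], params[:, -1]
        return point_feats @ w.T + b               # (N, K) mask logits

head = DynamicPointHead()
masks = head(torch.rand(2048, 32), torch.rand(5, 32))  # logits for 5 instances
```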
arXiv Detail & Related papers (2021-07-18T09:05:16Z) - Learning Semantic Segmentation of Large-Scale Point Clouds with Random Sampling [52.464516118826765]
We introduce RandLA-Net, an efficient and lightweight neural architecture to infer per-point semantics for large-scale point clouds.
The key to our approach is to use random point sampling instead of more complex point selection approaches.
Our RandLA-Net can process 1 million points in a single pass up to 200x faster than existing approaches.
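The speed argument is easy to see in code: uniform random sampling touches each kept point once and computes no pairwise distances, which is what lets it scale to around a million points. A sketch is below; the local feature aggregation that compensates for the information random sampling throws away is the network's job and is not shown.

```python
import numpy as np

def random_sample(pts, m, rng):
    """O(m) decimation with no distance computation, in contrast to
    farthest point sampling, which is O(n * m)."""
    return pts[rng.choice(len(pts), size=m, replace=False)]

rng = np.random.default_rng(0)
pts = rng.random((1_000_000, 3), dtype=np.float32)
sub = random_sample(pts, 250_000, rng)   # one 4x decimation stage
```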
arXiv Detail & Related papers (2021-07-06T05:08:34Z) - DeepCompress: Efficient Point Cloud Geometry Compression [1.808877001896346]
We propose a more efficient deep learning-based encoder architecture for point cloud compression.
We show that incorporating the learned activation function from Computationally Efficient Neural Image Compression (CENIC) yields dramatic gains in efficiency and performance.
Our proposed modifications outperform the baseline approaches by a small margin in terms of Bjøntegaard delta rate and PSNR values.
arXiv Detail & Related papers (2021-06-02T23:18:11Z) - Efficient and Generic 1D Dilated Convolution Layer for Deep Learning [52.899995651639436]
We introduce our efficient implementation of a generic 1D convolution layer covering a wide range of parameters.
It is optimized for x86 CPU architectures, in particular, for architectures containing Intel AVX-512 and AVX-512 BFloat16 instructions.
We demonstrate the performance of our optimized 1D convolution layer by utilizing it in the end-to-end neural network training with real genomics datasets.
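For reference, a plain (unvectorized) definition of the operation being accelerated: tap t of the kernel reads the input at offset t × dilation. The paper's contribution is a fast AVX-512 implementation of this same computation, not new math; the sketch below is only the definition.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """1D dilated convolution, valid padding: output i accumulates
    w[t] * x[i + t * dilation] over all kernel taps t."""
    k = len(w)
    span = (k - 1) * dilation + 1
    out = np.empty(len(x) - span + 1, dtype=x.dtype)
    for i in range(len(out)):
        out[i] = sum(w[t] * x[i + t * dilation] for t in range(k))
    return out

x = np.arange(16, dtype=np.float32)
w = np.array([1.0, 0.0, -1.0], dtype=np.float32)
print(dilated_conv1d(x, w, dilation=2))   # each output is x[i] - x[i+4] = -4
```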
arXiv Detail & Related papers (2021-04-16T09:54:30Z) - SparsePipe: Parallel Deep Learning for 3D Point Clouds [7.181267620981419]
SparsePipe is built to support 3D sparse data such as point clouds.
It exploits intra-batch parallelism that partitions input data across multiple processors.
We show that SparsePipe can parallelize effectively and obtain better performance on current point cloud benchmarks.
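Intra-batch parallelism in miniature: split one batch of point clouds across workers and process the shards concurrently. The sketch uses a process pool as a stand-in for multiple GPUs, and the per-shard function is a placeholder; SparsePipe's actual pipelined execution across devices is not modeled.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def process_shard(shard):
    """Placeholder for a sparse-conv forward pass on one device."""
    return float(shard.sum())

def intra_batch_parallel(batch, workers=4):
    """Partition the batch across `workers` processors and run them
    concurrently; results come back in shard order."""
    shards = np.array_split(batch, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_shard, shards))

if __name__ == "__main__":
    batch = np.random.rand(8, 4096, 3)   # a batch of 8 point clouds
    print(intra_batch_parallel(batch))
```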
arXiv Detail & Related papers (2020-12-27T01:47:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.