Attention-Guided Multi-Scale Local Reconstruction for Point Clouds via Masked Autoencoder Self-Supervised Learning
- URL: http://arxiv.org/abs/2507.04084v1
- Date: Sat, 05 Jul 2025 16:17:49 GMT
- Title: Attention-Guided Multi-Scale Local Reconstruction for Point Clouds via Masked Autoencoder Self-Supervised Learning
- Authors: Xin Cao, Haoyu Wang, Yuzhu Mao, Xinda Liu, Linzhi Su, Kang Li,
- Abstract summary: We introduce PointAMaLR, a novel self-supervised learning framework for point cloud processing.<n>PointAMaLR implements hierarchical reconstruction across multiple local regions.<n>Experiments on benchmark datasets demonstrate PointAMaLR's superior accuracy and quality in both classification and reconstruction tasks.
- Score: 9.390627399366833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has emerged as a prominent research direction in point cloud processing. While existing models predominantly concentrate on reconstruction tasks at higher encoder layers, they often neglect the effective utilization of low-level local features, which are typically employed solely for activation computations rather than directly contributing to reconstruction tasks. To overcome this limitation, we introduce PointAMaLR, a novel self-supervised learning framework that enhances feature representation and processing accuracy through attention-guided multi-scale local reconstruction. PointAMaLR implements hierarchical reconstruction across multiple local regions, with lower layers focusing on fine-scale feature restoration while upper layers address coarse-scale feature reconstruction, thereby enabling complex inter-patch interactions. Furthermore, to augment feature representation capabilities, we incorporate a Local Attention (LA) module in the embedding layer to enhance semantic feature understanding. Comprehensive experiments on benchmark datasets ModelNet and ShapeNet demonstrate PointAMaLR's superior accuracy and quality in both classification and reconstruction tasks. Moreover, when evaluated on the real-world dataset ScanObjectNN and the 3D large scene segmentation dataset S3DIS, our model achieves highly competitive performance metrics. These results not only validate PointAMaLR's effectiveness in multi-scale semantic understanding but also underscore its practical applicability in real-world scenarios.
Related papers
- Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds.<n>Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures.<n>We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z) - KAN or MLP? Point Cloud Shows the Way Forward [13.669234791655075]
We propose PointKAN, which applies Kolmogorov-Arnold Learning Networks (KANs) to point cloud analysis tasks.<n>We show that PointKAN outperforms PointMLP on benchmark datasets such as ModelNet40, ScanNN, and ShapeNetPart.<n>This work highlights the potential of KANs-based architectures in 3D vision and opens new avenues for research in point cloud understanding.
arXiv Detail & Related papers (2025-04-18T09:52:22Z) - Point Cloud Understanding via Attention-Driven Contrastive Learning [64.65145700121442]
Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms.
PointACL is an attention-driven contrastive learning framework designed to address these limitations.
Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions.
arXiv Detail & Related papers (2024-11-22T05:41:00Z) - Point Tree Transformer for Point Cloud Registration [33.00645881490638]
Point cloud registration is a fundamental task in the fields of computer vision and robotics.
We propose a novel transformer-based approach for point cloud registration that efficiently extracts comprehensive local and global features.
Our method achieves superior performance over the state-of-the-art methods.
arXiv Detail & Related papers (2024-06-25T13:14:26Z) - DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature.
It uses a single-scale feature map and global cross-attention calculations without specific locality constraints.
We show that two simple technologies are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - LCTR: On Awakening the Local Continuity of Transformer for Weakly
Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn object localizer solely by using image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which targets at enhancing the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z) - Global Context-Aware Progressive Aggregation Network for Salient Object
Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.