Towards Compact 3D Representations via Point Feature Enhancement Masked
Autoencoders
- URL: http://arxiv.org/abs/2312.10726v1
- Date: Sun, 17 Dec 2023 14:17:05 GMT
- Title: Towards Compact 3D Representations via Point Feature Enhancement Masked
Autoencoders
- Authors: Yaohua Zha, Huizhen Ji, Jinmin Li, Rongsheng Li, Tao Dai, Bin Chen,
Zhi Wang, Shu-Tao Xia
- Abstract summary: We propose Point Feature Enhancement Masked Autoencoders (Point-FEMAE) to learn compact 3D representations.
Point-FEMAE consists of a global branch and a local branch to capture latent semantic features.
Our method significantly improves the pre-training efficiency compared to cross-modal alternatives.
- Score: 52.66195794216989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning 3D representations plays a critical role in masked autoencoder (MAE) based pre-training methods for point clouds, including both single-modal and cross-modal MAE. Specifically, although cross-modal MAE methods learn strong 3D representations with the aid of knowledge from other modalities, they often suffer from heavy computational burdens and rely heavily on massive cross-modal data pairs that are often unavailable, which hinders their application in practice. Instead, single-modal methods that take solely point clouds as input are preferred in real applications due to their simplicity and efficiency. However, such methods easily suffer from limited 3D representations under global random mask input. To learn compact 3D representations, we propose a simple yet effective Point Feature Enhancement Masked Autoencoder (Point-FEMAE), which mainly consists of a global branch and a local branch to capture latent semantic features. Specifically, to learn more compact features, a shared-parameter Transformer encoder is introduced to extract point features from the global and local unmasked patches obtained by global random and local block mask strategies, followed by a specific decoder for reconstruction. Meanwhile, to further enhance features in the local branch, we propose a Local Enhancement Module with local patch convolution to perceive fine-grained local context at larger scales. Our method significantly improves pre-training efficiency compared to cross-modal alternatives, and extensive downstream experiments underscore its state-of-the-art effectiveness, particularly outperforming our baseline (Point-MAE) by 5.16%, 5.00%, and 5.04% on the three variants of ScanObjectNN, respectively. The code is available at
https://github.com/zyh16143998882/AAAI24-PointFEMAE.
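To make the two masking strategies in the abstract concrete, below is a minimal, hypothetical sketch of a global random mask (patches dropped uniformly at random) and a local block mask (a contiguous neighborhood around a random seed patch). The function names, mask ratio, and nearest-neighbour block construction are illustrative assumptions, not the authors' implementation; see the linked repository for the official code.

```python
# Illustrative sketch only: global random vs. local block patch masking
# as described in the Point-FEMAE abstract. Details are assumptions.
import torch


def global_random_mask(num_patches: int, mask_ratio: float = 0.6) -> torch.Tensor:
    """Boolean mask (True = masked) over patches chosen uniformly at random."""
    num_masked = int(num_patches * mask_ratio)
    perm = torch.randperm(num_patches)
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[perm[:num_masked]] = True
    return mask


def local_block_mask(centers: torch.Tensor, mask_ratio: float = 0.6) -> torch.Tensor:
    """Mask a contiguous block: the patches nearest (in 3D) to a random seed patch.

    centers: (num_patches, 3) patch-center coordinates.
    """
    num_patches = centers.shape[0]
    num_masked = int(num_patches * mask_ratio)
    seed = torch.randint(num_patches, (1,))
    dist = torch.cdist(centers[seed], centers).squeeze(0)  # distances to the seed patch
    nearest = dist.argsort()[:num_masked]
    mask = torch.zeros(num_patches, dtype=torch.bool)
    mask[nearest] = True
    return mask


if __name__ == "__main__":
    centers = torch.rand(64, 3)          # toy patch centers
    g_mask = global_random_mask(64)
    l_mask = local_block_mask(centers)
    # In a Point-FEMAE-style pipeline, the unmasked patches from each branch would be
    # embedded and passed through one shared-parameter Transformer encoder, with a
    # decoder reconstructing the masked patches of each branch.
    print(g_mask.sum().item(), l_mask.sum().item())  # both mask ~60% of patches
```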
Related papers
- Triple Point Masking [49.39218611030084]
Existing 3D mask learning methods encounter performance bottlenecks under limited data.
We introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders.
Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks.
arXiv Detail & Related papers (2024-09-26T05:33:30Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - MaskLRF: Self-supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-invariant 3D Point Set Analysis [1.19658449368018]
This paper develops, for the first time, a rotation-invariant self-supervised pretraining framework for practical 3D point set analysis.
The proposed algorithm, called MaskLRF, learns rotation-invariant and highly generalizable latent features via masked autoencoding of 3D points.
Experiments confirm that MaskLRF achieves new state-of-the-art accuracies in analyzing 3D point sets with inconsistent orientations.
arXiv Detail & Related papers (2024-03-01T00:42:49Z) - PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D
Object Detection [26.03582038710992]
Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities.
In this work, we focus on point cloud and RGB image data, two modalities that are often presented together in the real world.
We propose PiMAE, a self-supervised pre-training framework that promotes 3D and 2D interaction through three aspects.
arXiv Detail & Related papers (2023-03-14T17:58:03Z) - GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds.
We propose a Generative Decoder for MAE (GD-MAE) to automatically merge the surrounding context.
We demonstrate the efficacy of the proposed method on large-scale benchmarks, including KITTI and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z) - Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud
Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z) - DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF
Relocalization [56.15308829924527]
We propose a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points.
For detecting 3D keypoints we predict the discriminativeness of the local descriptors in an unsupervised manner.
Experiments on various benchmarks demonstrate that our method achieves competitive results for both global point cloud retrieval and local point cloud registration.
arXiv Detail & Related papers (2020-07-17T20:21:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.