GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds
- URL: http://arxiv.org/abs/2212.03010v2
- Date: Wed, 7 Dec 2022 13:18:55 GMT
- Title: GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds
- Authors: Honghui Yang and Tong He and Jiaheng Liu and Hua Chen and Boxi Wu and
Binbin Lin and Xiaofei He and Wanli Ouyang
- Abstract summary: Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds.
We propose a Generative Decoder for MAE (GD-MAE) that automatically merges the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks: Waymo, KITTI, and ONCE.
- Score: 72.60362979456035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the tremendous progress of Masked Autoencoders (MAE) in developing
vision tasks such as image and video, exploring MAE in large-scale 3D point
clouds remains challenging due to the inherent irregularity. In contrast to
previous 3D MAE frameworks, which either design a complex decoder to infer
masked information from maintained regions or adopt sophisticated masking
strategies, we instead propose a much simpler paradigm. The core idea is to
apply a \textbf{G}enerative \textbf{D}ecoder for MAE (GD-MAE) that automatically
merges the surrounding context to restore the masked geometric knowledge in a
hierarchical fusion manner. In doing so, our approach is free from introducing
the heuristic design of decoders and enjoys the flexibility of exploring
various masking strategies. The corresponding component incurs less than
\textbf{12\%} of the latency of conventional methods, while achieving
better performance. We demonstrate the efficacy of the proposed method on
several large-scale benchmarks: Waymo, KITTI, and ONCE. Consistent improvement
on downstream detection tasks illustrates strong robustness and generalization
capability. Not only does our method achieve state-of-the-art results, but
remarkably, we achieve comparable accuracy even with \textbf{20\%} of the
labeled data on the Waymo dataset. The code will be released at
\url{https://github.com/Nightmare-n/GD-MAE}.
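To make the recipe concrete, below is a minimal, illustrative PyTorch sketch of MAE-style pre-training on point-cloud patch tokens: random masking, an encoder over visible tokens only, and a lightweight decoder that fuses surrounding context back into masked positions. The module names, sizes, and the coordinate-regression target are assumptions for illustration, not the GD-MAE implementation.

```python
import torch
import torch.nn as nn

class PointMAE(nn.Module):
    """Toy MAE for point-cloud patch tokens (illustrative, not GD-MAE)."""

    def __init__(self, dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=4)
        # Lightweight decoder: one layer that lets masked positions
        # attend to (i.e., merge) the visible surrounding context.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=1)
        self.head = nn.Linear(dim, 3)  # regress a 3D center per token

    def forward(self, tokens, coords):
        # tokens: (B, N, dim) patch embeddings; coords: (B, N, 3) patch centers.
        B, N, D = tokens.shape
        num_keep = int(N * (1 - self.mask_ratio))
        ids_keep = torch.rand(B, N, device=tokens.device).argsort(1)[:, :num_keep]
        visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
        latent = self.encoder(visible)  # encode visible tokens only
        # Scatter encoded tokens back; fill masked slots with a learned token.
        full = self.mask_token.expand(B, N, D).clone()
        full.scatter_(1, ids_keep.unsqueeze(-1).expand(-1, -1, D), latent)
        pred = self.head(self.decoder(full))
        # Reconstruction loss only on masked positions.
        masked = torch.ones(B, N, dtype=torch.bool, device=tokens.device)
        masked.scatter_(1, ids_keep, False)
        return ((pred - coords) ** 2)[masked].mean()

model = PointMAE()
loss = model(torch.randn(2, 256, 128), torch.randn(2, 256, 3))
loss.backward()
```

A real LiDAR pipeline would tokenize raw sweeps into pillar or patch embeddings with positional encodings and reconstruct local point geometry rather than bare patch centers; the sketch only captures the mask-encode-fuse-reconstruct loop.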
Related papers
- MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields [73.49548565633123]
Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering.
Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images.
We propose a view synthesis framework based on 3D Gaussian Splatting, named MCGS, enabling scene reconstruction from sparse input views.
arXiv Detail & Related papers (2024-10-15T08:39:05Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- GeoMask3D: Geometrically Informed Mask Selection for Self-Supervised Point Cloud Learning in 3D [18.33878596057853]
We introduce a pioneering approach to self-supervised learning for point clouds.
We employ a geometrically informed mask selection strategy called GeoMask3D (GM3D) to boost the efficiency of Masked Autoencoders (MAEs).
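As a rough illustration of what "geometrically informed" masking can mean (the criterion below is an assumption, not necessarily GM3D's), one can score each point by the surface variation of its local neighborhood, obtained from a PCA of the neighborhood covariance, and preferentially mask the most geometrically complex regions:

```python
import torch

def surface_variation(points, k=16):
    # points: (N, 3). Score each point by local surface variation
    # lambda_min / (lambda_0 + lambda_1 + lambda_2) of its k-NN covariance.
    dists = torch.cdist(points, points)            # (N, N) pairwise distances
    knn = dists.topk(k, largest=False).indices     # (N, k) nearest neighbors
    neigh = points[knn]                            # (N, k, 3)
    centered = neigh - neigh.mean(dim=1, keepdim=True)
    cov = centered.transpose(1, 2) @ centered / k  # (N, 3, 3) covariance
    eig = torch.linalg.eigvalsh(cov)               # ascending eigenvalues
    return eig[:, 0] / (eig.sum(dim=1) + 1e-9)     # ~0 flat, larger = curved

points = torch.randn(1024, 3)
scores = surface_variation(points)
masked_ids = scores.topk(int(0.6 * len(points))).indices  # mask complex regions
```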
arXiv Detail & Related papers (2024-05-20T23:53:42Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective, jointly with feature reconstruction, to capture holistic graph information.
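For intuition, the degree-based heuristic below stands in for the learned adaptive feature mask generator (the heuristic is purely an assumption; UGMAE learns per-node significance rather than deriving it from degree):

```python
import torch

def adaptive_feature_mask(x, edge_index, base_ratio=0.5):
    # x: (N, F) node features; edge_index: (2, E) COO edge list.
    num_nodes = x.size(0)
    deg = torch.zeros(num_nodes).index_add_(
        0, edge_index[0], torch.ones(edge_index.size(1)))
    # Assumed heuristic: mask high-degree ("significant") nodes less often.
    prob = base_ratio * (1.0 - deg / (deg.max() + 1e-6))
    mask = torch.rand(num_nodes) < prob
    x_masked = x.clone()
    x_masked[mask] = 0.0  # zero out masked features (or use a mask token)
    return x_masked, mask

x = torch.randn(5, 16)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
x_masked, mask = adaptive_feature_mask(x, edge_index)
```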
arXiv Detail & Related papers (2024-02-12T19:39:26Z)
- Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders [52.66195794216989]
We propose Point Feature Enhancement Masked Autoencoders (Point-FEMAE) to learn compact 3D representations.
Point-FEMAE consists of a global branch and a local branch to capture latent semantic features.
Our method significantly improves the pre-training efficiency compared to cross-modal alternatives.
arXiv Detail & Related papers (2023-12-17T14:17:05Z)
- How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders [21.849681446573257]
Masked Autoencoders (MAE) based on a reconstruction task have risen to be a promising paradigm for self-supervised learning (SSL).
We propose a theoretical understanding of how masking matters for MAE to learn meaningful features.
arXiv Detail & Related papers (2022-10-15T17:36:03Z)
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3D dataset.
arXiv Detail & Related papers (2022-09-01T12:32:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.