Self-Supervised Point Cloud Representation Learning with Occlusion
Auto-Encoder
- URL: http://arxiv.org/abs/2203.14084v1
- Date: Sat, 26 Mar 2022 14:06:29 GMT
- Title: Self-Supervised Point Cloud Representation Learning with Occlusion
Auto-Encoder
- Authors: Junsheng Zhou, Xin Wen, Yu-Shen Liu, Yi Fang, Zhizhong Han
- Abstract summary: We present 3D Occlusion Auto-Encoder (3D-OAE) for learning representations for point clouds.
Our key idea is to randomly occlude some local patches of the input point cloud and establish the supervision via recovering the occluded patches.
In contrast with previous methods, our 3D-OAE can remove a large proportion of patches and predict them using only a small number of visible patches.
- Score: 63.77257588569852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning representations for point clouds is an important task in 3D computer
vision, especially without manually annotated supervision. Previous methods
usually rely on auto-encoders to establish self-supervision by reconstructing
the input itself. However, the existing
self-reconstruction based auto-encoders merely focus on the global shapes, and
ignore the hierarchical context between the local and global geometries, which
is a crucial supervision for 3D representation learning. To resolve this issue,
we present a novel self-supervised point cloud representation learning
framework, named 3D Occlusion Auto-Encoder (3D-OAE). Our key idea is to
randomly occlude some local patches of the input point cloud and establish the
supervision via recovering the occluded patches using the remaining visible
ones. Specifically, we design an encoder for learning the features of visible
local patches, and a decoder for leveraging these features to predict the
occluded patches. In contrast with previous methods, our 3D-OAE can remove a
large proportion of patches and predict them using only a small number of
visible patches, which enables us to significantly accelerate training and
yields nontrivial self-supervisory performance. The trained encoder can be
further transferred to various downstream tasks. We demonstrate superior
performance over state-of-the-art methods in various discriminative and
generative applications on widely used benchmarks.
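The occlusion pretext task described in the abstract can be sketched in NumPy. This is a minimal illustration under assumed hyperparameters (patch count, patch size, 75% occlusion ratio); the paper's actual pipeline uses a learned transformer encoder-decoder over patch embeddings, which is not shown here:

```python
import numpy as np

def group_patches(points, num_patches=64, patch_size=32, seed=0):
    """Group a point cloud (N, 3) into local patches: farthest point
    sampling picks patch centers, then each center's k nearest
    neighbors form one patch."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(points)))]
    dists = np.full(len(points), np.inf)
    for _ in range(num_patches - 1):
        # Distance of every point to its nearest chosen center so far.
        dists = np.minimum(
            dists, np.linalg.norm(points - points[centers[-1]], axis=1))
        centers.append(int(np.argmax(dists)))  # farthest point next
    center_xyz = points[centers]                             # (P, 3)
    # k nearest neighbors of each center form one patch.
    d = np.linalg.norm(points[None, :, :] - center_xyz[:, None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :patch_size]              # (P, k)
    return points[idx], center_xyz                           # (P, k, 3), (P, 3)

def occlude(patches, occlusion_ratio=0.75, seed=0):
    """Randomly occlude a large fraction of patches. The visible patches
    are the encoder input; the occluded ones are the recovery target."""
    rng = np.random.default_rng(seed)
    n_occ = int(len(patches) * occlusion_ratio)
    perm = rng.permutation(len(patches))
    occ_idx, vis_idx = perm[:n_occ], perm[n_occ:]
    return patches[vis_idx], patches[occ_idx], vis_idx, occ_idx
```

With the defaults, a 1024-point cloud yields 64 patches of 32 points, of which only 16 stay visible, mirroring the "small number of visible patches" regime that makes training fast.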
Related papers
- Towards Compact 3D Representations via Point Feature Enhancement Masked
Autoencoders [52.66195794216989]
We propose Point Feature Enhancement Masked Autoencoders (Point-FEMAE) to learn compact 3D representations.
Point-FEMAE consists of a global branch and a local branch to capture latent semantic features.
Our method significantly improves the pre-training efficiency compared to cross-modal alternatives.
arXiv Detail & Related papers (2023-12-17T14:17:05Z)
- Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning [18.10704604275133]
Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Point Regress AutoEncoder (Point-RAE), a new scheme for regressive autoencoders for point cloud self-supervised learning.
Our approach is efficient during pre-training and generalizes well on various downstream tasks.
arXiv Detail & Related papers (2023-09-25T17:23:33Z)
- 3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining [45.58631796379208]
Masked autoencoders (MAEs) have recently been introduced to 3D self-supervised pretraining for point clouds.
We propose to ignore point position reconstruction and recover high-order features at masked points through a novel attention-based decoder.
We validate our pretext task and decoder design using different encoder structures for 3D training and demonstrate the advantages of our pretrained networks on various point cloud analysis tasks.
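For reference, the point-position reconstruction objective that this paper moves away from (and that occlusion/masked auto-encoders typically optimize) is commonly the Chamfer distance between predicted and ground-truth point sets. A minimal NumPy sketch, not tied to any particular paper's implementation:

```python
import numpy as np

def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between point sets of shape (N, 3)
    and (M, 3): mean squared distance to the nearest neighbor in the
    other set, accumulated in both directions."""
    d2 = np.sum((pred[:, None, :] - target[None, :, :]) ** 2, axis=-1)  # (N, M)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

The loss is zero only when every predicted point has an exact match in the target set and vice versa, which is why it is a natural fit for supervising recovered patches without point-to-point correspondences.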
arXiv Detail & Related papers (2023-04-14T03:25:24Z)
- MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds [13.426810473131642]
Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction.
In a novel reconstruction approach, MAELi distinguishes between empty and occluded space.
Thereby, without any ground truth whatsoever and trained on single frames only, MAELi obtains an understanding of the underlying 3D scene geometry and semantics.
arXiv Detail & Related papers (2022-12-14T13:10:27Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3D dataset.
arXiv Detail & Related papers (2022-09-01T12:32:40Z)
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potentials in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z)
- Implicit Autoencoder for Point-Cloud Self-Supervised Representation Learning [39.521374237630766]
The most popular and accessible 3D representation, i.e., point clouds, involves discrete samples of the underlying continuous 3D surface.
This discretization process introduces sampling variations on the 3D shape, making it challenging to develop transferable knowledge of the true 3D geometry.
In the standard autoencoding paradigm, the encoder is compelled to encode not only the 3D geometry but also information on the specific discrete sampling of the 3D shape into the latent code.
This is because the point cloud reconstructed by the decoder is considered unacceptable unless there is a perfect mapping between the original and the reconstructed point clouds.
arXiv Detail & Related papers (2022-01-03T18:05:52Z)
- A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification [65.18004601366066]
Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly-emerging topic with several advantages.
This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID.
arXiv Detail & Related papers (2020-09-05T16:06:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.