Regress Before Construct: Regress Autoencoder for Point Cloud
Self-supervised Learning
- URL: http://arxiv.org/abs/2310.03670v1
- Date: Mon, 25 Sep 2023 17:23:33 GMT
- Title: Regress Before Construct: Regress Autoencoder for Point Cloud
Self-supervised Learning
- Authors: Yang Liu, Chen Chen, Can Wang, Xulin King, Mengyuan Liu
- Abstract summary: Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Point Regress AutoEncoder (Point-RAE), a new scheme for regressive autoencoders for point cloud self-supervised learning.
Our approach is efficient during pre-training and generalizes well on various downstream tasks.
- Score: 18.10704604275133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked Autoencoders (MAE) have demonstrated promising performance in
self-supervised learning for both 2D and 3D computer vision. Nevertheless,
existing MAE-based methods still have certain drawbacks. Firstly, the
functional decoupling between the encoder and decoder is incomplete, which
limits the encoder's representation learning ability. Secondly, downstream
tasks solely utilize the encoder, failing to fully leverage the knowledge
acquired through the encoder-decoder architecture in the pretext task. In this
paper, we propose Point Regress AutoEncoder (Point-RAE), a new scheme for
regressive autoencoders for point cloud self-supervised learning. The proposed
method decouples the functions of the encoder and the decoder by introducing a
mask regressor: the regressor predicts the masked patch representations from the
visible patch representations produced by the encoder, and the decoder
reconstructs the target from the predicted masked patch representations. By
doing so, we minimize the impact of decoder updates on the representation space
of the encoder.
Moreover, we introduce an alignment constraint to ensure that the
representations for masked patches, predicted from the encoded representations
of visible patches, are aligned with the masked patch representations computed
from the encoder. To make full use of the knowledge learned in the pre-training
stage, we design a new fine-tuning mode for the proposed Point-RAE. Extensive
experiments demonstrate that our approach is efficient during pre-training and
generalizes well on various downstream tasks. Specifically, our pre-trained
models achieve a high accuracy of 90.28% on the ScanObjectNN hardest split and
94.1% accuracy on ModelNet40, surpassing all other self-supervised learning
methods. Our code and pretrained model are publicly available at:
https://github.com/liuyyy111/Point-RAE.
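As a rough illustration of the scheme described in the abstract, the sketch below shows one possible encode-regress-decode forward pass with an alignment loss between predicted and encoded masked-patch representations. It is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the class and helper names, dimensions, the linear patch embedder/decoder, the stop-gradient target, and the MSE losses are all simplifying choices made here.

    # Hypothetical sketch of a Point-RAE-style pre-training step (not the official code).
    # Simplifications: a linear patch embedder and decoder stand in for the usual
    # mini-PointNet and Transformer decoder, positional embeddings are omitted, and
    # plain MSE replaces the Chamfer-style reconstruction loss common for point clouds.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def take(x, idx):
        # Gather rows of x (B, N, D) at per-sample indices idx (B, K).
        return torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))

    class PointRAESketch(nn.Module):
        def __init__(self, dim=384, pts_per_patch=32, mask_ratio=0.6):
            super().__init__()
            self.mask_ratio = mask_ratio
            self.embed = nn.Linear(3 * pts_per_patch, dim)
            enc_layer = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
            reg_layer = nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)    # sees visible patches only
            self.regressor = nn.TransformerEncoder(reg_layer, num_layers=2)  # predicts masked representations
            self.decoder = nn.Linear(dim, 3 * pts_per_patch)                 # reconstructs patch points
            self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

        def forward(self, patches):
            # patches: (B, N, pts_per_patch, 3) grouped point-cloud patches
            B, N = patches.shape[:2]
            flat = patches.flatten(2)                      # (B, N, 3 * pts_per_patch)
            tokens = self.embed(flat)
            n_mask = int(N * self.mask_ratio)
            perm = torch.rand(B, N, device=patches.device).argsort(dim=1)
            mask_idx, vis_idx = perm[:, :n_mask], perm[:, n_mask:]

            # Encode only the visible patches.
            vis_repr = self.encoder(take(tokens, vis_idx))

            # Mask regressor: predict masked-patch representations from visible ones.
            queries = self.mask_token.expand(B, n_mask, -1)
            pred_repr = self.regressor(torch.cat([vis_repr, queries], dim=1))[:, -n_mask:]

            # Alignment constraint: predictions should match the encoder's own
            # representations of the masked patches (treated as a fixed target here).
            with torch.no_grad():
                target_repr = self.encoder(take(tokens, mask_idx))
            align_loss = F.mse_loss(pred_repr, target_repr)

            # The decoder reconstructs masked patches from the predicted representations,
            # so its gradients flow through the regressor rather than reshaping the
            # encoder's representation space directly.
            rec = self.decoder(pred_repr)
            rec_loss = F.mse_loss(rec, take(flat, mask_idx))
            return rec_loss + align_loss

A full implementation would also supply positional information for the masked patches and use a point-cloud reconstruction loss such as Chamfer distance; both are omitted here for brevity.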
Related papers
- PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders [57.31790812209751]
We show that when the centers of masked patches are fed directly to the decoder, without any information from the encoder, the decoder still reconstructs well.
We propose a simple yet effective method, i.e., learning to Predict Centers for Point Masked AutoEncoders (PCP-MAE).
Our method is of high pre-training efficiency compared to other alternatives and achieves great improvement over Point-MAE.
arXiv Detail & Related papers (2024-08-16T13:53:53Z)
- SeRP: Self-Supervised Representation Learning Using Perturbed Point Clouds [6.29475963948119]
SeRP consists of an encoder-decoder architecture that takes perturbed or corrupted point clouds as inputs.
We have used Transformers and PointNet-based Autoencoders.
arXiv Detail & Related papers (2022-09-13T15:22:36Z)
- MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (MAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient Decoupled spatial-temporal TransFormer (DestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3D dataset.
arXiv Detail & Related papers (2022-09-01T12:32:40Z)
- SdAE: Self-distillated Masked Autoencoder [95.3684955370897]
Self-distillated masked AutoEncoder network SdAE is proposed in this paper.
With only 300 epochs pre-training, a vanilla ViT-Base model achieves an 84.1% fine-tuning accuracy on ImageNet-1k classification.
arXiv Detail & Related papers (2022-07-31T15:07:25Z)
- Improvements to Self-Supervised Representation Learning for Masked Image Modeling [0.0]
This paper explores improvements to the masked image modeling (MIM) paradigm.
The MIM paradigm enables the model to learn the main object features of the image by masking the input image and predicting the masked part from the unmasked part.
We propose a new model, Contrastive Masked AutoEncoders (CMAE).
arXiv Detail & Related papers (2022-05-21T09:45:50Z)
- Self-Supervised Point Cloud Representation Learning with Occlusion Auto-Encoder [63.77257588569852]
We present 3D Occlusion Auto-Encoder (3D-OAE) for learning representations for point clouds.
Our key idea is to randomly occlude some local patches of the input point cloud and establish the supervision via recovering the occluded patches.
In contrast with previous methods, our 3D-OAE can remove a large proportion of patches and predict them only with a small number of visible patches.
arXiv Detail & Related papers (2022-03-26T14:06:29Z)
- Context Autoencoder for Self-Supervised Representation Learning [64.63908944426224]
We pretrain an encoder by making predictions in the encoded representation space.
The network is an encoder-regressor-decoder architecture.
We demonstrate the effectiveness of our CAE through superior transfer performance in downstream tasks.
arXiv Detail & Related papers (2022-02-07T09:33:45Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels (a generic sketch of this masking step follows the list).
Coupling these two designs enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
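Several of the methods listed above (MAE, SdAE, 3D-OAE, and Point-RAE itself) share the same first step: randomly mask a large fraction of patch tokens and encode only the visible ones. The sketch below is a generic, illustrative version of that step in plain PyTorch; the function name, the 75% ratio, and the token shapes are assumptions here, not code from any of the papers.

    # Illustrative random patch masking shared by MAE-style methods (not taken from any paper above).
    import torch

    def random_masking(tokens, mask_ratio=0.75):
        # Split patch tokens (B, N, D) into visible tokens plus the kept/masked indices.
        B, N, D = tokens.shape
        n_keep = int(N * (1 - mask_ratio))
        noise = torch.rand(B, N, device=tokens.device)    # one random score per patch
        shuffle = noise.argsort(dim=1)                    # random permutation per sample
        keep_idx, mask_idx = shuffle[:, :n_keep], shuffle[:, n_keep:]
        visible = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
        return visible, keep_idx, mask_idx

    # Example: 196 patch tokens (a 14x14 grid) with 768-dim embeddings, 75% masked.
    visible, keep_idx, mask_idx = random_masking(torch.randn(2, 196, 768))
    print(visible.shape)  # torch.Size([2, 49, 768])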