MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point
Cloud Action Recognition
- URL: http://arxiv.org/abs/2209.00407v1
- Date: Thu, 1 Sep 2022 12:32:40 GMT
- Title: MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point
Cloud Action Recognition
- Authors: Xiaodong Chen and Wu Liu and Xinchen Liu and Yongdong Zhang and
Jungong Han and Tao Mei
- Abstract summary: We propose a Masked Pseudo-Labeling autoEncoder (textbfMAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient textbfDecoupled textbfspatial-textbftemporal TranstextbfFormer (textbfDestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3
- Score: 160.49403075559158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing human actions from point cloud videos has attracted tremendous
attention from both academia and industry due to its wide applications like
automatic driving, robotics, and so on. However, current methods for point
cloud action recognition usually require a huge amount of data with manual
annotations and a complex backbone network with high computation costs, which
makes it impractical for real-world applications. Therefore, this paper
considers the task of semi-supervised point cloud action recognition. We
propose a Masked Pseudo-Labeling autoEncoder (\textbf{MAPLE}) framework to
learn effective representations with much fewer annotations for point cloud
action recognition. In particular, we design a novel and efficient
\textbf{De}coupled \textbf{s}patial-\textbf{t}emporal Trans\textbf{Former}
(\textbf{DestFormer}) as the backbone of MAPLE. In DestFormer, the spatial and
temporal dimensions of the 4D point cloud videos are decoupled to achieve
efficient self-attention for learning both long-term and short-term features.
Moreover, to learn discriminative features from fewer annotations, we design a
masked pseudo-labeling autoencoder structure to guide the DestFormer to
reconstruct features of masked frames from the available frames. More
importantly, for unlabeled data, we exploit the pseudo-labels from the
classification head as the supervision signal for the reconstruction of
features from the masked frames. Finally, comprehensive experiments demonstrate
that MAPLE achieves superior results on three public benchmarks and outperforms
the state-of-the-art method by 8.08\% accuracy on the MSR-Action3D dataset.
Related papers
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - Cross-Modal Information-Guided Network using Contrastive Learning for
Point Cloud Registration [17.420425069785946]
We present a novel Cross-Modal Information-Guided Network (CMIGNet) for point cloud registration.
We first incorporate the projected images from the point clouds and fuse the cross-modal features using the attention mechanism.
We employ two contrastive learning strategies, namely overlapping contrastive learning and cross-modal contrastive learning.
arXiv Detail & Related papers (2023-11-02T12:56:47Z) - Regress Before Construct: Regress Autoencoder for Point Cloud
Self-supervised Learning [18.10704604275133]
Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Point Regress AutoEncoder (Point-RAE), a new scheme for regressive autoencoders for point cloud self-supervised learning.
Our approach is efficient during pre-training and generalizes well on various downstream tasks.
arXiv Detail & Related papers (2023-09-25T17:23:33Z) - MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds [13.426810473131642]
Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and decoder during reconstruction.
In a novel reconstruction approach, MAELi distinguishes between empty and occluded space.
Thereby, without any ground truth whatsoever and trained on single frames only, MAELi obtains an understanding of the underlying 3D scene geometry and semantics.
arXiv Detail & Related papers (2022-12-14T13:10:27Z) - GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds.
We propose a textbfGenerative textbfDecoder for MAE (GD-MAE) to automatically merges the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI, and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z) - LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds [62.49198183539889]
We propose a label-efficient semantic segmentation pipeline for outdoor scenes with LiDAR point clouds.
Our method co-designs an efficient labeling process with semi/weakly supervised learning.
Our proposed method is even highly competitive compared to the fully supervised counterpart with 100% labels.
arXiv Detail & Related papers (2022-10-14T19:13:36Z) - Image Understands Point Cloud: Weakly Supervised 3D Semantic
Segmentation via Association Learning [59.64695628433855]
We propose a novel cross-modality weakly supervised method for 3D segmentation, incorporating complementary information from unlabeled images.
Basically, we design a dual-branch network equipped with an active labeling strategy, to maximize the power of tiny parts of labels.
Our method even outperforms the state-of-the-art fully supervised competitors with less than 1% actively selected annotations.
arXiv Detail & Related papers (2022-09-16T07:59:04Z) - Self-Supervised Point Cloud Representation Learning with Occlusion
Auto-Encoder [63.77257588569852]
We present 3D Occlusion Auto-Encoder (3D-OAE) for learning representations for point clouds.
Our key idea is to randomly occlude some local patches of the input point cloud and establish the supervision via recovering the occluded patches.
In contrast with previous methods, our 3D-OAE can remove a large proportion of patches and predict them only with a small number of visible patches.
arXiv Detail & Related papers (2022-03-26T14:06:29Z) - Upsampling Autoencoder for Self-Supervised Point Cloud Learning [11.19408173558718]
We propose a self-supervised pretraining model for point cloud learning without human annotations.
Upsampling operation encourages the network to capture both high-level semantic information and low-level geometric information of the point cloud.
We find that our UAE outperforms previous state-of-the-art methods in shape classification, part segmentation and point cloud upsampling tasks.
arXiv Detail & Related papers (2022-03-21T07:20:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.