Related papers: Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

URL: http://arxiv.org/abs/2509.01250v1
Date: Mon, 01 Sep 2025 08:42:17 GMT
Title: Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Authors: Xiangdong Zhang, Shaofeng Zhang, Junchi Yan,
Abstract summary: We propose Point-PQAE, a cross-reconstruction generative paradigm that first generates two decoupled point clouds/views and then reconstructs one from the other.<n>The cross-reconstruction significantly increases the difficulty of pre-training compared to self-reconstruction, which enables our method to surpass previous single-modal self-reconstruction methods in 3D self-supervised learning.
Score: 58.45662356692379
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Point cloud learning, especially in a self-supervised way without manual labels, has gained growing attention in both vision and learning communities due to its potential utility in a wide range of applications. Most existing generative approaches for point cloud self-supervised learning focus on recovering masked points from visible ones within a single view. Recognizing that a two-view pre-training paradigm inherently introduces greater diversity and variance, it may thus enable more challenging and informative pre-training. Inspired by this, we explore the potential of two-view learning in this domain. In this paper, we propose Point-PQAE, a cross-reconstruction generative paradigm that first generates two decoupled point clouds/views and then reconstructs one from the other. To achieve this goal, we develop a crop mechanism for point cloud view generation for the first time and further propose a novel positional encoding to represent the 3D relative position between the two decoupled views. The cross-reconstruction significantly increases the difficulty of pre-training compared to self-reconstruction, which enables our method to surpass previous single-modal self-reconstruction methods in 3D self-supervised learning. Specifically, it outperforms the self-reconstruction baseline (Point-MAE) by 6.5%, 7.0%, and 6.7% in three variants of ScanObjectNN with the Mlp-Linear evaluation protocol. The code is available at https://github.com/aHapBean/Point-PQAE.

Related papers

Masked Clustering Prediction for Unsupervised Point Cloud Pre-training [61.11226004056774]
MaskClu is a novel unsupervised pre-training method for ViTs on 3D point clouds.<n>It integrates masked point modeling with clustering-based learning.
arXiv Detail & Related papers (2025-08-12T12:58:44Z)
HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation [106.09886920774002]
We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network. Our method achieves consistent improvements over the baseline trained from scratch and significantly out- performs the existing schemes.
arXiv Detail & Related papers (2024-03-18T14:18:08Z)
Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration [17.420425069785946]
We present a novel Cross-Modal Information-Guided Network (CMIGNet) for point cloud registration. We first incorporate the projected images from the point clouds and fuse the cross-modal features using the attention mechanism. We employ two contrastive learning strategies, namely overlapping contrastive learning and cross-modal contrastive learning.
arXiv Detail & Related papers (2023-11-02T12:56:47Z)
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
generative pre-training can boost the performance of fundamental models in 2D vision. In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z)
GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training [16.825524577372473]
We introduce a point cloud representation learning framework, based on geometric feature reconstruction. We identify three self-supervised learning objectives to peculiar point clouds, namely centroid prediction, normal estimation, and curvature prediction. Our pipeline is conceptually simple and it consists of two major steps: first, it randomly masks out groups of points, followed by a Transformer-based point cloud encoder.
arXiv Detail & Related papers (2023-05-15T17:14:55Z)
PointVST: Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation [64.858505571083]
This paper proposes a translative pre-training framework, namely PointVST. It is driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images.
arXiv Detail & Related papers (2022-12-29T07:03:29Z)
PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving. Current approaches suffer from sparse and partial point clouds of distant and occluded objects. In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.