Point Cloud Mixture-of-Domain-Experts Model for 3D Self-supervised Learning
- URL: http://arxiv.org/abs/2410.09886v3
- Date: Tue, 03 Jun 2025 01:21:46 GMT
- Title: Point Cloud Mixture-of-Domain-Experts Model for 3D Self-supervised Learning
- Authors: Yaohua Zha, Tao Dai, Hang Guo, Yanzi Wang, Bin Chen, Ke Chen, Shu-Tao Xia,
- Abstract summary: Point clouds, as a primary representation of 3D data, can be categorized into scene domain point clouds and object domain point clouds.<n>In this paper, we propose to learn a comprehensive Point cloud Mixture-of-Domain-Experts model (Point-MoDE) via a block-to-scene pre-training strategy.
- Score: 50.55005524072687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Point clouds, as a primary representation of 3D data, can be categorized into scene domain point clouds and object domain point clouds. Point cloud self-supervised learning (SSL) has become a mainstream paradigm for learning 3D representations. However, existing point cloud SSL primarily focuses on learning domain-specific 3D representations within a single domain, neglecting the complementary nature of cross-domain knowledge, which limits the learning of 3D representations. In this paper, we propose to learn a comprehensive Point cloud Mixture-of-Domain-Experts model (Point-MoDE) via a block-to-scene pre-training strategy. Specifically, we first propose a mixture-of-domain-expert model consisting of scene domain experts and multiple shared object domain experts. Furthermore, we propose a block-to-scene pretraining strategy, which leverages the features of point blocks in the object domain to regress their initial positions in the scene domain through object-level block mask reconstruction and scene-level block position regression. By integrating the complementary knowledge between object and scene, this strategy simultaneously facilitates the learning of both object-domain and scene-domain representations, leading to a more comprehensive 3D representation. Extensive experiments in downstream tasks demonstrate the superiority of our model.
Related papers
- Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds.<n>Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures.<n>We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z) - Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts [7.787211625411271]
We propose Point-MoE, a Mixture-of-Experts architecture designed to enable cross-domain generalization in 3D perception.<n>Standard point cloud backbones degrade significantly in performance when trained on mixed-domain data.<n>Point-MoE with a simple top-k routing strategy can automatically specialize experts, even without access to domain labels.
arXiv Detail & Related papers (2025-05-29T18:21:47Z) - 3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration [45.579241614565376]
We propose a powerful 3D focusing-and-matching network for multi-instance point cloud registration.
By using self-attention and cross-attention, we can locate potential matching instances by regressing object centers.
Our method achieves a new state-of-the-art performance on the multi-instance point cloud registration task.
arXiv Detail & Related papers (2024-11-12T12:04:44Z) - One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection [71.78795573911512]
We propose textbfOneDet3D, a universal one-for-all model that addresses 3D detection across different domains.
We propose the domain-aware in scatter and context, guided by a routing mechanism, to address the data interference issue.
The fully sparse structure and anchor-free head further accommodate point clouds with significant scale disparities.
arXiv Detail & Related papers (2024-11-03T14:21:56Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - ZeroReg: Zero-Shot Point Cloud Registration with Foundation Models [77.84408427496025]
State-of-the-art 3D point cloud registration methods rely on labeled 3D datasets for training.<n>We introduce ZeroReg, a zero-shot registration approach that utilizes 2D foundation models to predict 3D correspondences.
arXiv Detail & Related papers (2023-12-05T11:33:16Z) - Compositional Semantic Mix for Domain Adaptation in Point Cloud
Segmentation [65.78246406460305]
compositional semantic mixing represents the first unsupervised domain adaptation technique for point cloud segmentation.
We present a two-branch symmetric network architecture capable of concurrently processing point clouds from a source domain (e.g. synthetic) and point clouds from a target domain (e.g. real-world)
arXiv Detail & Related papers (2023-08-28T14:43:36Z) - Point-Syn2Real: Semi-Supervised Synthetic-to-Real Cross-Domain Learning
for Object Classification in 3D Point Clouds [14.056949618464394]
Object classification using LiDAR 3D point cloud data is critical for modern applications such as autonomous driving.
We propose a semi-supervised cross-domain learning approach that does not rely on manual annotations of point clouds.
We introduce Point-Syn2Real, a new benchmark dataset for cross-domain learning on point clouds.
arXiv Detail & Related papers (2022-10-31T01:53:51Z) - OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point
Clouds [4.709764624933227]
We propose the first unsupervised method, called OGC, to simultaneously identify multiple 3D objects in a single forward pass.
We extensively evaluate our method on five datasets, demonstrating the superior performance for object part instance segmentation.
arXiv Detail & Related papers (2022-10-10T07:01:08Z) - MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point
Cloud Action Recognition [160.49403075559158]
We propose a Masked Pseudo-Labeling autoEncoder (textbfMAPLE) framework for point cloud action recognition.
In particular, we design a novel and efficient textbfDecoupled textbfspatial-textbftemporal TranstextbfFormer (textbfDestFormer) as the backbone of MAPLE.
MAPLE achieves superior results on three public benchmarks and outperforms the state-of-the-art method by 8.08% accuracy on the MSR-Action3
arXiv Detail & Related papers (2022-09-01T12:32:40Z) - Masked Autoencoders in 3D Point Cloud Representation Learning [7.617783375837524]
We propose masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D)
We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of unmasked patches.
Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks.
arXiv Detail & Related papers (2022-07-04T16:13:27Z) - Masked Autoencoders for Self-Supervised Learning on Automotive Point
Clouds [2.8544513613730205]
Masked autoencoding has become a successful pre-training paradigm for Transformer models for text, images, and recently, point clouds.
We propose VoxelMAE, a simple masked autoencoding pretraining scheme designed for voxel representations.
Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset.
arXiv Detail & Related papers (2022-07-01T16:31:45Z) - SE(3)-Equivariant Attention Networks for Shape Reconstruction in
Function Space [50.14426188851305]
We propose the first SE(3)-equivariant coordinate-based network for learning occupancy fields from point clouds.
In contrast to previous shape reconstruction methods that align the input to a regular grid, we operate directly on the irregular, unoriented point cloud.
We show that our method outperforms previous SO(3)-equivariant methods, as well as non-equivariant methods trained on SO(3)-augmented datasets.
arXiv Detail & Related papers (2022-04-05T17:59:15Z) - Learning Local Displacements for Point Cloud Completion [93.54286830844134]
We propose a novel approach aimed at object and semantic scene completion from a partial scan represented as a 3D point cloud.
Our architecture relies on three novel layers that are used successively within an encoder-decoder structure.
We evaluate both architectures on object and indoor scene completion tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T18:31:37Z) - Masked Discrimination for Self-Supervised Learning on Point Clouds [27.652157544218234]
Masked autoencoding has achieved great success for self-supervised learning in the image and language domains.
Standard backbones like PointNet are unable to properly handle the training versus testing distribution mismatch introduced by masking during training.
We bridge this gap by proposing a discriminative mask pretraining Transformer framework, MaskPoint, for point clouds.
arXiv Detail & Related papers (2022-03-21T17:57:34Z) - Self-Supervised Feature Learning from Partial Point Clouds via Pose
Disentanglement [35.404285596482175]
We propose a novel self-supervised framework to learn informative representations from partial point clouds.
We leverage partial point clouds scanned by LiDAR that contain both content and pose attributes.
Our method not only outperforms existing self-supervised methods, but also shows a better generalizability across synthetic and real-world datasets.
arXiv Detail & Related papers (2022-01-09T14:12:50Z) - Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point
Clouds of Wild Scenes [36.07733308424772]
The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation.
We propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with sole 2D supervision.
arXiv Detail & Related papers (2020-04-26T23:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.