Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
- URL: http://arxiv.org/abs/2302.02318v2
- Date: Mon, 22 May 2023 12:40:49 GMT
- Title: Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
- Authors: Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi
- Abstract summary: We propose Contrast with Reconstruct (ReCon) that unifies contrastive and generative modeling paradigms.
An encoder-decoder style ReCon-block is proposed that transfers knowledge through cross attention with stop-gradient.
ReCon achieves a new state-of-the-art in 3D representation learning, e.g., 91.26% accuracy on ScanObjectNN.
- Score: 26.908554018069545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mainstream 3D representation learning approaches are built upon contrastive
or generative modeling pretext tasks, where great improvements in performance
on various downstream tasks have been achieved. However, we find these two
paradigms have different characteristics: (i) contrastive models are
data-hungry and suffer from representation over-fitting; (ii) generative
models have a data-filling issue and show inferior data-scaling capacity
compared to contrastive models. This motivates us to learn 3D
representations by sharing the merits of both paradigms, which is non-trivial
due to the pattern difference between the two paradigms. In this paper, we
propose Contrast with Reconstruct (ReCon) that unifies these two paradigms.
ReCon is trained to learn from both generative modeling teachers and
single/cross-modal contrastive teachers through ensemble distillation, where
the generative student guides the contrastive student. An encoder-decoder style
ReCon-block is proposed that transfers knowledge through cross attention with
stop-gradient, which avoids pretraining over-fitting and pattern difference
issues. ReCon achieves a new state-of-the-art in 3D representation learning,
e.g., 91.26% accuracy on ScanObjectNN. Code has been released at
https://github.com/qizekun/ReCon.
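For intuition, here is a minimal PyTorch-style sketch of the idea behind the ReCon-block: a generative (masked-modeling) student encodes local point-patch tokens with self-attention, while a contrastive student reads those features through cross attention behind a stop-gradient, so contrastive losses cannot disturb the generative pretext task. All class names, dimensions, and structural details below are illustrative assumptions, not the released implementation at the repository above.

# Minimal sketch of the ReCon-block idea (illustrative, not the official code).
import torch
import torch.nn as nn


class ReConBlockSketch(nn.Module):
    def __init__(self, dim: int = 384, num_heads: int = 6):
        super().__init__()
        # Generative student: self-attention over (masked) local-patch tokens.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Contrastive student: global queries read the encoder via cross attention.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_tokens = nn.LayerNorm(dim)
        self.norm_query = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor, queries: torch.Tensor):
        # tokens:  (B, N, dim) local point-patch embeddings (generative branch)
        # queries: (B, Q, dim) learnable global queries (contrastive branch)
        t = self.norm_tokens(tokens)
        tokens = tokens + self.self_attn(t, t, t, need_weights=False)[0]

        # Stop-gradient: the contrastive student learns from the generative
        # features, but its loss does not back-propagate into them.
        guidance = tokens.detach()
        q = self.norm_query(queries)
        queries = queries + self.cross_attn(q, guidance, guidance, need_weights=False)[0]
        # tokens feed a reconstruction head; queries feed contrastive heads.
        return tokens, queries


if __name__ == "__main__":
    block = ReConBlockSketch()
    tokens = torch.randn(2, 64, 384)            # 64 point-patch tokens
    queries = torch.randn(2, 3, 384)            # e.g. point/image/text global queries
    rec_feats, global_feats = block(tokens, queries)
    print(rec_feats.shape, global_feats.shape)  # (2, 64, 384) (2, 3, 384)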
Related papers
- GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Noisy-Correspondence Learning for Text-to-Image Person Re-identification [50.07634676709067]
We propose a novel Robust Dual Embedding method (RDE) to learn robust visual-semantic associations even with noisy correspondences.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on three datasets.
arXiv Detail & Related papers (2023-08-19T05:34:13Z)
- Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition [108.07591240357306]
We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-trained 2D model.
We find that the crux is the less effective training of the "joint hard samples", which receive high-confidence predictions on different wrong labels.
Our proposed invariant training strategy, called InvJoint, not only emphasizes training on the hard samples but also seeks invariance between the conflicting 2D and 3D ambiguous predictions.
arXiv Detail & Related papers (2023-08-18T17:43:12Z)
- Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning [33.68311764817763]
We propose Prompted Contrast with Masked Motion Modeling, PCM$^{\rm 3}$, for versatile 3D action representation learning.
Our method integrates the contrastive learning and masked prediction tasks in a mutually beneficial manner.
Tests on five downstream tasks under three large-scale datasets are conducted, demonstrating the superior generalization capacity of PCM$^{\rm 3}$ compared to the state-of-the-art works.
arXiv Detail & Related papers (2023-08-08T01:27:55Z)
- Hallucination Improves the Performance of Unsupervised Visual Representation Learning [9.504503675097137]
We propose the Hallucinator, which efficiently generates additional positive samples for further contrast.
The Hallucinator is differentiable and creates new data in the feature space.
Remarkably, we empirically prove that the proposed Hallucinator generalizes well to various contrastive learning models.
arXiv Detail & Related papers (2023-07-22T21:15:56Z)
- Progressive Learning of 3D Reconstruction Network from 2D GAN Data [33.42114674602613]
This paper presents a method to reconstruct high-quality textured 3D models from single images.
Previous methods rely on datasets with expensive annotations: multi-view images and their camera parameters.
We show significant improvements over previous methods whether they were trained on GAN generated multi-view images or on real images with expensive annotations.
arXiv Detail & Related papers (2023-05-18T16:45:51Z)
- Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning [37.155772047656114]
The Masked Scene Contrast (MSC) framework extracts comprehensive 3D representations more efficiently and effectively.
MSC also enables large-scale 3D pre-training across multiple datasets.
arXiv Detail & Related papers (2023-03-24T17:59:58Z)
- Generative-Contrastive Learning for Self-Supervised Latent Representations of 3D Shapes from Multi-Modal Euclidean Input [44.10761155817833]
We propose a combined generative and contrastive neural architecture for learning latent representations of 3D shapes.
The architecture uses two encoder branches for voxel grids and multi-view images from the same underlying shape.
arXiv Detail & Related papers (2023-01-11T18:14:24Z)
- PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation under Adversarial Attack [73.3371797787823]
Adversarial contrastive learning (ACL) is considered an effective way to improve the robustness of pre-trained models.
We present a robustness-aware loss function to train the self-supervised contrastive learning framework adversarially.
We validate our method, PointACL, on downstream tasks, including 3D classification and 3D segmentation, with multiple datasets.
arXiv Detail & Related papers (2022-09-14T22:58:31Z)
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
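As a concrete illustration of the last entry's idea of representing a non-rigid transformation as a point-wise combination of several rigid transformations, here is a small PyTorch sketch. The function name, the softmax weighting, and the tensor shapes are assumptions made for illustration, not the paper's exact formulation.

# Illustrative sketch: blend K rigid transforms per point to obtain a non-rigid warp.
import torch

def blend_rigid_transforms(points, rotations, translations, logits):
    """points: (N, 3); rotations: (K, 3, 3); translations: (K, 3); logits: (N, K)."""
    weights = torch.softmax(logits, dim=-1)                  # per-point weights over K transforms
    # Apply every rigid transform to every point: (N, K, 3).
    transformed = torch.einsum("kij,nj->nki", rotations, points) + translations
    # Point-wise weighted combination of the rigidly transformed positions: (N, 3).
    return torch.einsum("nk,nki->ni", weights, transformed)

# Example: two rigid transforms blended per point.
points = torch.randn(1024, 3)
rotations = torch.eye(3).repeat(2, 1, 1)                      # identity rotations for simplicity
translations = torch.tensor([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
logits = torch.randn(1024, 2)
warped = blend_rigid_transforms(points, rotations, translations, logits)
print(warped.shape)  # torch.Size([1024, 3])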