Generative-Contrastive Learning for Self-Supervised Latent
Representations of 3D Shapes from Multi-Modal Euclidean Input
- URL: http://arxiv.org/abs/2301.04612v1
- Date: Wed, 11 Jan 2023 18:14:24 GMT
- Title: Generative-Contrastive Learning for Self-Supervised Latent
Representations of 3D Shapes from Multi-Modal Euclidean Input
- Authors: Chengzhi Wu, Julius Pfrommer, Mingyuan Zhou and Jürgen Beyerer
- Abstract summary: We propose a combined generative and contrastive neural architecture for learning latent representations of 3D shapes.
The architecture uses two encoder branches for voxel grids and multi-view images from the same underlying shape.
- Score: 44.10761155817833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a combined generative and contrastive neural architecture for
learning latent representations of 3D volumetric shapes. The architecture uses
two encoder branches for voxel grids and multi-view images from the same
underlying shape. The main idea is to combine a contrastive loss between the
resulting latent representations with an additional reconstruction loss. This
helps prevent the latent representations from collapsing, which would be a
trivial solution for minimizing the contrastive loss. A novel switching scheme is used to
cross-train two encoders with a shared decoder. The switching scheme also
enables the stop gradient operation on a random branch. Further classification
experiments show that the latent representations learned with our
self-supervised method integrate more useful information from the additional
input data implicitly, thus leading to better reconstruction and classification
performance.
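The combined objective described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the InfoNCE form of the contrastive term, the 0.5 switching probability, the weight `alpha`, and the function names are all assumptions, and the `.copy()` stands in for a stop-gradient operation that a real autograd framework would provide via detaching.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.07):
    """Contrastive (InfoNCE-style) loss between two batches of latent codes.
    Matching rows are positive pairs; all other rows act as negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy on the diagonal

def combined_loss(z_voxel, z_views, recon, target, alpha=1.0, rng=None):
    """One training step's loss under a switching scheme: a randomly chosen
    branch is treated as a constant target (stop-gradient), and a
    reconstruction term discourages latent collapse."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:            # stop gradient on a random branch
        z_voxel = z_voxel.copy()      # stand-in for detach() in an autograd framework
    else:
        z_views = z_views.copy()
    contrastive = info_nce(z_voxel, z_views)
    reconstruction = np.mean((recon - target) ** 2)
    return contrastive + alpha * reconstruction
```

With identical latents from both branches the contrastive term is near zero, while the reconstruction term stays informative, which is the mechanism the abstract credits with avoiding trivial collapse.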
Related papers
- Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning [9.896550384001348]
This work proposes a token-level representation learning loss that maximizes agreement between token embeddings from different augmented views individually.
We also invent a simple "rotate-and-restore" mechanism, which rotates and flips one augmented view of input volume, and later restores the order of tokens in the feature maps.
We test our pre-training scheme on two public medical segmentation datasets, and the results on the downstream segmentation task show that our method yields larger improvements than other state-of-the-art pre-training methods.
arXiv Detail & Related papers (2024-08-12T01:49:13Z)
- Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z)
- Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations [1.18885605647513]
We introduce the first 3D-based multi-frame denoising method that significantly outperforms its 2D-based counterparts with lower computational requirements.
Our method extends the multiplane image (MPI) framework for novel view synthesis by introducing a learnable encoder-renderer pair that manipulates multiplane representations in feature space.
arXiv Detail & Related papers (2023-03-31T15:23:35Z)
- Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining [26.908554018069545]
We propose Contrast with Reconstruct (ReCon) that unifies contrastive and generative modeling paradigms.
An encoder-decoder style ReCon-block is proposed that transfers knowledge through cross attention with stop-gradient.
ReCon achieves a new state-of-the-art in 3D representation learning, e.g., 91.26% accuracy on ScanObjectNN.
arXiv Detail & Related papers (2023-02-05T06:58:35Z)
- Neural Contourlet Network for Monocular 360 Depth Estimation [37.82642960470551]
We provide a new perspective that constructs an interpretable and sparse representation for a 360 image.
We propose a neural contourlet network consisting of a convolutional neural network and a contourlet transform branch.
In the encoder stage, we design a spatial-spectral fusion module to effectively fuse two types of cues.
arXiv Detail & Related papers (2022-08-03T02:25:55Z)
- The Transitive Information Theory and its Application to Deep Generative Models [0.0]
The Variational Autoencoder (VAE) can be pushed in two opposite directions.
Existing methods narrow the issues to the rate-distortion trade-off between compression and reconstruction.
We develop a system that learns a hierarchy of disentangled representation together with a mechanism for recombining the learned representation for generalization.
arXiv Detail & Related papers (2022-03-09T22:35:02Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
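A generic form of such a redundancy penalty can be sketched as follows. This is an illustrative NumPy formulation (penalizing squared off-diagonal entries of the bottleneck feature correlation matrix), assumed for illustration; the cited paper's exact penalty is not given in the summary above.

```python
import numpy as np

def redundancy_penalty(z):
    """Penalize pairwise correlation between bottleneck features:
    the sum of squared off-diagonal entries of the (D, D) feature
    correlation matrix, computed over a batch of codes z of shape (N, D)."""
    z = z - z.mean(axis=0)              # center each feature
    z = z / (z.std(axis=0) + 1e-8)      # scale each feature to unit variance
    corr = (z.T @ z) / z.shape[0]       # empirical correlation matrix
    off_diag = corr - np.diag(np.diag(corr))
    return np.sum(off_diag ** 2)
```

Duplicated (perfectly correlated) features drive the penalty up, while independent features keep it near zero, so adding it to the reconstruction loss pushes the bottleneck toward non-redundant codes.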
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- Recurrent Multi-view Alignment Network for Unsupervised Surface Registration [79.72086524370819]
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
We propose to represent the non-rigid transformation with a point-wise combination of several rigid transformations.
We also introduce a differentiable loss function that measures the 3D shape similarity on the projected multi-view 2D depth images.
arXiv Detail & Related papers (2020-11-24T14:22:42Z)
- Identity Enhanced Residual Image Denoising [61.75610647978973]
We learn a fully-convolutional network model for image denoising that consists of a chain of identity mapping modules and a residual-on-the-residual architecture.
The proposed network produces remarkably higher numerical accuracy and better visual image quality than classical state-of-the-art and CNN-based algorithms.
arXiv Detail & Related papers (2020-04-26T04:52:22Z)
- Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space.
We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
arXiv Detail & Related papers (2020-03-10T10:17:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.