Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs
- URL: http://arxiv.org/abs/2502.03740v1
- Date: Thu, 06 Feb 2025 03:08:12 GMT
- Title: Multiple Invertible and Partial-Equivariant Function for Latent Vector Transformation to Enhance Disentanglement in VAEs
- Authors: Hee-Jun Jung, Jaehyoung Jeong, Kangil Kim,
- Abstract summary: We propose a novel method, called Multiple Invertible and partial-equivariant transformation (MIPE-transformation), to inject inductive bias.
In experiments on 3D Cars, 3D Shapes, and dSprites datasets, MIPE-transformation improves the disentanglement performance of state-of-the-art VAEs.
- Score: 2.048226951354646
- License:
- Abstract: Disentanglement learning is a core issue for understanding and re-using trained information in Variational AutoEncoder (VAE), and effective inductive bias has been reported as a key factor. However, the actual implementation of such bias is still vague. In this paper, we propose a novel method, called Multiple Invertible and partial-equivariant transformation (MIPE-transformation), to inject inductive bias by 1) guaranteeing the invertibility of latent-to-latent vector transformation while preserving a certain portion of equivariance of input-to-latent vector transformation, called Invertible and partial-equivariant transformation (IPE-transformation), 2) extending the form of prior and posterior in VAE frameworks to an unrestricted form through a learnable conversion to an approximated exponential family, called Exponential Family conversion (EF-conversion), and 3) integrating multiple units of IPE-transformation and EF-conversion, and their training. In experiments on 3D Cars, 3D Shapes, and dSprites datasets, MIPE-transformation improves the disentanglement performance of state-of-the-art VAEs.
Related papers
- Self-supervised Transformation Learning for Equivariant Representations [26.207358743969277]
Unsupervised representation learning has significantly advanced various machine learning tasks.
We propose Self-supervised Transformation Learning (STL), replacing transformation labels with transformation representations derived from image pairs.
We demonstrate the approach's effectiveness across diverse classification and detection tasks, outperforming existing methods in 7 out of 11 benchmarks.
arXiv Detail & Related papers (2025-01-15T10:54:21Z) - Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z) - Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection [37.142470149311904]
We propose atemporal equivariant learning framework by considering both spatial and temporal augmentations jointly.
We show our pre-training method for 3D object detection which outperforms existing equivariant and invariant approaches in many settings.
arXiv Detail & Related papers (2024-04-17T20:41:49Z) - Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in
Transformer-Based Variational AutoEncoder for Diverse Text Generation [85.5379146125199]
Variational Auto-Encoder (VAE) has been widely adopted in text generation.
We propose TRACE, a Transformer-based recurrent VAE structure.
arXiv Detail & Related papers (2022-10-22T10:25:35Z) - Learning Symmetric Embeddings for Equivariant World Models [9.781637768189158]
We propose learning symmetric embedding networks (SENs) that encode an input space (e.g. images)
This network can be trained end-to-end with an equivariant task network to learn an explicitly symmetric representation.
Our experiments demonstrate that SENs facilitate the application of equivariant networks to data with complex symmetry representations.
arXiv Detail & Related papers (2022-04-24T22:31:52Z) - Improving the Sample-Complexity of Deep Classification Networks with
Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z) - Deformation Robust Roto-Scale-Translation Equivariant CNNs [10.44236628142169]
Group-equivariant convolutional neural networks (G-CNNs) achieve significantly improved generalization performance with intrinsic symmetry.
General theory and practical implementation of G-CNNs have been studied for planar images under either rotation or scaling transformation.
arXiv Detail & Related papers (2021-11-22T03:58:24Z) - Equivariant Deep Dynamical Model for Motion Prediction [0.0]
Deep generative modeling is a powerful approach for dynamical modeling to discover the most simplified and compressed underlying description of the data.
Most learning tasks have intrinsic symmetries, i.e., the input transformations leave the output unchanged, or the output undergoes a similar transformation.
We propose an SO(3) equivariant deep dynamical model (EqDDM) for motion prediction that learns a structured representation of the input space in the sense that the embedding varies with symmetry transformations.
arXiv Detail & Related papers (2021-11-02T21:01:43Z) - Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z) - SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows [78.77808270452974]
SurVAE Flows is a modular framework for composable transformations that encompasses VAEs and normalizing flows.
We show that several recently proposed methods, including dequantization and augmented normalizing flows, can be expressed as SurVAE Flows.
arXiv Detail & Related papers (2020-07-06T13:13:22Z) - Variational Transformers for Diverse Response Generation [71.53159402053392]
Variational Transformer (VT) is a variational self-attentive feed-forward sequence model.
VT combines the parallelizability and global receptive field computation of the Transformer with the variational nature of the CVAE.
We explore two types of VT: 1) modeling the discourse-level diversity with a global latent variable; and 2) augmenting the Transformer decoder with a sequence of finegrained latent variables.
arXiv Detail & Related papers (2020-03-28T07:48:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.