Disentangling Patterns and Transformations from One Sequence of Images
with Shape-invariant Lie Group Transformer
- URL: http://arxiv.org/abs/2203.11210v1
- Date: Mon, 21 Mar 2022 11:55:13 GMT
- Title: Disentangling Patterns and Transformations from One Sequence of Images
with Shape-invariant Lie Group Transformer
- Authors: T. Takada, W. Shimaya, Y. Ohmura, Y. Kuniyoshi
- Abstract summary: We take a novel approach for representation learning based on a simpler and more intuitive formulation that the observed world is the combination of multiple independent patterns and transformations.
We propose a model that disentangles the scenes into the minimum number of basic components of patterns and Lie transformations from only one sequence of images.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An effective way to model the complex real world is to view the world as a
composition of basic components of objects and transformations. Although humans
come to understand the compositionality of the real world through development, it is
extremely difficult to equip robots with such a learning mechanism. In recent
years, there has been significant research on autonomously learning
representations of the world using deep learning; however, most studies
have taken a statistical approach, which requires a large amount of training
data. In contrast to such existing methods, we take a novel algebraic approach for
representation learning based on a simpler and more intuitive formulation that
the observed world is the combination of multiple independent patterns and
transformations that are invariant to the shape of patterns. Since the shape of
patterns can be viewed as invariant features under symmetric
transformations such as translation or rotation, we can expect that the
patterns can naturally be extracted by expressing transformations with
symmetric Lie group transformers and attempting to reconstruct the scene with
them. Based on this idea, we propose a model that disentangles the scenes into
the minimum number of basic components of patterns and Lie transformations from
only one sequence of images, by introducing learnable shape-invariant Lie
group transformers as transformation components. Experiments show that given
one sequence of images in which two objects are moving independently, the
proposed model can discover the hidden distinct objects and multiple
shape-invariant transformations that constitute the scenes.
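To make the idea of a shape-invariant Lie group transformer concrete, the following is a minimal sketch (not the authors' implementation) of applying an SE(2) group element, obtained by exponentiating Lie algebra generators for rotation and translation, to an image. The generator matrices are standard; the parameter names, the bilinear warp helper, and the use of scipy are illustrative choices.

```python
# Minimal sketch (not the authors' implementation): apply a shape-preserving
# SE(2) transformation to an image by exponentiating Lie algebra generators.
# The helper names and parameter values are illustrative assumptions.
import numpy as np
from scipy.linalg import expm
from scipy.ndimage import map_coordinates

# Generators of SE(2) acting on homogeneous 2D coordinates.
G_rot = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 0.]])  # infinitesimal rotation
G_tx = np.array([[0., 0., 1.],
                 [0., 0., 0.],
                 [0., 0., 0.]])    # infinitesimal x-translation
G_ty = np.array([[0., 0., 0.],
                 [0., 0., 1.],
                 [0., 0., 0.]])    # infinitesimal y-translation

def se2_element(theta, tx, ty):
    """Group element exp(theta*G_rot + tx*G_tx + ty*G_ty)."""
    return expm(theta * G_rot + tx * G_tx + ty * G_ty)

def warp(image, T):
    """Warp a 2D image by the 3x3 homogeneous transform T (inverse mapping)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Centre the coordinate frame so rotation is about the image centre.
    coords = np.stack([xs - w / 2, ys - h / 2, np.ones_like(xs, dtype=float)],
                      axis=0).reshape(3, -1)
    src = np.linalg.inv(T) @ coords
    src_x, src_y = src[0] + w / 2, src[1] + h / 2
    return map_coordinates(image, [src_y, src_x], order=1,
                           mode='constant').reshape(h, w)

# A simple pattern keeps its shape under any SE(2) element, so shape can act
# as the invariant feature while (theta, tx, ty) describe the motion.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
moved = warp(img, se2_element(theta=np.pi / 8, tx=5.0, ty=-3.0))
```

In a learnable version of such a component, the coefficients multiplying the generators would presumably be the quantities inferred from consecutive frames, while the pattern itself stays fixed.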
Related papers
- DeFormer: Integrating Transformers with Deformable Models for 3D Shape
Abstraction from a Single Image [31.154786931081087]
We propose a novel bi-channel Transformer architecture, integrated with parameterized deformable models, to simultaneously estimate the global and local deformations of primitives.
DeFormer achieves better reconstruction accuracy over the state-of-the-art, and visualizes with consistent semantic correspondences for improved interpretability.
arXiv Detail & Related papers (2023-09-22T02:46:43Z) - Learning Modulated Transformation in GANs [69.95217723100413]
We equip the generator in generative adversarial networks (GANs) with a plug-and-play module, termed the modulated transformation module (MTM).
MTM predicts spatial offsets under the control of latent codes, based on which the convolution operation can be applied at variable locations.
It is noteworthy that towards human generation on the challenging TaiChi dataset, we improve the FID of StyleGAN3 from 21.36 to 13.60, demonstrating the efficacy of learning modulated geometry transformation.
arXiv Detail & Related papers (2023-08-29T17:51:22Z) - ParGAN: Learning Real Parametrizable Transformations [50.51405390150066]
We propose ParGAN, a generalization of the cycle-consistent GAN framework to learn image transformations.
The proposed generator takes as input both an image and a parametrization of the transformation.
We show how, with disjoint image domains and no annotated parametrization, our framework can create smooth interpolations as well as learn multiple transformations simultaneously.
arXiv Detail & Related papers (2022-11-09T16:16:06Z) - Robust and Controllable Object-Centric Learning through Energy-based
Models [95.68748828339059]
Ours is a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that ours can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z) - Imaging with Equivariant Deep Learning [9.333799633608345]
We review the emerging field of equivariant imaging and show how it can provide improved generalization and new imaging opportunities.
We show the interplay between the acquisition physics and group actions and links to iterative reconstruction, blind compressed sensing and self-supervised learning.
arXiv Detail & Related papers (2022-09-05T02:13:57Z) - Transformation Coding: Simple Objectives for Equivariant Representations [17.544323284367927]
We present a non-generative approach to deep representation learning that seeks equivariant deep embedding through simple objectives.
In contrast to existing equivariant networks, our transformation coding approach does not constrain the choice of the feed-forward layer or the architecture.
arXiv Detail & Related papers (2022-02-19T01:43:13Z) - Quantised Transforming Auto-Encoders: Achieving Equivariance to
Arbitrary Transformations in Deep Networks [23.673155102696338]
Convolutional Neural Networks (CNNs) are equivariant to image translation; a minimal numerical check of this property is sketched after this list.
We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously.
We demonstrate results of successful re-rendering of transformed versions of input images on several datasets.
arXiv Detail & Related papers (2021-11-25T02:26:38Z) - Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z) - Self-Supervised Representation Learning from Flow Equivariance [97.13056332559526]
We present a new self-supervised learning representation framework that can be directly deployed on a video stream of complex scenes.
Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static images.
arXiv Detail & Related papers (2021-01-16T23:44:09Z) - Disentangling images with Lie group transformations and sparse coding [3.3454373538792552]
We train a model that learns to disentangle spatial patterns and their continuous transformations in a completely unsupervised manner.
Training the model on a dataset consisting of controlled geometric transformations of specific MNIST digits shows that it can recover these transformations along with the digits.
arXiv Detail & Related papers (2020-12-11T19:11:32Z) - Generalizing Convolutional Neural Networks for Equivariance to Lie
Groups on Arbitrary Continuous Data [52.78581260260455]
We propose a general method to construct a convolutional layer that is equivariant to transformations from any specified Lie group.
We apply the same model architecture to images, ball-and-stick molecular data, and Hamiltonian dynamical systems.
arXiv Detail & Related papers (2020-02-25T17:40:38Z)
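Several of the entries above, such as the Quantised Transforming Auto-Encoder and the Lie-group convolution papers, start from the fact that ordinary convolutions are equivariant to image translation. Below is a minimal numerical check of that property (an illustrative sketch using scipy, not code from any of the listed papers).

```python
# Illustrative check (not from the listed papers): a 2D convolution commutes
# with integer translation of the input, away from the padded border.
import numpy as np
from scipy.ndimage import convolve, shift

rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = rng.random((3, 3))

def translate(x, dy, dx):
    # Integer-pixel translation with zero padding.
    return shift(x, (dy, dx), order=0, mode='constant')

conv_then_shift = translate(convolve(image, kernel, mode='constant'), 2, 3)
shift_then_conv = convolve(translate(image, 2, 3), kernel, mode='constant')

# The two agree away from the border, confirming translation equivariance.
assert np.allclose(conv_then_shift[4:-4, 4:-4], shift_then_conv[4:-4, 4:-4])
```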