Learning to Manipulate Individual Objects in an Image
- URL: http://arxiv.org/abs/2004.05495v1
- Date: Sat, 11 Apr 2020 21:50:20 GMT
- Title: Learning to Manipulate Individual Objects in an Image
- Authors: Yanchao Yang, Yutong Chen and Stefano Soatto
- Abstract summary: We describe a method to train a generative model with latent factors that are independent and localized.
This means that perturbing the latent variables affects only local regions of the synthesized image, corresponding to objects.
Unlike other unsupervised generative models, ours enables object-centric manipulation, without requiring object-level annotations.
- Score: 71.55005356240761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe a method to train a generative model with latent factors that are
(approximately) independent and localized. This means that perturbing the
latent variables affects only local regions of the synthesized image,
corresponding to objects. Unlike other unsupervised generative models, ours
enables object-centric manipulation, without requiring object-level
annotations, or any form of annotation for that matter. The key to our method
is the combination of spatial disentanglement, enforced by a Contextual
Information Separation loss, and perceptual cycle-consistency, enforced by a
loss that penalizes changes in the image partition in response to perturbations
of the latent factors. We test our method's ability to allow independent
control of spatial and semantic factors of variability on existing datasets and
also introduce two new ones that highlight the limitations of current methods.
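The abstract names two ingredients: a Contextual Information Separation (CIS) loss that enforces spatial disentanglement, and a perceptual cycle-consistency loss that keeps the inferred image partition stable when latent factors are perturbed. The sketch below illustrates only the second ingredient, under assumptions not stated in the abstract: a hypothetical `generator` mapping latents to images, a hypothetical `segmenter` producing soft object masks, and a per-factor latent layout. It is an illustration, not the authors' implementation, and it omits the CIS and reconstruction terms.

```python
import torch

def perceptual_cycle_consistency(generator, segmenter, z, factor_idx, sigma=0.5):
    """Illustrative sketch (not the paper's code): penalize changes in the
    inferred image partition when a single latent factor is perturbed."""
    # Synthesize an image from the latent code and infer its partition
    # into K soft object masks of shape (B, K, H, W).
    x = generator(z)
    masks = segmenter(x)

    # Perturb one latent factor only; all other factors stay fixed.
    z_pert = z.clone()
    z_pert[:, factor_idx] = z[:, factor_idx] + sigma * torch.randn_like(z[:, factor_idx])
    masks_pert = segmenter(generator(z_pert))

    # The partition should be (approximately) invariant to the perturbation:
    # a factor may repaint the region it controls, but must not redraw the
    # segmentation itself.
    return (masks_pert - masks).abs().mean()
```

In a full objective this penalty would be combined with the CIS loss and a reconstruction term; the abstract does not specify how the terms are weighted.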
Related papers
- Disentanglement of Latent Representations via Causal Interventions [11.238098505498165]
We introduce a new method for disentanglement inspired by causal dynamics.
Our model considers the quantized vectors as causal variables and links them in a causal graph.
It performs causal interventions on the graph and generates atomic transitions affecting a unique factor of variation in the image.
arXiv Detail & Related papers (2023-02-02T04:37:29Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and YouTube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
- Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z)
- Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize both seen and unseen samples, where unseen classes are not observed during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
- An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild [34.505472771669744]
Unsupervised disentanglement has been shown to be theoretically impossible without inductive biases on the models and the data.
We propose a method for disentangling a set of factors which are only partially labeled, as well as separating the complementary set of residual factors.
arXiv Detail & Related papers (2021-06-29T17:54:24Z)
- Where and What? Examining Interpretable Disentangled Representations [96.32813624341833]
Capturing interpretable variations has long been one of the goals in disentanglement learning.
Unlike the independence assumption, interpretability has rarely been exploited to encourage disentanglement in the unsupervised setting.
In this paper, we examine the interpretability of disentangled representations by investigating two questions: where to be interpreted and what to be interpreted.
arXiv Detail & Related papers (2021-04-07T11:22:02Z)
- Rethinking Content and Style: Exploring Bias for Unsupervised Disentanglement [59.033559925639075]
We propose a formulation for unsupervised C-S disentanglement based on our assumption that different factors are of different importance and popularity for image reconstruction.
The corresponding model inductive bias is introduced by our proposed C-S disentanglement Module (C-S DisMo).
Experiments on several popular datasets demonstrate that our method achieves the state-of-the-art unsupervised C-S disentanglement.
arXiv Detail & Related papers (2021-02-21T08:04:33Z)
- NestedVAE: Isolating Common Factors via Weak Supervision [45.366986365879505]
We identify the connection between the task of bias reduction and that of isolating factors common between domains.
To isolate the common factors we combine the theory of deep latent variable models with information bottleneck theory.
Two outer VAEs with shared weights attempt to reconstruct the input and infer a latent space, whilst a nested VAE attempts to reconstruct the latent representation of one image from the latent representation of its paired image.
arXiv Detail & Related papers (2020-02-26T15:49:57Z)
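The NestedVAE entry above describes its architecture only in prose; the following is a minimal sketch of that weight-sharing arrangement. The modules `outer_vae` (assumed to return a latent and a reconstruction) and `inner_vae` (assumed to map latent to latent), and the plain squared-error losses with KL terms omitted, are assumptions made for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class NestedVAESketch(nn.Module):
    """Rough sketch of the NestedVAE idea (not the authors' code): one
    weight-shared outer VAE encodes and reconstructs both images of a pair,
    while an inner (nested) VAE predicts the latent of one image from the
    latent of its partner, so only factors common to the pair survive."""

    def __init__(self, outer_vae, inner_vae):
        super().__init__()
        self.outer = outer_vae   # shared weights: applied to both images of a pair
        self.inner = inner_vae   # nested VAE operating on the outer latents

    def forward(self, x_a, x_b):
        # Outer VAE (weights shared across the pair): reconstruct each image
        # and infer its latent code.
        z_a, x_a_rec = self.outer(x_a)
        z_b, x_b_rec = self.outer(x_b)
        # Inner VAE: reconstruct the partner's latent from this image's latent,
        # isolating the factors the two images have in common.
        z_b_pred = self.inner(z_a)
        recon_outer = ((x_a_rec - x_a) ** 2).mean() + ((x_b_rec - x_b) ** 2).mean()
        recon_inner = ((z_b_pred - z_b.detach()) ** 2).mean()
        return recon_outer + recon_inner
```

The key design point, as described in the entry, is that the outer encoder is shared across the pair, so the inner bottleneck can only pass factors common to both images.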