Unsupervised Disentanglement of Pose, Appearance and Background from
Images and Videos
- URL: http://arxiv.org/abs/2001.09518v1
- Date: Sun, 26 Jan 2020 20:59:47 GMT
- Title: Unsupervised Disentanglement of Pose, Appearance and Background from
Images and Videos
- Authors: Aysegul Dundar, Kevin J. Shih, Animesh Garg, Robert Pottorf, Andrew
Tao, Bryan Catanzaro
- Abstract summary: Unsupervised landmark learning is the task of learning semantic keypoint-like representations without the use of expensive input keypoint-level annotations.
A popular approach is to factorize an image into a pose and appearance data stream, then to reconstruct the image from the factorized components.
This work explores the effects of factorizing the reconstruction task into separate foreground and background reconstructions.
- Score: 44.93648211794362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised landmark learning is the task of learning semantic keypoint-like
representations without the use of expensive input keypoint-level annotations.
A popular approach is to factorize an image into a pose and appearance data
stream, then to reconstruct the image from the factorized components. The pose
representation should capture a set of consistent and tightly localized
landmarks in order to facilitate reconstruction of the input image. Ultimately,
we wish for our learned landmarks to focus on the foreground object of
interest. However, the reconstruction task of the entire image forces the model
to allocate landmarks to model the background. This work explores the effects
of factorizing the reconstruction task into separate foreground and background
reconstructions, conditioning only the foreground reconstruction on the
unsupervised landmarks. Our experiments demonstrate that the proposed
factorization results in landmarks that are focused on the foreground object of
interest. Furthermore, the rendered background quality is also improved, as the
background rendering pipeline no longer requires the ill-suited landmarks to
model its pose and appearance. We demonstrate this improvement in the context
of the video-prediction task.
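The core idea in the abstract, composing the final image from a landmark-conditioned foreground render and a separate background render, can be sketched minimally in NumPy. This is an illustrative sketch under assumptions, not the paper's actual architecture: function names such as `factorized_reconstruction` and the Gaussian-heatmap landmark encoding are hypothetical stand-ins for the learned components.

```python
import numpy as np

def gaussian_heatmaps(landmarks, h, w, sigma=2.0):
    """Encode K (x, y) landmarks as Gaussian heatmaps of shape (K, h, w).

    This is one common way to turn unsupervised keypoints into a pose
    representation that a foreground decoder can be conditioned on.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
            for (x, y) in landmarks]
    return np.stack(maps)

def factorized_reconstruction(fg_render, bg_render, fg_mask):
    """Blend a (landmark-conditioned) foreground render with a background
    render using a soft foreground mask, so only the foreground pathway
    needs the landmarks."""
    return fg_mask * fg_render + (1.0 - fg_mask) * bg_render

def reconstruction_loss(target, fg_render, bg_render, fg_mask):
    """Pixel-wise MSE between the target image and the composed render."""
    recon = factorized_reconstruction(fg_render, bg_render, fg_mask)
    return float(np.mean((target - recon) ** 2))
```

In a trained model the heatmaps, renders, and mask would all be produced by networks; here they are plain arrays so the compositing logic itself is easy to inspect.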
Related papers
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - Painterly Image Harmonization via Adversarial Residual Learning [37.78751164466694]
Painterly image harmonization aims to transfer the style of the background painting to the foreground object.
In this work, we employ adversarial learning to bridge the domain gap between foreground feature map and background feature map.
arXiv Detail & Related papers (2023-11-15T01:53:46Z) - A Fusion of Variational Distribution Priors and Saliency Map Replay for
Continual 3D Reconstruction [1.3812010983144802]
Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images.
This task requires significant data acquisition to predict both visible and occluded portions of the shape.
We propose a continual learning-based 3D reconstruction method where our goal is to design a model using Variational Priors that can still reconstruct the previously seen classes reasonably even after training on new classes.
arXiv Detail & Related papers (2023-08-17T06:48:55Z) - Revisiting Image Reconstruction for Semi-supervised Semantic
Segmentation [16.27277238968567]
We revisit the idea of using image reconstruction as an auxiliary task and incorporate it with a modern semi-supervised semantic segmentation framework.
Surprisingly, we discover that such an old idea in semi-supervised learning can produce results competitive with state-of-the-art semantic segmentation algorithms.
arXiv Detail & Related papers (2023-03-17T06:31:06Z) - Take a Prior from Other Tasks for Severe Blur Removal [52.380201909782684]
We propose a cross-level feature learning strategy based on knowledge distillation to learn the priors.
We design a semantic prior embedding layer with multi-level aggregation and semantic attention transformation to integrate the priors effectively.
Experiments on natural image deblurring benchmarks and real-world images, such as the GoPro and RealBlur datasets, demonstrate our method's effectiveness and generalization ability.
arXiv Detail & Related papers (2023-02-14T08:30:51Z) - Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z) - Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We introduce a self-supervised loss function that we use to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z) - Unsupervised Learning of Landmarks based on Inter-Intra Subject
Consistencies [72.67344725725961]
We present a novel unsupervised learning approach to image landmark discovery by incorporating the inter-subject landmark consistencies on facial images.
This is achieved via an inter-subject mapping module that transforms original subject landmarks based on an auxiliary subject-related structure.
To recover from the transformed images back to the original subject, the landmark detector is forced to learn spatial locations that contain the consistent semantic meanings both for the paired intra-subject images and between the paired inter-subject images.
arXiv Detail & Related papers (2020-04-16T20:38:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.