Learning to Generate Novel Scene Compositions from Single Images and Videos
- URL: http://arxiv.org/abs/2105.05847v1
- Date: Wed, 12 May 2021 17:59:45 GMT
- Title: Learning to Generate Novel Scene Compositions from Single Images and Videos
- Authors: Vadim Sushko, Juergen Gall, Anna Khoreva
- Abstract summary: One-Shot GAN learns to generate samples from a training set as small as one image or one video.
We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout realism.
- Score: 32.131955417610655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training GANs in low-data regimes remains a challenge, as overfitting often
leads to memorization or training divergence. In this work, we introduce
One-Shot GAN that can learn to generate samples from a training set as small
as one image or one video. We propose a two-branch discriminator, with content
and layout branches designed to judge the internal content separately from the
scene layout realism. This allows synthesis of visually plausible, novel
compositions of a scene, with varying content and layout, while preserving the
context of the original sample. Compared to previous single-image GAN models,
One-Shot GAN achieves higher diversity and quality of synthesis. It is also not
restricted to the single image setting, successfully learning in the introduced
setting of a single video.
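To make the two-branch discriminator idea concrete, below is a minimal PyTorch sketch, not the authors' released code: a shared convolutional trunk feeds a content branch that pools spatial structure away (judging what appears in the image) and a layout branch that keeps a spatial score map (judging where it appears). Channel widths, depths, and the patch-level output are illustrative assumptions rather than the exact One-Shot GAN architecture.

```python
# Minimal sketch of a two-branch discriminator (illustrative, not the
# authors' code): a shared trunk, a content branch that pools spatial
# information away, and a layout branch that keeps a spatial score map.
import torch
import torch.nn as nn

class TwoBranchDiscriminator(nn.Module):
    def __init__(self, in_channels: int = 3, base: int = 64):
        super().__init__()
        # Shared low-level feature extractor.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Content branch: global average pooling discards spatial layout,
        # so this score reflects only content statistics.
        self.content_branch = nn.Sequential(
            nn.Conv2d(base * 2, base * 4, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(base * 4, 1),
        )
        # Layout branch: keeps a PatchGAN-style spatial map of scores,
        # so this score reflects where content is placed in the scene.
        self.layout_branch = nn.Sequential(
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 4, 1, 3, padding=1),
        )

    def forward(self, x: torch.Tensor):
        feats = self.trunk(x)
        return self.content_branch(feats), self.layout_branch(feats)

if __name__ == "__main__":
    d = TwoBranchDiscriminator()
    content_score, layout_score = d(torch.randn(2, 3, 128, 128))
    print(content_score.shape, layout_score.shape)  # (2, 1) and (2, 1, 16, 16)
```

In training, the two scores would enter separate adversarial loss terms, so that content realism and layout realism are judged independently, which is what permits recomposing the content of the single training sample into new layouts.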
Related papers
- Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state of the art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z) - SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z) - Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z) - SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z) - Enhance Images as You Like with Unpaired Learning [8.104571453311442]
We propose a lightweight one-path conditional generative adversarial network (cGAN) to learn a one-to-many relation from low-light to normal-light image space.
Our network learns to generate a collection of enhanced images from a given input conditioned on various reference images.
Our model achieves competitive visual and quantitative results on par with fully supervised methods on both noisy and clean datasets.
arXiv Detail & Related papers (2021-10-04T03:00:44Z) - StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z) - Generating Novel Scene Compositions from Single Images and Videos [21.92417902229955]
We introduce SIV-GAN, an unconditional generative model that can generate new scene compositions from a single training image or a single video clip.
Compared to previous single image GANs, our model generates more diverse, higher quality images, while not being restricted to a single image setting.
arXiv Detail & Related papers (2021-03-24T17:59:07Z) - Unsupervised Novel View Synthesis from a Single Image [47.37120753568042]
Novel view synthesis from a single image aims at generating novel views from a single input image of an object.
This work relaxes the reliance on supervision, enabling training of a conditional generative model for novel view synthesis in a completely unsupervised manner.
arXiv Detail & Related papers (2021-02-05T16:56:04Z) - Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample [107.76407209269236]
We introduce a novel patch-based variational autoencoder (VAE) which allows for a much greater diversity in generation.
At coarse scales, our patch-VAE is employed, ensuring samples are of high diversity.
At finer scales, a patch-GAN renders the fine details, resulting in high quality videos (a toy sketch of this coarse-to-fine scheme follows below).
arXiv Detail & Related papers (2020-06-22T13:24:25Z)
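As a companion illustration for the Hierarchical Patch VAE-GAN entry above, here is a toy coarse-to-fine sketch: a VAE-style decoder draws a diverse, coarse low-resolution sample from a latent code, and a lightweight refiner, which in that paper's setup would be trained adversarially in patch-GAN fashion, upsamples it and adds detail. Only the sampling path is shown; the resolutions, channel counts, and module names (CoarseVAEDecoder, FineRefiner) are hypothetical, not the paper's architecture.

```python
# Toy coarse-to-fine sampling path (illustrative assumptions throughout):
# a VAE-style decoder for diverse coarse structure, then a refiner that
# would be trained patch-GAN style to add fine detail.
import torch
import torch.nn as nn

class CoarseVAEDecoder(nn.Module):
    """Maps a latent code to a coarse 16x16 output; in a real patch-VAE this
    decoder would be trained with a reconstruction plus KL objective."""
    def __init__(self, z_dim: int = 64, out_channels: int = 3):
        super().__init__()
        self.fc = nn.Linear(z_dim, 128 * 4 * 4)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),           # 4 -> 8
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, out_channels, 4, stride=2, padding=1),  # 8 -> 16
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.up(self.fc(z).view(-1, 128, 4, 4))

class FineRefiner(nn.Module):
    """Upsamples the coarse output and adds detail; in a real patch-GAN stage
    this would be trained against a patch discriminator."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, 32, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, coarse: torch.Tensor) -> torch.Tensor:
        return self.net(coarse)

if __name__ == "__main__":
    z = torch.randn(2, 64)              # diversity comes from the latent draw
    coarse = CoarseVAEDecoder()(z)      # (2, 3, 16, 16)
    fine = FineRefiner()(coarse)        # (2, 3, 32, 32)
    print(coarse.shape, fine.shape)
```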
This list is automatically generated from the titles and abstracts of the papers on this site.