Generating Novel Scene Compositions from Single Images and Videos
- URL: http://arxiv.org/abs/2103.13389v5
- Date: Wed, 13 Dec 2023 13:44:40 GMT
- Title: Generating Novel Scene Compositions from Single Images and Videos
- Authors: Vadim Sushko, Dan Zhang, Juergen Gall, Anna Khoreva
- Abstract summary: We introduce SIV-GAN, an unconditional generative model that can generate new scene compositions from a single training image or a single video clip.
Compared to previous single image GANs, our model generates more diverse, higher quality images, while not being restricted to a single image setting.
- Score: 21.92417902229955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a large dataset for training, generative adversarial networks (GANs)
can achieve remarkable performance for the image synthesis task. However,
training GANs in extremely low data regimes remains a challenge, as overfitting
often occurs, leading to memorization or training divergence. In this work, we
introduce SIV-GAN, an unconditional generative model that can generate new
scene compositions from a single training image or a single video clip. We
propose a two-branch discriminator architecture, with content and layout
branches designed to judge internal content and scene layout realism separately
from each other. This discriminator design enables synthesis of visually
plausible, novel compositions of a scene, with varying content and layout,
while preserving the context of the original sample. Compared to previous
single image GANs, our model generates more diverse, higher quality images,
while not being restricted to a single image setting. We further introduce a
new challenging task of learning from a few frames of a single video. In this
training setup, the training images are highly similar to each other, which
makes it difficult for prior GAN models to synthesize images of both high
quality and diversity.
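To make the two-branch discriminator idea more concrete, here is a minimal PyTorch sketch, not the authors' exact SIV-GAN architecture: a shared feature extractor feeds a content branch that scores globally pooled features (discarding spatial arrangement) and a layout branch that outputs a per-location realism map (keeping spatial structure). The class name `TwoBranchDiscriminator` and all layer sizes are illustrative assumptions.

```python
# Minimal sketch of a two-branch discriminator (illustrative, not the exact
# SIV-GAN architecture): shared features feed a content branch and a layout branch.
import torch
import torch.nn as nn

class TwoBranchDiscriminator(nn.Module):
    def __init__(self, in_channels=3, base_channels=64):
        super().__init__()
        # Shared low-level feature extractor (channel sizes are assumptions).
        self.shared = nn.Sequential(
            nn.Conv2d(in_channels, base_channels, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base_channels, base_channels * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
        )
        # Content branch: judges internal content from globally pooled features,
        # so the spatial layout is discarded before scoring.
        self.content_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(base_channels * 2, 1),
        )
        # Layout branch: judges scene layout realism with a per-location score map,
        # keeping the spatial structure of the features.
        self.layout_branch = nn.Sequential(
            nn.Conv2d(base_channels * 2, base_channels * 2, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(base_channels * 2, 1, 1),
        )

    def forward(self, x):
        features = self.shared(x)
        content_score = self.content_branch(features)  # (B, 1)
        layout_scores = self.layout_branch(features)   # (B, 1, H/4, W/4)
        return content_score, layout_scores


if __name__ == "__main__":
    disc = TwoBranchDiscriminator()
    images = torch.randn(2, 3, 128, 128)  # dummy batch
    content, layout = disc(images)
    print(content.shape, layout.shape)  # torch.Size([2, 1]) torch.Size([2, 1, 32, 32])
```

Judging content and layout realism separately, as described in the abstract, is what allows the generator to recombine the content and layout of the single training sample into novel compositions rather than memorizing it.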
Related papers
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation of image generation methods and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z)
- StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN [70.31913835035206]
We present a novel approach to the video synthesis problem that helps to greatly improve visual quality.
We make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for.
Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes (see the sketch after this list).
arXiv Detail & Related papers (2021-07-15T09:58:15Z)
- Semantic Palette: Guiding Scene Generation with Class Proportions [34.746963256847145]
We introduce a conditional framework with novel architecture designs and learning objectives, which effectively accommodates class proportions to guide the scene generation process.
Thanks to the semantic control, we can produce layouts close to the real distribution, helping enhance the whole scene generation process.
We demonstrate the merit of our approach for data augmentation: semantic segmenters trained on a mix of real and synthetic layout-image pairs outperform models trained on real pairs only.
arXiv Detail & Related papers (2021-06-03T07:04:00Z)
- Learning to Generate Novel Scene Compositions from Single Images and Videos [32.131955417610655]
One-Shot GAN learns to generate samples from a training set as small as one image or one video.
We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout realism.
arXiv Detail & Related papers (2021-05-12T17:59:45Z)
- StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal Image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z)
- Unsupervised Novel View Synthesis from a Single Image [47.37120753568042]
Novel view synthesis from a single image aims at generating novel views from a single input image of an object.
This work aims at relaxing this assumption, enabling the training of a conditional generative model for novel view synthesis in a completely unsupervised manner.
arXiv Detail & Related papers (2021-02-05T16:56:04Z)
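As referenced in the StyleVideoGAN entry above, here is a minimal sketch of training a temporal model on sequences of StyleGAN latent codes rather than RGB frames. A simple GRU next-latent predictor with an MSE loss stands in for the paper's actual temporal generative model, and random tensors stand in for latents projected from real video frames; the 512-dimensional latent size, the module names, and the training loop are all assumptions for illustration.

```python
# Illustrative sketch (not StyleVideoGAN's actual model): learn temporal dynamics
# over StyleGAN-style latent codes instead of RGB frames.
import torch
import torch.nn as nn

LATENT_DIM = 512  # typical StyleGAN latent size (assumption)

class LatentSequenceModel(nn.Module):
    """Predicts the next latent code from a sequence of previous ones."""
    def __init__(self, latent_dim=LATENT_DIM, hidden_dim=1024):
        super().__init__()
        self.rnn = nn.GRU(latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, latents):           # latents: (B, T, latent_dim)
        hidden, _ = self.rnn(latents)
        return self.head(hidden)           # predicted next latents: (B, T, latent_dim)


# Toy training step on random stand-in latents; real ones would be obtained by
# projecting video frames into the latent space of a pretrained, frozen StyleGAN.
model = LatentSequenceModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
latent_sequences = torch.randn(4, 16, LATENT_DIM)  # (batch, frames, latent_dim)

optimizer.zero_grad()
pred = model(latent_sequences[:, :-1])              # predict latents 1..T-1
loss = nn.functional.mse_loss(pred, latent_sequences[:, 1:])
loss.backward()
optimizer.step()
# Generated latent trajectories would then be decoded frame by frame with the
# frozen StyleGAN generator to obtain RGB video frames.
```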
This list is automatically generated from the titles and abstracts of the papers on this site.