BlobGAN: Spatially Disentangled Scene Representations
- URL: http://arxiv.org/abs/2205.02837v1
- Date: Thu, 5 May 2022 17:59:55 GMT
- Title: BlobGAN: Spatially Disentangled Scene Representations
- Authors: Dave Epstein, Taesung Park, Richard Zhang, Eli Shechtman, Alexei A. Efros
- Abstract summary: We propose an unsupervised, mid-level representation for a generative model of scenes.
The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features.
- Score: 67.60387150586375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an unsupervised, mid-level representation for a generative model
of scenes. The representation is mid-level in that it is neither per-pixel nor
per-image; rather, scenes are modeled as a collection of spatial, depth-ordered
"blobs" of features. Blobs are differentiably placed onto a feature grid that
is decoded into an image by a generative adversarial network. Due to the
spatial uniformity of blobs and the locality inherent to convolution, our
network learns to associate different blobs with different entities in a scene
and to arrange these blobs to capture scene layout. We demonstrate this
emergent behavior by showing that, despite training without any supervision,
our method enables applications such as easy manipulation of objects within a
scene (e.g., moving, removing, and restyling furniture), creation of feasible
scenes given constraints (e.g., plausible rooms with drawers at a particular
location), and parsing of real-world images into constituent parts. On a
challenging multi-category dataset of indoor scenes, BlobGAN outperforms
StyleGAN2 in image quality as measured by FID. See our project page for video
results and interactive demo: http://www.dave.ml/blobgan
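As a rough illustration of the abstract's core mechanism (blobs differentiably placed onto a feature grid that a generator then decodes), the Python sketch below splats a set of blobs onto a 2D grid with a soft opacity falloff and a softmax-based depth ordering. The function name, the circular blob parameterization, and the specific compositing rule are assumptions made for illustration, not the paper's exact formulation.

```python
import torch

def splat_blobs(centers, scales, features, depth_logits, grid_size=16, sharpness=10.0):
    """Differentiably place blobs onto a feature grid (illustrative sketch only,
    not the exact BlobGAN parameterization).

    centers:      (B, K, 2) blob centers in [0, 1] normalized coordinates
    scales:       (B, K)    blob radii in the same normalized units
    features:     (B, K, D) per-blob feature vectors
    depth_logits: (B, K)    scores used for soft depth ordering
    returns:      (B, D, H, W) feature grid to be decoded by a generator
    """
    B, K, _ = centers.shape
    D = features.shape[-1]

    # Coordinates of every cell in the output grid, flattened to (1, 1, H*W, 2).
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, grid_size),
        torch.linspace(0, 1, grid_size),
        indexing="ij",
    )
    grid = torch.stack([xs, ys], dim=-1).view(1, 1, grid_size * grid_size, 2)

    # Soft, differentiable opacity: near 1 inside a blob, falling off with distance.
    dist = ((grid - centers.unsqueeze(2)) ** 2).sum(-1).sqrt()          # (B, K, H*W)
    opacity = torch.sigmoid(sharpness * (scales.unsqueeze(-1) - dist))  # (B, K, H*W)

    # Soft depth ordering: blobs with higher depth scores dominate where they overlap.
    weights = torch.softmax(depth_logits, dim=1).unsqueeze(-1) * opacity
    weights = weights / (weights.sum(dim=1, keepdim=True) + 1e-8)       # normalize over blobs

    # Blend per-blob features into each grid cell.
    feat_grid = torch.einsum("bkp,bkd->bpd", weights, features)         # (B, H*W, D)
    return feat_grid.permute(0, 2, 1).view(B, D, grid_size, grid_size)
```

Under a parameterization like this, the object edits described above reduce to editing blob parameters before splatting, e.g. shifting a row of `centers` to move an entity or driving its `scales` entry toward zero to remove it; the resulting grid would then be decoded by the adversarially trained generator.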
Related papers
- Moving Off-the-Grid: Scene-Grounded Video Representations [44.13534423774967]
We present Moving Off-the-Grid (MooG), a self-supervised video representation model.
MooG allows tokens to move "off-the-grid" so that they can represent scene elements more consistently.
We show that MooG provides a strong foundation for different vision tasks when compared to "on-the-grid" baselines.
arXiv Detail & Related papers (2024-11-08T19:26:51Z)
- Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement [32.335953514942474]
This paper proposes to jointly learn the scene representation along with a 3D dense feature field and a 2D feature extractor.
We learn the underlying geometry of the scene with an implicit field through volumetric rendering and design our feature field to leverage intermediate geometric information encoded in the implicit field.
Visual localization is then achieved by aligning the image-based features and the rendered volumetric features.
arXiv Detail & Related papers (2024-06-12T17:51:53Z)
- Move Anything with Layered Scene Diffusion [77.45870343845492]
We propose SceneDiffusion to optimize a layered scene representation during the diffusion sampling process.
Our key insight is that spatial disentanglement can be obtained by jointly denoising scene renderings at different spatial layouts.
Our generated scenes support a wide range of spatial editing operations, including moving, resizing, cloning, and layer-wise appearance editing operations.
arXiv Detail & Related papers (2024-04-10T17:28:16Z)
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that despite its simplicity, our approach successfully generates 3D scenes that decompose into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- Hyperbolic Contrastive Learning for Visual Representations beyond Objects [30.618032825306187]
We focus on learning representations for objects and scenes that preserve the structure among them.
Motivated by the observation that visually similar objects are close in the representation space, we argue that scenes and objects should instead follow a hierarchical structure.
arXiv Detail & Related papers (2022-12-01T16:58:57Z)
- MeshLoc: Mesh-Based Visual Localization [54.731309449883284]
We explore a more flexible alternative based on dense 3D meshes that does not require feature matching between database images to build the scene representation.
Surprisingly competitive results can be obtained when extracting features on renderings of these meshes, without any neural rendering stage.
Our results show that dense 3D model-based representations are a promising alternative to existing representations and point to interesting and challenging directions for future research.
arXiv Detail & Related papers (2022-07-21T21:21:10Z)
- Playable Environments: Video Manipulation in Space and Time [98.0621309257937]
We present Playable Environments - a new representation for interactive video generation and manipulation in space and time.
At inference time, given a single image, our framework lets the user move objects in 3D and generate a video by providing a sequence of desired actions.
Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering.
arXiv Detail & Related papers (2022-03-03T18:51:05Z)