PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models
- URL: http://arxiv.org/abs/2407.06109v2
- Date: Tue, 16 Jul 2024 14:05:17 GMT
- Title: PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models
- Authors: Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu,
- Abstract summary: PerlDiff is a method for effective street view image generation that fully leverages perspective 3D geometric information.
Our results justify that our PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets.
- Score: 55.080748327139176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or ControlNet, to produce commendable outcomes in controllable generation. However, such approaches intrinsically restrict generation performance to the learning capacities of predefined network architectures. In this paper, we explore the integration of controlling information and introduce PerlDiff (Perspective-Layout Diffusion Models), a method for effective street view image generation that fully leverages perspective 3D geometric information. Our PerlDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process, resulting in a more robust and controllable output. Moreover, it demonstrates superior controllability compared to alternative layout control methods. Empirical results justify that our PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets. Our codes and models are publicly available at https://github.com/LabShuHangGU/PerlDiff.
Related papers
- Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning [52.81032340916171]
Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes.
Our method achieves superior controllability and flexibility in the 3D assets generation task.
arXiv Detail & Related papers (2024-05-13T17:56:13Z) - Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting [9.383423119196408]
We introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing multi-view diffusion models.
MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation.
In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations.
arXiv Detail & Related papers (2024-03-15T02:57:20Z) - Fine-grained Controllable Video Generation via Object Appearance and
Context [74.23066823064575]
We propose fine-grained controllable video generation (FACTOR) to achieve detailed control.
FACTOR aims to control objects' appearances and context, including their location and category.
Our method achieves controllability of object appearances without finetuning, which reduces the per-subject optimization efforts for the users.
arXiv Detail & Related papers (2023-12-05T17:47:33Z) - Controllable Data Generation Via Iterative Data-Property Mutual Mappings [13.282793266390316]
We propose a framework to enhance VAE-based data generators with property controllability and ensure disentanglement.
The proposed framework is implemented on four VAE-based controllable generators to evaluate its performance on property error, disentanglement, generation quality, and training time.
arXiv Detail & Related papers (2023-10-11T17:34:56Z) - Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance
Fields using Geometry-Guided Text-to-Image Diffusion Model [39.64952340472541]
We propose a controllable text-to-3D avatar generation method whose facial expression is controllable.
Our main strategy is to construct the 3D avatar in Neural Radiance Fields (NeRF) optimized with a set of controlled viewpoint-aware images.
We demonstrate the empirical results and discuss the effectiveness of our method.
arXiv Detail & Related papers (2023-09-07T08:14:46Z) - DragNUWA: Fine-grained Control in Video Generation by Integrating Text,
Image, and Trajectory [126.4597063554213]
DragNUWA is an open-domain diffusion-based video generation model.
It provides fine-grained control over video content from semantic, spatial, and temporal perspectives.
Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation.
arXiv Detail & Related papers (2023-08-16T01:43:41Z) - Control3Diff: Learning Controllable 3D Diffusion Models from Single-view
Images [70.17085345196583]
Control3Diff is a 3D diffusion model that combines the strengths of diffusion models and 3D GANs for versatile, controllable 3D-aware image synthesis.
We validate the efficacy of Control3Diff on standard image generation benchmarks, including FFHQ, AFHQ, and ShapeNet.
arXiv Detail & Related papers (2023-04-13T17:52:29Z) - Towards a Neural Graphics Pipeline for Controllable Image Generation [96.11791992084551]
We present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models.
NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation.
We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes.
arXiv Detail & Related papers (2020-06-18T14:22:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.