Interactive Scene Authoring with Specialized Generative Primitives
- URL: http://arxiv.org/abs/2412.16253v1
- Date: Fri, 20 Dec 2024 04:39:50 GMT
- Title: Interactive Scene Authoring with Specialized Generative Primitives
- Authors: Clément Jambon, Changwoon Choi, Dongsu Zhang, Olga Sorkine-Hornung, Young Min Kim
- Abstract summary: Specialized Generative Primitives is a generative framework that allows non-expert users to author high-quality 3D scenes.
Each primitive is an efficient generative model that captures the distribution of a single exemplar from the real world.
We showcase interactive sessions where various primitives are extracted from real-world scenes and controlled to create 3D assets and scenes in a few minutes.
- Abstract: Generating high-quality 3D digital assets often requires expert knowledge of complex design tools. We introduce Specialized Generative Primitives, a generative framework that allows non-expert users to author high-quality 3D scenes in a seamless, lightweight, and controllable manner. Each primitive is an efficient generative model that captures the distribution of a single exemplar from the real world. With our framework, users capture a video of an environment, which we turn into a high-quality and explicit appearance model thanks to 3D Gaussian Splatting. Users then select regions of interest guided by semantically-aware features. To create a generative primitive, we adapt Generative Cellular Automata to single-exemplar training and controllable generation. We decouple the generative task from the appearance model by operating on sparse voxels and we recover a high-quality output with a subsequent sparse patch consistency step. Each primitive can be trained within 10 minutes and used to author new scenes interactively in a fully compositional manner. We showcase interactive sessions where various primitives are extracted from real-world scenes and controlled to create 3D assets and scenes in a few minutes. We also demonstrate additional capabilities of our primitives: handling various 3D representations to control generation, transferring appearances, and editing geometries.
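The abstract's core generative step, single-exemplar Generative Cellular Automata over sparse voxels, can be illustrated with a minimal sketch. The paper trains a neural transition model on one exemplar; here a toy, deterministic occupancy function stands in for that model, so every name and probability below is illustrative, not the authors' implementation.

```python
# Hypothetical sketch of Generative Cellular Automata (GCA) sampling on a
# sparse voxel grid. A learned per-voxel occupancy model is replaced by a
# toy function (toy_prob) for the sake of a runnable example.
import random

# 26-connected neighborhood offsets around a voxel.
NEIGHBORS = [(dx, dy, dz)
             for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
             if (dx, dy, dz) != (0, 0, 0)]

def gca_step(occupied, occupancy_prob, rng):
    """One transition: propose the occupied voxels plus their neighbors,
    then keep each candidate with a (model-given) probability."""
    candidates = set(occupied)
    for (x, y, z) in occupied:
        for (dx, dy, dz) in NEIGHBORS:
            candidates.add((x + dx, y + dy, z + dz))
    return {v for v in candidates if rng.random() < occupancy_prob(v)}

def sample(seed_voxels, occupancy_prob, steps=8, seed=0):
    """Iterate the stochastic transition from a user-provided seed shape."""
    rng = random.Random(seed)
    state = set(seed_voxels)
    for _ in range(steps):
        state = gca_step(state, occupancy_prob, rng)
    return state

# Toy stand-in for the learned model: deterministically grow a ball of
# squared radius 4 around the origin (probability 1 inside, 0 outside).
def toy_prob(voxel):
    return 1.0 if sum(c * c for c in voxel) <= 4 else 0.0

shape = sample({(0, 0, 0)}, toy_prob, steps=4)
```

In the actual framework, `occupancy_prob` would be a network conditioned on the single exemplar's sparse-voxel patches, and a sparse patch consistency step would then recover high-quality appearance on top of the sampled occupancy.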
Related papers
- InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models [75.03495065452955]
We present InfiniCube, a scalable method for generating dynamic 3D driving scenes with high fidelity and controllability.
Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
arXiv Detail & Related papers (2024-12-05T07:32:20Z)
- StdGEN: Semantic-Decomposed 3D Character Generation from Single Images [28.302030751098354]
StdGEN is an innovative pipeline for generating semantically high-quality 3D characters from single images.
It generates intricately detailed 3D characters with separated semantic components such as the body, clothes, and hair, in three minutes.
StdGEN offers ready-to-use semantic-decomposed 3D characters and enables flexible customization for a wide range of applications.
arXiv Detail & Related papers (2024-11-08T17:54:18Z)
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
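The two-stage Flex3D pipeline described above can be sketched as a simple dataflow. Every function here is a stand-in: the real system uses fine-tuned multi-view/video diffusion models for candidate generation, a view-quality curation step, and the FlexRM transformer for reconstruction; the names and the quality scoring below are purely illustrative.

```python
# Hedged dataflow sketch of a Flex3D-style two-stage pipeline.
# All functions are placeholders for the models named in the abstract.

def generate_candidate_views(prompt, n=8):
    """Stage 1a: stand-in for the multi-view and video diffusion models
    that produce a pool of candidate views of the target object."""
    return [{"id": i, "prompt": prompt, "quality": 1.0 / (1 + i)}
            for i in range(n)]

def curate_views(views, keep=4):
    """Stage 1b: curation keeps only the highest-quality candidates."""
    return sorted(views, key=lambda v: v["quality"], reverse=True)[:keep]

def flex_rm(views):
    """Stage 2: stand-in for FlexRM, a transformer-based reconstruction
    model that accepts an arbitrary number of input views."""
    return {"n_inputs": len(views), "source_ids": [v["id"] for v in views]}

asset = flex_rm(curate_views(generate_candidate_views("a ceramic mug")))
```

The point of the structure is that stage 2 is agnostic to how many views survive curation, which is what "can effectively process an arbitrary number of inputs" refers to.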
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by point-based representations to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets [43.315487682462845]
CLAY is a 3D geometry and material generator designed to transform human imagination into intricate 3D digital structures.
At its core is a large-scale generative model composed of a multi-resolution Variational Autoencoder (VAE) and a minimalistic latent Diffusion Transformer (DiT).
We demonstrate using CLAY for a range of controllable 3D asset creations, from sketchy conceptual designs to production-ready assets with intricate details.
arXiv Detail & Related papers (2024-05-30T05:57:36Z)
- InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models [66.83681825842135]
InstantMesh is a feed-forward framework for instant 3D mesh generation from a single image.
It features state-of-the-art generation quality and significant training scalability.
We release all the code, weights, and demo of InstantMesh with the intention that it can make substantial contributions to the community of 3D generative AI.
arXiv Detail & Related papers (2024-04-10T17:48:37Z)
- CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization [27.55341255800119]
We present CharacterGen, a framework developed to efficiently generate 3D characters.
A transformer-based, generalizable sparse-view reconstruction model is the other core component of our approach.
We have curated a dataset of anime characters, rendered in multiple poses and views, to train and evaluate our model.
arXiv Detail & Related papers (2024-02-27T05:10:59Z)
- Patch-based 3D Natural Scene Generation from a Single Example [35.37200601332951]
We target a 3D generative model for general natural scenes that are typically unique and intricate.
Inspired by classical patch-based image models, we advocate for synthesizing 3D scenes at the patch level, given a single example.
arXiv Detail & Related papers (2023-04-25T09:19:11Z)
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models [85.20004959780132]
We introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments.
We show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.
arXiv Detail & Related papers (2023-04-19T16:13:21Z)
- SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation [89.47132156950194]
We present a novel framework built to simplify 3D asset generation for amateur users.
Our method supports a variety of input modalities that can be easily provided by a human.
Our model combines all of these tasks into one Swiss-army-knife tool.
arXiv Detail & Related papers (2022-12-08T18:59:05Z)
- Video-driven Neural Physically-based Facial Asset for Production [33.24654834163312]
We present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets.
Our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
arXiv Detail & Related papers (2022-02-11T13:22:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.