MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance
- URL: http://arxiv.org/abs/2501.13449v1
- Date: Thu, 23 Jan 2025 08:02:59 GMT
- Title: MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance
- Authors: Wooseok Song, Seunggyu Chang, Jaejun Yoo
- Abstract summary: MultiDreamer3D can generate coherent multi-concept 3D content in a divide-and-conquer manner. We show that MultiDreamer3D not only ensures object presence and preserves the distinct identities of each concept but also successfully handles complex cases such as property change or interaction.
- Score: 8.084345870645201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While single-concept customization has been studied in 3D, multi-concept customization remains largely unexplored. To address this, we propose MultiDreamer3D, which can generate coherent multi-concept 3D content in a divide-and-conquer manner. First, we generate 3D bounding boxes using an LLM-based layout controller. Next, a selective point cloud generator creates coarse point clouds for each concept. These point clouds are placed in the 3D bounding boxes and initialized into 3D Gaussian Splatting with concept labels, enabling precise identification of concept attributions in 2D projections. Finally, we refine the 3D Gaussians via concept-aware interval score matching, guided by concept-aware diffusion. Our experimental results show that MultiDreamer3D not only ensures object presence and preserves the distinct identities of each concept but also successfully handles complex cases such as property changes or interactions. To the best of our knowledge, we are the first to address multi-concept customization in 3D.
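The abstract's divide-and-conquer pipeline (layout boxes, per-concept point clouds, label-carrying initialization) can be sketched in a toy form. Everything below is illustrative: the class and function names, the fixed layout, and the random point clouds are placeholders, not the authors' actual API or model outputs.

```python
# Hedged sketch of the divide-and-conquer stages from the abstract.
# All names here are illustrative placeholders, not the authors' code.
import numpy as np

def layout_controller(prompt_concepts):
    """Stand-in for the LLM-based layout controller: returns one
    axis-aligned 3D bounding box (center, half-size) per concept."""
    # Hypothetical fixed layout: boxes spaced along the x-axis.
    return [
        {"concept": c,
         "center": np.array([2.0 * i, 0.0, 0.0]),
         "half": np.full(3, 0.5)}
        for i, c in enumerate(prompt_concepts)
    ]

def coarse_point_cloud(n_points, rng):
    """Stand-in for the selective point cloud generator:
    a random cloud in the unit cube [-1, 1]^3."""
    return rng.uniform(-1.0, 1.0, size=(n_points, 3))

def place_in_boxes(boxes, n_points=256, seed=0):
    """Scale/translate each concept's cloud into its box and attach an
    integer concept label, mimicking label-carrying 3DGS initialization."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for label, box in enumerate(boxes):
        cloud = coarse_point_cloud(n_points, rng) * box["half"] + box["center"]
        points.append(cloud)
        labels.append(np.full(n_points, label))
    return np.concatenate(points), np.concatenate(labels)

boxes = layout_controller(["cat", "teapot"])
pts, lbls = place_in_boxes(boxes)

# Every point stays inside its concept's box, so any 2D projection of the
# points can still be attributed to its concept by carrying the label along.
for label, box in enumerate(boxes):
    inside = np.all(np.abs(pts[lbls == label] - box["center"]) <= box["half"],
                    axis=1)
    assert inside.all()
```

The refinement stage (concept-aware interval score matching guided by a diffusion model) is omitted, since it depends on model internals the abstract does not specify.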
Related papers
- 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer [33.42183318484381]
We introduce 3D-LLaVA, a simple yet highly powerful 3D LMM designed to act as an intelligent assistant in comprehending, reasoning, and interacting with the 3D world.
At the core of 3D-LLaVA is a new Omni Superpoint Transformer (OST), which integrates three functionalities.
arXiv Detail & Related papers (2025-01-02T09:33:13Z) - TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction [137.34863114016483]
TAR3D is a novel framework that consists of a 3D-aware Vector Quantized-Variational AutoEncoder (VQ-VAE) and a Generative Pre-trained Transformer (GPT)
We show that TAR3D can achieve superior generation quality over existing methods in text-to-3D and image-to-3D tasks.
arXiv Detail & Related papers (2024-12-22T08:28:20Z) - GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z) - Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization.
We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept.
Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv Detail & Related papers (2024-08-07T08:43:58Z) - DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors [44.30208916019448]
We propose DreamDissector, a text-to-3D method capable of generating multiple independent objects with interactions.
DreamDissector accepts a multi-object text-to-3D NeRF as input and produces independent textured meshes.
arXiv Detail & Related papers (2024-07-23T07:59:57Z) - ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars [62.34862776670368]
Real-world applications often require a large gallery of 3D assets that share a consistent theme.
We present ThemeStation, a novel approach for theme-aware 3D-to-3D generation.
arXiv Detail & Related papers (2024-03-22T17:59:01Z) - Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder [21.73287941143304]
Multi-Modality Masked AutoEncoders (MAE) methods leverage both 2D images and 3D point clouds for pre-training.
We introduce a novel approach employing a 3D to multi-view masked autoencoder to fully harness the multi-modal attributes of 3D point clouds.
Our method outperforms state-of-the-art counterparts by a large margin in a variety of downstream tasks.
arXiv Detail & Related papers (2023-11-17T22:10:03Z) - SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z) - ConceptFusion: Open-set Multimodal 3D Mapping [91.23054486724402]
ConceptFusion is a scene representation that is fundamentally open-set.
It enables reasoning beyond a closed set of concepts and is inherently multimodal.
We evaluate ConceptFusion on a number of real-world datasets.
arXiv Detail & Related papers (2023-02-14T18:40:26Z) - 3D Crowd Counting via Geometric Attention-guided Multi-View Fusion [50.520192402702015]
We propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps.
Compared to 2D fusion, the 3D fusion extracts more information of the people along the z-dimension (height), which helps to address the scale variations across multiple views.
The 3D density maps still preserve the 2D density maps property that the sum is the count, while also providing 3D information about the crowd density.
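The "sum is the count" property described above can be illustrated with a toy example. The arrays here are synthetic placeholders, not the paper's data or method: a 3D density map assigns one unit of mass per person, and collapsing the height axis yields a 2D density map with the same total.

```python
# Toy illustration of the count-preserving property of density maps.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3D scene-level density map over (x, y, z) voxels:
# one unit of mass per person.
density_3d = np.zeros((8, 8, 4))
for _ in range(5):  # five "people" at random voxels
    x, y, z = rng.integers(0, [8, 8, 4])
    density_3d[x, y, z] += 1.0

# Summing out the height (z) axis recovers the familiar 2D density map;
# integration over either map gives the same crowd count.
density_2d = density_3d.sum(axis=2)
count_3d = float(density_3d.sum())
count_2d = float(density_2d.sum())
```

Both `count_3d` and `count_2d` equal the number of people placed, which is the property the 3D fusion approach relies on while additionally retaining density along the height dimension.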
arXiv Detail & Related papers (2020-03-18T11:35:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.