A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness
- URL: http://arxiv.org/abs/2408.01269v1
- Date: Fri, 2 Aug 2024 13:46:15 GMT
- Title: A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness
- Authors: Lutao Jiang, Hangyu Li, Lin Wang
- Abstract summary: This paper proposes a novel framework that boosts 3D GS initialization for text-to-3D generation according to the lexical richness of the input text.
Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes.
Our framework can be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.
- Score: 10.09002362480534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-3D content creation has recently received much attention, especially with the prevalence of 3D Gaussian Splatting. In general, GS-based methods comprise two key stages: initialization and rendering optimization. To achieve initialization, existing works directly apply random sphere initialization or 3D diffusion models, e.g., Point-E, to derive the initial shapes. However, such strategies suffer from two critical yet challenging problems: 1) the final shapes are still similar to the initial ones even after training; 2) shapes can be produced only from simple texts, e.g., "a dog", not from lexically richer texts, e.g., "a dog is sitting on the top of the airplane". To address these problems, this paper proposes a novel general framework to boost 3D GS initialization for text-to-3D generation according to the lexical richness of the input text. Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes while enabling spatial interaction among the 3D Gaussians and semantic interaction between the Gaussians and the text. Specifically, we first construct a voxelized representation, where each voxel holds a 3D Gaussian with its position, scale, and rotation fixed while setting opacity as the sole factor to determine a position's occupancy. We then design an initialization network mainly consisting of two novel components: 1) a Global Information Perception (GIP) block and 2) a Gaussians-Text Fusion (GTF) block. Such a design enables each 3D Gaussian to assimilate spatial information from other areas and semantic information from the text. Extensive experiments show the superiority of our framework in producing high-quality 3D GS initializations over existing methods, e.g., Shap-E, on lexically simple, medium, and hard texts. Also, our framework can be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.
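The abstract's description of the voxelized representation and the two-block initialization network can be made concrete with a small sketch. The PyTorch snippet below is a minimal illustration based only on the abstract, not the authors' implementation: the class names GIPBlock, GTFBlock, and InitNet, the use of self-attention and cross-attention, the feature dimensions, and the 0.5 occupancy threshold are all assumptions chosen for clarity.

```python
# Minimal sketch (not the authors' code) of the idea in the abstract: each voxel
# of a uniform grid holds one 3D Gaussian whose position/scale/rotation are fixed
# by the grid, and opacity is the only quantity the network predicts to mark
# whether a voxel is occupied. GIPBlock/GTFBlock are hypothetical stand-ins:
# global self-attention over voxels and cross-attention from voxels to text tokens.
import torch
import torch.nn as nn


def build_voxel_gaussians(resolution: int = 32, extent: float = 1.0):
    """Place one Gaussian at the center of every cell of a uniform voxel grid."""
    coords = torch.linspace(-extent, extent, resolution)
    x, y, z = torch.meshgrid(coords, coords, coords, indexing="ij")
    positions = torch.stack([x, y, z], dim=-1).reshape(-1, 3)        # fixed centers
    scales = torch.full((positions.shape[0], 3), 2 * extent / resolution)  # fixed
    rotations = torch.zeros(positions.shape[0], 4)
    rotations[:, 0] = 1.0                            # identity quaternion, fixed
    opacities = torch.zeros(positions.shape[0], 1)   # the only free factor
    return positions, scales, rotations, opacities


class GIPBlock(nn.Module):
    """Hypothetical Global Information Perception block: lets every voxel
    Gaussian attend to all others so it can 'see' the rest of the shape."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxel_feats):                  # (B, N_voxels, dim)
        out, _ = self.attn(voxel_feats, voxel_feats, voxel_feats)
        return self.norm(voxel_feats + out)


class GTFBlock(nn.Module):
    """Hypothetical Gaussians-Text Fusion block: cross-attention that injects
    text-token semantics (e.g., CLIP/T5 embeddings) into each voxel Gaussian."""
    def __init__(self, dim: int, text_dim: int, heads: int = 4):
        super().__init__()
        self.to_kv = nn.Linear(text_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxel_feats, text_feats):      # (B, N, dim), (B, T, text_dim)
        kv = self.to_kv(text_feats)
        out, _ = self.attn(voxel_feats, kv, kv)
        return self.norm(voxel_feats + out)


class InitNet(nn.Module):
    """Toy initialization network: stacked GIP/GTF blocks predict one opacity
    per voxel; thresholding yields the initially occupied Gaussians."""
    def __init__(self, dim: int = 64, text_dim: int = 512, depth: int = 2):
        super().__init__()
        self.embed = nn.Linear(3, dim)               # embed the fixed voxel centers
        self.blocks = nn.ModuleList(
            [nn.ModuleList([GIPBlock(dim), GTFBlock(dim, text_dim)]) for _ in range(depth)]
        )
        self.to_opacity = nn.Linear(dim, 1)

    def forward(self, positions, text_feats):
        h = self.embed(positions).unsqueeze(0).repeat(text_feats.shape[0], 1, 1)
        for gip, gtf in self.blocks:
            h = gtf(gip(h), text_feats)
        return torch.sigmoid(self.to_opacity(h))     # per-voxel occupancy in [0, 1]


if __name__ == "__main__":
    pos, scale, rot, _ = build_voxel_gaussians(resolution=8)
    text = torch.randn(1, 77, 512)                   # placeholder text embeddings
    opacity = InitNet()(pos, text)                   # (1, 8**3, 1)
    occupied = opacity[0, :, 0] > 0.5                # keep Gaussians in occupied voxels
    print(pos[occupied].shape)
```

Under this reading of the abstract, fixing position, scale, and rotation to the grid reduces initialization to predicting a single occupancy field, which is what lets the network carve complex shapes described by lexically rich prompts before the usual GS rendering optimization takes over.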
Related papers
- CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians [97.15119679296954]
CompGS is a novel generative framework that employs 3D Gaussian Splatting (GS) for efficient, compositional text-to-3D content generation.
CompGS can be easily extended to controllable 3D editing, facilitating scene generation.
arXiv Detail & Related papers (2024-10-28T04:35:14Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling [23.06464506261766]
We present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions.
Our approach involves a 3D Gaussian Guide for scene representation, consisting of semantic primitives (objects) and their spatial transformations.
A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene.
arXiv Detail & Related papers (2024-04-14T12:13:07Z)
- GVGEN: Text-to-3D Generation with Volumetric Representation [89.55687129165256]
3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities.
This paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input.
arXiv Detail & Related papers (2024-03-19T17:57:52Z)
- BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis [10.151307760539071]
This paper presents BrightDreamer, an end-to-end feed-forward approach that can achieve generalizable and fast (77 ms) text-to-3D generation.
We first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions.
We then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object.
arXiv Detail & Related papers (2024-03-17T17:04:45Z)
- Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph [20.488040789522604]
We propose a method named "3D Gaussian Generation via Hypergraph" (Hyper-3DG), designed to capture the sophisticated high-order correlations present within 3D objects.
Our framework allows for the production of finely generated 3D objects within a cohesive optimization, effectively circumventing degradation.
arXiv Detail & Related papers (2024-03-14T09:59:55Z)
- HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation [55.95329424826433]
We propose HyperSDFusion, a dual-branch diffusion model that generates 3D shapes from a given text.
We learn the hierarchical representations of text and 3D shapes in hyperbolic space.
Our method is the first to explore the hyperbolic hierarchical representation for text-to-shape generation.
arXiv Detail & Related papers (2024-03-01T08:57:28Z)
- GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z)
- GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data [73.06536202251915]
3D shapes represented as point clouds have achieved advancements in multimodal pre-training to align image and language descriptions.
We propose GS-CLIP as the first attempt to introduce 3DGS into multimodal pre-training to enhance the 3D representation.
arXiv Detail & Related papers (2024-02-09T05:46:47Z)
- Text-to-3D using Gaussian Splatting [18.163413810199234]
This paper proposes GSGEN, a novel method that adopts Gaussian Splatting, a recent state-of-the-art representation, for text-to-3D generation.
GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting.
Our approach can generate 3D assets with delicate details and accurate geometry.
arXiv Detail & Related papers (2023-09-28T16:44:31Z)