ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
- URL: http://arxiv.org/abs/2511.22456v1
- Date: Thu, 27 Nov 2025 13:46:16 GMT
- Title: ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
- Authors: Zhenglin Zhou, Fan Ma, Xiaobo Xia, Hehe Fan, Yi Yang, Tat-Seng Chua
- Abstract summary: ITS3D is a framework that formulates the task as an optimization problem to identify the most effective Gaussian noise input. We introduce three techniques for improved stability, efficiency, and exploration capability. Experiments demonstrate that ITS3D enhances text-to-3D generation quality.
- Score: 88.04431808574581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore inference-time scaling in text-guided 3D diffusion models to enhance generative quality without additional training. To this end, we introduce ITS3D, a framework that formulates the task as an optimization problem to identify the most effective Gaussian noise input. The framework is driven by a verifier-guided search algorithm, where the search algorithm iteratively refines noise candidates based on verifier feedback. To address the inherent challenges of 3D generation, we introduce three techniques for improved stability, efficiency, and exploration capability. 1) Gaussian normalization is applied to stabilize the search process. It corrects distribution shifts when noise candidates deviate from a standard Gaussian distribution during iterative updates. 2) The high-dimensional nature of the 3D search space increases computational complexity. To mitigate this, a singular value decomposition-based compression technique is employed to reduce dimensionality while preserving effective search directions. 3) To further prevent convergence to suboptimal local minima, a singular space reset mechanism dynamically updates the search space based on diversity measures. Extensive experiments demonstrate that ITS3D enhances text-to-3D generation quality, which shows the potential of computationally efficient search methods in generative processes. The source code is available at https://github.com/ZhenglinZhou/ITS3D.
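The search loop described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is a toy under stated assumptions. The function names `gaussian_normalize`, `svd_basis`, and `search`, the random-perturbation proposal rule, and every hyperparameter are hypothetical. The verifier is any callable that maps a noise vector to a scalar score, standing in for "run the 3D diffusion model and score the result". The singular space reset mechanism is not modeled here.

```python
import numpy as np

def gaussian_normalize(z):
    """Rescale a noise vector to zero mean and unit variance,
    so that updated candidates stay close to a standard Gaussian."""
    return (z - z.mean()) / (z.std() + 1e-8)

def svd_basis(candidates, k):
    """Return the top-k right singular vectors of a (n, dim) candidate
    matrix; they span a k-dimensional subspace used as search directions."""
    _, _, vt = np.linalg.svd(candidates, full_matrices=False)
    return vt[:k]  # shape (k, dim), rows are orthonormal

def search(verifier, dim, n_init=16, k=4, steps=20, pop=8, sigma=0.5, seed=0):
    """Toy verifier-guided search (hypothetical update rule): perturb the
    current best noise inside the SVD subspace and keep any improvement."""
    rng = np.random.default_rng(seed)
    # Draw and normalize the initial noise candidates, then score them.
    cands = np.stack([gaussian_normalize(c)
                      for c in rng.standard_normal((n_init, dim))])
    scores = np.array([verifier(c) for c in cands])
    best = cands[scores.argmax()].copy()
    best_score = scores.max()
    # Compress the search space to k directions.
    basis = svd_basis(cands, k)
    for _ in range(steps):
        # Sample low-dimensional coefficients and lift them to full space.
        coeffs = rng.standard_normal((pop, k)) * sigma
        for proposal in best + coeffs @ basis:
            proposal = gaussian_normalize(proposal)
            s = verifier(proposal)
            if s > best_score:
                best, best_score = proposal, s
    return best, best_score

if __name__ == "__main__":
    # Stand-in verifier: rewards noise whose first 8 entries are near 1.
    toy_verifier = lambda z: -float(np.sum((z[:8] - 1.0) ** 2))
    noise, score = search(toy_verifier, dim=64)
    print(noise.shape, score)
```

In this sketch each proposal costs one verifier call, so searching over `k` coefficients instead of `dim` raw noise entries is what keeps the number of expensive evaluations manageable; normalization is applied after every update so the model never sees noise that has drifted away from a standard Gaussian.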
Related papers
- Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining. PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z) - TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming [10.73970270886881]
Recent advances in 3D Gaussian diffusion models suffer from time-intensive denoising and post-denoising processing. We propose TRIM (Trajectory Reduction and Instance Mask denoising).
arXiv Detail & Related papers (2025-11-20T18:49:09Z) - GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution [7.288410309484523]
We present a novel approach for enhancing the resolution and geometric fidelity of 3D Gaussian Splatting (3DGS) beyond native training resolution. Our work breaks this limitation through a lightweight generative model that predicts and refines additional 3D Gaussians where needed most.
arXiv Detail & Related papers (2025-06-09T16:13:12Z) - GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors [14.743494200205754]
High-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data.
Previous methods optimize high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering speed.
In this work, we base our method on 3D Gaussian Splatting (3DGS) due to its capability of producing high-quality images at a faster rendering speed.
arXiv Detail & Related papers (2024-06-14T15:19:21Z) - R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction [53.19869886963333]
3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction.
This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction.
arXiv Detail & Related papers (2024-05-31T08:39:02Z) - HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting [113.37908093915837]
Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time.
In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance.
arXiv Detail & Related papers (2023-11-28T18:59:58Z) - DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation [55.661467968178066]
We propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously.
Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space.
In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks.
arXiv Detail & Related papers (2023-09-28T17:55:05Z) - Text-to-3D using Gaussian Splatting [18.163413810199234]
This paper proposes GSGEN, a novel method that adopts Gaussian Splatting, a recent state-of-the-art representation, to text-to-3D generation.
GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting.
Our approach can generate 3D assets with delicate details and accurate geometry.
arXiv Detail & Related papers (2023-09-28T16:44:31Z) - Generalization of pixel-wise phase estimation by CNN and improvement of phase-unwrapping by MRF optimization for one-shot 3D scan [0.621405559652172]
Active stereo techniques using single pattern projection, a.k.a. one-shot 3D scan, have drawn wide attention from industry, medicine, etc.
One severe drawback of one-shot 3D scan is sparse reconstruction.
We propose a pixel-wise technique for one-shot scan, which is applicable to any type of static pattern, provided the pattern is regular and periodic.
arXiv Detail & Related papers (2023-09-26T10:45:04Z) - State Entropy Maximization with Random Encoders for Efficient Exploration [162.39202927681484]
Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL).
This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward.
In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder.
arXiv Detail & Related papers (2021-02-18T15:45:17Z) - PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first depth estimation is performed, a pseudo LiDAR point cloud representation is computed from the depth estimates, and then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.