Related papers: Structural Energy-Guided Sampling for View-Consistent Text-to-3D

Structural Energy-Guided Sampling for View-Consistent Text-to-3D

URL: http://arxiv.org/abs/2508.16917v1
Date: Sat, 23 Aug 2025 06:26:04 GMT
Title: Structural Energy-Guided Sampling for View-Consistent Text-to-3D
Authors: Qing Zhang, Jinguang Tong, Jie Hong, Jing Zhang, Xuesong Li,
Abstract summary: Text-to-3D generation often suffers from the Janus problem, where objects collapse into duplicated or distorted geometry from other angles.<n>We propose Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework that enforces multi-view consistency entirely at sampling time.
Score: 18.973527029488746
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-3D generation often suffers from the Janus problem, where objects look correct from the front but collapse into duplicated or distorted geometry from other angles. We attribute this failure to viewpoint bias in 2D diffusion priors, which propagates into 3D optimization. To address this, we propose Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework that enforces multi-view consistency entirely at sampling time. SEGS defines a structural energy in a PCA subspace of intermediate U-Net features and injects its gradients into the denoising trajectory, steering geometry toward the intended viewpoint while preserving appearance fidelity. Integrated seamlessly into SDS/VSD pipelines, SEGS significantly reduces Janus artifacts, achieving improved geometric alignment and viewpoint consistency without retraining or weight modification.

Related papers

ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting [63.138778159026934]
We propose an adaptive optimization framework guided by excess risk decomposition, termed ERGO.<n> ERGO dynamically estimates the view-specific excess risk and adaptively adjust loss weights during optimization.<n>Experiments on the Google Scanned Objects dataset and the OmniObject3D dataset demonstrate the superiority of ERGO over existing state-of-the-art methods.
arXiv Detail & Related papers (2026-02-10T20:44:43Z)
Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces textbfJGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation.<n> Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z)
COREA: Coarse-to-Fine 3D Representation Alignment Between Relightable 3D Gaussians and SDF via Bidirectional 3D-to-3D Supervision [15.632917458525851]
We present COREA, the first unified framework that jointly learns relightable 3D Gaussians and a Signed Distance Field (SDF) for accurate geometry reconstruction and faithful relighting.<n>Experiments on standard benchmarks demonstrate that COREA achieves superior performance in novel-view synthesis, mesh reconstruction, and PBR within a unified framework.
arXiv Detail & Related papers (2025-12-08T02:41:42Z)
FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain [17.131480990824397]
We introduce a 3DGS-diffusion framework driven by pixel-wise Information Gain (EIG)<n>EIG acts as a unified policy for coherent controllable-text synthesis.<n>Experiments on a dataset demonstrate that our approach attains SOTA across NTA-oU, NTLI-oU, and FID, maintaining an FID of 107.47 even at 6 meters lane shift.
arXiv Detail & Related papers (2025-11-26T06:58:57Z)
SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection [49.12928389918159]
Existing monocular 3D detectors typically tame the pronounced nonlinear regression of 3D bounding box through decoupled prediction paradigm.<n>We propose novel Spatial-Projection Alignment (SPAN) with two pivotal components.<n>SPAN enforces an explicit global spatial constraint between the predicted and ground-truth 3D bounding boxes, thereby rectifying spatial drift caused by decoupled attribute regression.<n>3D-2D Projection Alignment ensures that the projected 3D box is aligned tightly within its corresponding 2D detection bounding box on the image plane, mitigating projection misalignment overlooked in previous works.
arXiv Detail & Related papers (2025-11-10T04:48:48Z)
Visibility-Aware Densification for 3D Gaussian Splatting in Dynamic Urban Scenes [7.253732091582086]
VAD-GS is a 3DGS framework tailored for geometry recovery in challenging urban scenes.<n>Our method identifies unreliable geometry structures via voxel-based visibility reasoning.<n>It selects informative supporting views through diversity-aware view selection, and recovers missing structures via patch matching-based stereo reconstruction.
arXiv Detail & Related papers (2025-10-10T13:22:12Z)
Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning [5.604709769018076]
3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for photorealistic view synthesis.<n>We propose DiGS, a unified framework that embeds Signed Distance Field (SDF) learning directly into the 3DGS pipeline.<n>We show that DiGS consistently improves reconstruction accuracy and completeness while retaining high fidelity.
arXiv Detail & Related papers (2025-09-09T08:17:46Z)
Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis [22.767866875051013]
We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address limitations of current methods.<n>By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones.<n> Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.
arXiv Detail & Related papers (2025-04-02T17:59:46Z)
StableGS: A Floater-Free Framework for 3D Gaussian Splatting [9.935869165752283]
3D Gaussian Splatting (3DGS) reconstructions are plagued by stubborn floater" artifacts that degrade their geometric and visual fidelity.<n>We propose StableGS, a novel framework that decouples geometric regularization from final appearance rendering.<n> Experiments on multiple benchmarks show StableGS not only eliminates floaters but also resolves the common blur-artifact trade-off.
arXiv Detail & Related papers (2025-03-24T09:02:51Z)
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications.<n>Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas.<n>We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z)
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.<n>Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z)
Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation [27.43973967994717]
MT3D is a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias.<n>By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects.
arXiv Detail & Related papers (2024-08-12T06:25:44Z)
GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.<n>Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.