FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain
- URL: http://arxiv.org/abs/2511.21113v1
- Date: Wed, 26 Nov 2025 06:58:57 GMT
- Title: FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain
- Authors: YuAn Wang, Xiaofan Li, Chi Huang, Wenhao Zhang, Hao Li, Bosheng Wang, Xun Sun, Jun Wang
- Abstract summary: We introduce FaithFusion, a 3DGS-diffusion fusion framework driven by pixel-wise Expected Information Gain (EIG). EIG acts as a unified policy for coherent spatio-temporal synthesis. Experiments on the Waymo dataset demonstrate that our approach attains SOTA across NTA-IoU, NTL-IoU, and FID, maintaining an FID of 107.47 even at a 6-meter lane shift.
- Score: 17.131480990824397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In controllable driving-scene reconstruction and 3D scene generation, maintaining geometric fidelity while synthesizing visually plausible appearance under large viewpoint shifts is crucial. However, effective fusion of geometry-based 3DGS and appearance-driven diffusion models faces inherent challenges, as the absence of pixel-wise, 3D-consistent editing criteria often leads to over-restoration and geometric drift. To address these issues, we introduce FaithFusion, a 3DGS-diffusion fusion framework driven by pixel-wise Expected Information Gain (EIG). EIG acts as a unified policy for coherent spatio-temporal synthesis: it guides diffusion as a spatial prior to refine high-uncertainty regions, while its pixel-level weighting distills the edits back into 3DGS. The resulting plug-and-play system is free from extra prior conditions and structural modifications. Extensive experiments on the Waymo dataset demonstrate that our approach attains SOTA performance across NTA-IoU, NTL-IoU, and FID, maintaining an FID of 107.47 even at a 6-meter lane shift. Our code is available at https://github.com/wangyuanbiubiubiu/FaithFusion.
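The abstract does not spell out how EIG is computed, so the following PyTorch sketch is only an assumption-laden illustration of the fusion recipe it describes: a per-pixel uncertainty map (here approximated by variance over jittered renders of the same view) gates where a diffusion edit is blended in, and the same map weights the distillation loss back into 3DGS. Every function and tensor below (`eig_map`, `fuse`, the toy shapes) is hypothetical, not the authors' code.

```python
import torch

def eig_map(renders: torch.Tensor) -> torch.Tensor:
    """Per-pixel Expected Information Gain proxy (an assumption, not the
    paper's estimator): pixels whose value varies across K stochastic
    renders of the same novel view are treated as high-uncertainty."""
    var = renders.var(dim=0).mean(dim=0)      # (H, W) photometric variance
    return var / (var.max() + 1e-8)           # normalized to [0, 1]

def fuse(render: torch.Tensor, diffusion_edit: torch.Tensor,
         eig: torch.Tensor) -> torch.Tensor:
    """Blend a diffusion edit into the 3DGS render, gated by the EIG map."""
    w = eig.unsqueeze(0)                      # (1, H, W), broadcast over RGB
    return (1 - w) * render + w * diffusion_edit

# Toy tensors only; real inputs come from a 3DGS renderer and a diffusion model.
renders = torch.rand(8, 3, 64, 64)            # K=8 jittered renders of one view
eig = eig_map(renders)
edit = torch.rand(3, 64, 64)                  # stand-in for a diffusion edit
fused = fuse(renders.mean(dim=0), edit, eig)

# Distillation back into 3DGS: weight the per-pixel photometric loss by EIG
# so high-gain pixels dominate the update (a hypothetical training step).
loss = (eig * (fused - renders.mean(dim=0)).abs().mean(dim=0)).mean()
```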
Related papers
- Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video [76.32954467706581]
We propose SAGE, a framework for Scalable Adaptation of GEometric foundation models from raw video streams. We use a hierarchical mining pipeline to transform videos into training trajectories and hybrid supervision. Experiments show that SAGE significantly enhances zero-shot generalization, reducing Chamfer Distance by 20-42% on unseen benchmarks.
arXiv Detail & Related papers (2026-02-08T09:53:21Z)
- Geometry-Aware Rotary Position Embedding for Consistent Video World Model [48.914346802616414]
ViewRope is a geometry-aware encoding that injects camera-ray directions directly into video transformer self-attention layers. Geometry-Aware Frame-Sparse Attention exploits these geometric cues to selectively attend to relevant historical frames. Our results demonstrate that ViewRope substantially improves long-term consistency while reducing computational costs.
arXiv Detail & Related papers (2026-02-08T08:01:16Z)
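The listing gives no ViewRope equations, so the sketch below only illustrates the general idea it names, injecting camera-ray directions into self-attention, by adding a linear projection of per-patch unit ray directions to queries and keys. The module, dimensions, and shapes are invented for illustration.

```python
import torch
import torch.nn.functional as F

class RayConditionedAttention(torch.nn.Module):
    """Toy single-head self-attention with per-token camera-ray conditioning;
    one generic way to make attention geometry-aware, not ViewRope itself."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.ray_proj = torch.nn.Linear(3, dim)
        self.q = torch.nn.Linear(dim, dim)
        self.k = torch.nn.Linear(dim, dim)
        self.v = torch.nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, rays: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) patch tokens; rays: (N, 3) unit ray direction per patch.
        r = self.ray_proj(rays)
        q, k = self.q(x) + r, self.k(x) + r       # geometry enters q and k
        attn = F.softmax(q @ k.T * self.scale, dim=-1)
        return attn @ self.v(x)

attn = RayConditionedAttention()
tokens = torch.rand(16, 64)
rays = F.normalize(torch.rand(16, 3), dim=-1)
out = attn(tokens, rays)                           # (16, 64)
```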
- Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces JGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation. Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z)
- TALO: Pushing 3D Vision Foundation Models Towards Globally Consistent Online Reconstruction [57.46712611558817]
3D vision foundation models have shown strong generalization in reconstructing key 3D attributes from uncalibrated images through a single feed-forward pass. Recent strategies align consecutive predictions by solving for a global transformation, yet our analysis reveals their fundamental limitations in assumption validity, local alignment scope, and robustness under noisy geometry. We propose a higher-DOF, long-term alignment framework based on Thin Plate Splines, leveraging globally propagated control points to correct spatially varying inconsistencies.
arXiv Detail & Related papers (2025-12-02T02:22:20Z)
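TALO's control-point propagation is not described in this snippet, but the tool it names, the thin plate spline, is standard. Below is a minimal 2D fit/apply pair in PyTorch showing how sparse control-point correspondences define a smooth, higher-DOF warp; it assumes nothing about TALO beyond the TPS itself, and the points are synthetic.

```python
import torch

def tps_fit(src: torch.Tensor, dst: torch.Tensor):
    """Fit 2D thin-plate-spline coefficients mapping src -> dst control points.

    src, dst: (n, 2). Returns weights W (n, 2) and affine A (3, 2) so that
    f(p) = [1, p] @ A + sum_i W_i * U(|p - src_i|), with U(r) = r^2 log r^2.
    """
    n = src.shape[0]
    d2 = torch.cdist(src, src).pow(2)
    K = d2 * torch.log(d2 + 1e-9)                   # TPS radial kernel matrix
    P = torch.cat([torch.ones(n, 1), src], dim=1)   # (n, 3) affine part
    L = torch.zeros(n + 3, n + 3)
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    rhs = torch.cat([dst, torch.zeros(3, 2)], dim=0)
    sol = torch.linalg.solve(L, rhs)
    return sol[:n], sol[n:]

def tps_apply(pts, src, W, A):
    d2 = torch.cdist(pts, src).pow(2)
    U = d2 * torch.log(d2 + 1e-9)
    return torch.cat([torch.ones(len(pts), 1), pts], dim=1) @ A + U @ W

src = torch.rand(12, 2)                    # control points in one prediction
dst = src + 0.05 * torch.randn(12, 2)      # their globally aligned targets
W, A = tps_fit(src, dst)
warped = tps_apply(torch.rand(100, 2), src, W, A)  # smooth, spatially varying fix
```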
- SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed Images [31.94503176488054]
SaLon3R is a novel framework for Structure-aware, Long-term 3DGS Reconstruction. It reconstructs over 50 views at over 10 FPS, with 50% to 90% redundancy removal. Our approach effectively resolves artifacts and prunes redundant 3DGS in a single feed-forward pass.
arXiv Detail & Related papers (2025-10-16T18:37:10Z)
- RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation [75.61028930882144]
We identify and quantify a critical issue, demonstrating a significant performance gap in 3D object detection when using synthetic versus real data. We introduce Reinforcement Learning with Geometric Feedback (RLGF), which uniquely refines video diffusion models by incorporating rewards from specialized latent-space AD perception models. RLGF substantially reduces geometric errors (e.g., VP error by 21%, depth error by 57%) and dramatically improves 3D object detection mAP by 12.7%, narrowing the gap to real-data performance.
arXiv Detail & Related papers (2025-09-20T02:23:36Z)
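RLGF's exact objective is not given in this summary; the sketch below shows only the generic pattern of refining a generator with rewards from a frozen scorer, here via plain REINFORCE on a toy Gaussian policy. The generator head and reward function are stand-ins, not RLGF's perception-model rewards.

```python
import torch

# Stand-ins: a real setup would sample clips from a video diffusion model and
# score them with latent-space AD perception models (VP, depth, detection).
generator = torch.nn.Linear(16, 16)            # hypothetical generator head
reward_fn = lambda x: -x.pow(2).mean(dim=1)    # hypothetical geometric reward

opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
for _ in range(50):
    z = torch.randn(32, 16)
    dist = torch.distributions.Normal(generator(z), 1.0)   # Gaussian "policy"
    sample = dist.sample()
    reward = reward_fn(sample)
    adv = (reward - reward.mean()) / (reward.std() + 1e-8) # normalized advantage
    logp = dist.log_prob(sample).sum(dim=1)
    loss = -(adv.detach() * logp).mean()                   # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
```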
- Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning [5.604709769018076]
3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for photorealistic view synthesis. We propose DiGS, a unified framework that embeds Signed Distance Field (SDF) learning directly into the 3DGS pipeline. We show that DiGS consistently improves reconstruction accuracy and completeness while retaining high fidelity.
arXiv Detail & Related papers (2025-09-09T08:17:46Z)
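How DiGS couples the SDF to the splats is not detailed here; the hedged sketch below shows one conventional way such a coupling is often set up: pin the SDF's zero level set to Gaussian centers and regularize with an eikonal term. The network, loss weights, and sampling are illustrative assumptions.

```python
import torch

# A small MLP SDF; architecture and hyperparameters are illustrative guesses.
sdf = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.Softplus(),
    torch.nn.Linear(64, 64), torch.nn.Softplus(),
    torch.nn.Linear(64, 1))
opt = torch.optim.Adam(sdf.parameters(), lr=1e-3)

centers = torch.rand(256, 3)   # stand-in 3DGS centers assumed near the surface
for _ in range(200):
    surface = sdf(centers).pow(2).mean()   # pin zero level set to the Gaussians
    x = torch.rand(512, 3, requires_grad=True)
    grad = torch.autograd.grad(sdf(x).sum(), x, create_graph=True)[0]
    eikonal = (grad.norm(dim=1) - 1).pow(2).mean()   # keep |grad SDF| = 1
    loss = surface + 0.1 * eikonal
    opt.zero_grad(); loss.backward(); opt.step()
```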
- Structural Energy-Guided Sampling for View-Consistent Text-to-3D [18.973527029488746]
Text-to-3D generation often suffers from the Janus problem, where objects collapse into duplicated or distorted geometry when viewed from other angles. We propose Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework that enforces multi-view consistency entirely at sampling time.
arXiv Detail & Related papers (2025-08-23T06:26:04Z)
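SEGS's structural energy is not specified in this summary; the sketch below illustrates only the training-free, sampling-time guidance pattern it names: each step follows a denoiser, then descends the gradient of a placeholder energy, with no model weights touched.

```python
import torch

def energy(x: torch.Tensor) -> torch.Tensor:
    """Placeholder structural energy; SEGS's actual energy is not given here."""
    return x.pow(2).sum()

def guided_step(x, denoise, step_size=0.1, guidance=0.5):
    """One training-free guided update: apply the denoiser, then descend the
    energy gradient to steer samples toward low-energy configurations."""
    x = x.detach().requires_grad_(True)
    g = torch.autograd.grad(energy(x), x)[0]
    return denoise(x.detach()) - guidance * step_size * g

denoise = lambda x: 0.9 * x        # stand-in for a pretrained denoiser update
x = torch.randn(3, 8, 8)
for _ in range(20):
    x = guided_step(x, denoise)
```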
- Unleashing Semantic and Geometric Priors for 3D Scene Completion [18.515824341739]
Camera-based 3D semantic scene completion (SSC) provides dense geometric and semantic perception for autonomous driving and robotic navigation. Existing methods rely on a coupled encoder to deliver both semantic and geometric priors. We propose FoundationSSC, a novel framework that performs dual decoupling at both the source and pathway levels.
arXiv Detail & Related papers (2025-08-19T08:10:39Z)
- Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets [90.99212668875971]
Step1X-3D is an open framework addressing challenges such as data scarcity, algorithmic limitations, and ecosystem fragmentation. We present a two-stage 3D-native architecture combining a hybrid VAE-DiT geometry generator with a diffusion-based texture synthesis module. Benchmark results demonstrate state-of-the-art performance that exceeds existing open-source methods.
arXiv Detail & Related papers (2025-05-12T16:56:30Z)
- Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis [22.767866875051013]
We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address the limitations of current methods. By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones. Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.
arXiv Detail & Related papers (2025-04-02T17:59:46Z)
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF- and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- GaussRender: Learning 3D Occupancy with Gaussian Rendering [86.89653628311565]
GaussRender is a module that improves 3D occupancy learning by enforcing projective consistency. Our method penalizes 3D configurations that produce inconsistent 2D projections, thereby enforcing a more coherent 3D structure.
arXiv Detail & Related papers (2025-02-07T16:07:51Z)
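GaussRender's rendering module is not described in this summary; to make a projective-consistency penalty concrete, the sketch below uses orthographic max-projections of a voxel occupancy grid as a crude stand-in for camera renderings and penalizes mismatched 2D projections. All tensors are toy data.

```python
import torch
import torch.nn.functional as F

def project(occ: torch.Tensor, dim: int) -> torch.Tensor:
    """Orthographic stand-in for rendering: per-ray max occupancy along one
    axis. GaussRender's actual module renders through real camera models."""
    return occ.amax(dim=dim)

pred = torch.rand(32, 32, 32, requires_grad=True)  # predicted occupancy logits
gt = (torch.rand(32, 32, 32) > 0.9).float()        # toy ground-truth occupancy

# Penalize 3D configurations whose 2D projections disagree with the target's.
proj_loss = sum(F.l1_loss(project(torch.sigmoid(pred), d), project(gt, d))
                for d in range(3))
proj_loss.backward()   # gradients push toward projection-consistent structure
```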
- Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer with an efficient, 3D-structured ability to perceive global geometric inconsistencies.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.