FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2508.08136v1
- Date: Mon, 11 Aug 2025 16:11:08 GMT
- Title: FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting
- Authors: Yitong Yang, Yinglin Wang, Changshuo Wang, Huajie Wang, Shuting He,
- Abstract summary: We introduce FantasyStyle, a 3DGS-based style transfer framework, and the first to rely entirely on diffusion model distillation. We enhance cross-view consistency by applying a 3D filter to multi-view noisy latents, selectively reducing low-frequency components to mitigate stylized prior conflicts. Our method consistently outperforms state-of-the-art approaches, achieving higher stylization quality and visual realism across various scenes and styles.
- Score: 7.778588010132252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of 3DGS in generative and editing applications has sparked growing interest in 3DGS-based style transfer. However, current methods still face two major challenges: (1) multi-view inconsistency often leads to style conflicts, resulting in appearance smoothing and distortion; and (2) heavy reliance on VGG features, which struggle to disentangle style and content from style images, often causing content leakage and excessive stylization. To tackle these issues, we introduce FantasyStyle, a 3DGS-based style transfer framework, and the first to rely entirely on diffusion model distillation. It comprises two key components: (1) Multi-View Frequency Consistency. We enhance cross-view consistency by applying a 3D filter to multi-view noisy latents, selectively reducing low-frequency components to mitigate stylized prior conflicts. (2) Controllable Stylized Distillation. To suppress content leakage from style images, we introduce negative guidance to exclude undesired content. In addition, we identify the limitations of Score Distillation Sampling and Delta Denoising Score in 3D style transfer and remove the reconstruction term accordingly. Building on these insights, we propose a controllable stylized distillation that leverages negative guidance to more effectively optimize the 3D Gaussians. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches, achieving higher stylization quality and visual realism across various scenes and styles.
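The abstract names its two components but gives no formulas. The NumPy sketch below illustrates one possible reading and is not FantasyStyle's code. It rests on several assumptions. The multi-view noisy latents are stacked as a (V, C, H, W) array. The "3D filter" is a radial low-frequency attenuation in a joint FFT over the view and spatial axes. "Removing the reconstruction term" is read as dropping the (eps_uncond - eps) part of the Score Distillation Sampling residual, with a negative condition taking the place of the unconditional branch. All names are hypothetical: `attenuate_low_freq`, `negative_guided_direction`, `distill_step`, `radius`, `scale`, and the toy `eps_fn` standing in for a diffusion UNet.

```python
import numpy as np


def attenuate_low_freq(latents, radius=0.25, scale=0.5):
    """Scale down the low-frequency part of a multi-view latent stack.

    latents: array of shape (V, C, H, W) = (views, channels, height, width).
    The view axis and the two spatial axes are treated as one 3D volume per
    channel (an assumption; the abstract only says "3D filter").
    radius, scale: hypothetical cutoff (cycles/sample) and attenuation factor.
    """
    axes = (0, 2, 3)
    spec = np.fft.fftn(latents, axes=axes)
    v, _, h, w = latents.shape
    fv = np.fft.fftfreq(v)[:, None, None]
    fh = np.fft.fftfreq(h)[None, :, None]
    fw = np.fft.fftfreq(w)[None, None, :]
    dist = np.sqrt(fv**2 + fh**2 + fw**2)          # (V, H, W) frequency radius
    mask = np.where(dist <= radius, scale, 1.0)    # rescale only low frequencies
    spec = spec * mask[:, None, :, :]              # broadcast over channels
    # The mask is symmetric in +f/-f, so the inverse is real up to round-off.
    return np.fft.ifftn(spec, axes=axes).real


def negative_guided_direction(eps_fn, z_t, t, cond_style, cond_neg, guidance=7.5):
    """Guidance-only distillation direction with a negative condition.

    SDS uses w(t) * (eps_cfg - eps), where
    eps_cfg = eps_uncond + s * (eps_cond - eps_uncond).
    Here the (eps_uncond - eps) part is dropped (an assumed reading of
    "remove the reconstruction term") and the unconditional branch is
    replaced by a negative condition, leaving s * (eps_style - eps_neg).
    """
    return guidance * (eps_fn(z_t, t, cond_style) - eps_fn(z_t, t, cond_neg))


def distill_step(z0, alpha_bar, noise, eps_fn, t, cond_style, cond_neg, guidance=7.5):
    """Noise the per-view latents, filter them jointly, then query the denoiser."""
    z_t = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * noise
    z_t = attenuate_low_freq(z_t)
    return negative_guided_direction(eps_fn, z_t, t, cond_style, cond_neg, guidance)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z0 = rng.standard_normal((4, 4, 8, 8))          # 4 views of 4x8x8 latents
    noise = rng.standard_normal(z0.shape)
    # Hypothetical stand-in for a diffusion UNet's noise prediction.
    toy_eps = lambda z, t, c: 0.1 * z + c
    grad = distill_step(z0, 0.5, noise, toy_eps, t=500, cond_style=1.0, cond_neg=0.2)
    print(grad.shape)  # (4, 4, 8, 8)
```

In a real pipeline, the returned per-view direction would be backpropagated through the VAE encoder and the Gaussian Splatting renderer to the Gaussian parameters, as in standard score distillation. That part is omitted here because it depends on the rendering and diffusion stack.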
Related papers
- Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference [69.34278282513593]
Preference Score Distillation (PSD) is an optimization-based framework for human-aligned text-to-3D synthesis without 3D training data. Our key insight stems from the incompatibility of pixel-level gradients. We introduce an adaptive strategy to co-optimize preference scores and negative text embeddings.
arXiv Detail & Related papers (2026-03-02T08:23:36Z)
- DiffStyle3D: Consistent 3D Gaussian Stylization via Attention Optimization [22.652699040654046]
3D style transfer enables the creation of visually expressive 3D content. We propose DiffStyle3D, a novel diffusion-based paradigm for 3DGS style transfer. We show that DiffStyle3D outperforms state-of-the-art methods, achieving higher stylization quality and visual realism.
arXiv Detail & Related papers (2026-01-27T15:41:11Z)
- Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views [40.70901994944635]
We introduce CoherentGS, a framework for high-fidelity 3D reconstruction from sparse and blurry images. Our key insight is to address these compound degradations using a dual-prior strategy. CoherentGS significantly outperforms existing methods, setting a new state of the art for this challenging task.
arXiv Detail & Related papers (2025-12-11T07:36:35Z)
- AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation [56.399153019429605]
This work shows that ignoring source dynamics yields inconsistent trajectories that suppress or merge semantic cues. We reformulate text-to-3D optimization as mapping a dynamically evolving source distribution to a fixed target distribution. We introduce AnchorDS, an improved score distillation mechanism that provides state-anchored guidance with image conditions.
arXiv Detail & Related papers (2025-11-12T09:51:23Z)
- SSGaussian: Semantic-Aware and Structure-Preserving 3D Style Transfer [57.723850794113055]
We propose a novel 3D style transfer pipeline that integrates prior knowledge from pretrained 2D diffusion models. Our pipeline consists of two key stages. First, we leverage diffusion priors to generate stylized renderings of key viewpoints. The second is instance-level style transfer, which effectively leverages instance-level consistency across stylized key views and transfers it onto the 3D representation.
arXiv Detail & Related papers (2025-09-04T16:40:44Z)
- StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians [23.1385740508835]
StyleMe3D is a holistic framework for 3D GS style transfer. It integrates multi-modal style conditioning, multi-level semantic alignment, and perceptual quality enhancement. This work bridges photorealistic 3D GS and artistic stylization, unlocking applications in gaming, virtual worlds, and digital art.
arXiv Detail & Related papers (2025-04-21T17:59:55Z)
- ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation [46.64928459085584]
We propose ConsDreamer, a novel framework that mitigates view bias by refining both the conditional and unconditional terms in the score distillation process. We show that ConsDreamer effectively mitigates the multi-face Janus problem in text-to-3D generation, outperforming existing methods in both visual quality and consistency.
arXiv Detail & Related papers (2025-04-03T06:43:23Z)
- WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians [37.139479729087896]
We develop a new style transfer method for 3D scenes called WaSt-3D.
It faithfully transfers details from style scenes to the content scene without requiring any training.
WaSt-3D consistently delivers results across diverse content and style scenes without necessitating any training.
arXiv Detail & Related papers (2024-09-26T15:02:50Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- StylizedGS: Controllable Stylization for 3D Gaussian Splatting [53.0225128090909]
StylizedGS is an efficient 3D neural style transfer framework with adaptable control over perceptual factors.
Our method achieves high-quality stylization results characterized by faithful brushstrokes and geometric consistency with flexible controls.
arXiv Detail & Related papers (2024-04-08T06:32:11Z)
- GaussianStyle: Gaussian Head Avatar via StyleGAN [64.85782838199427]
We propose a novel framework that integrates the volumetric strengths of 3DGS with the powerful implicit representation of StyleGAN.
We show that our method achieves state-of-the-art performance in reenactment, novel view synthesis, and animation.
arXiv Detail & Related papers (2024-02-01T18:14:42Z)
- StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis [92.25145204543904]
StyleNeRF is a 3D-aware generative model for high-resolution image synthesis with high multi-view consistency.
It integrates the neural radiance field (NeRF) into a style-based generator.
It can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality.
arXiv Detail & Related papers (2021-10-18T02:37:01Z)