Enhancing High-Resolution 3D Generation through Pixel-wise Gradient
Clipping
- URL: http://arxiv.org/abs/2310.12474v4
- Date: Thu, 18 Jan 2024 05:29:09 GMT
- Title: Enhancing High-Resolution 3D Generation through Pixel-wise Gradient
Clipping
- Authors: Zijie Pan, Jiachen Lu, Xiatian Zhu, Li Zhang
- Abstract summary: High-resolution 3D object generation remains a challenging task due to the limited availability of comprehensive annotated training data.
Recent advancements have aimed to overcome this constraint by harnessing image generative models, pretrained on extensive curated web datasets.
We propose an innovative operation termed Pixel-wise Gradient Clipping (PGC) designed for seamless integration into existing 3D generative models.
- Score: 46.364968008574664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-resolution 3D object generation remains a challenging task primarily due
to the limited availability of comprehensive annotated training data. Recent
advancements have aimed to overcome this constraint by harnessing image
generative models, pretrained on extensive curated web datasets, using
knowledge transfer techniques like Score Distillation Sampling (SDS).
Efficiently addressing the requirements of high-resolution rendering often
necessitates the adoption of latent representation-based models, such as the
Latent Diffusion Model (LDM). In this framework, a significant challenge
arises: To compute gradients for individual image pixels, it is necessary to
backpropagate gradients from the designated latent space through the frozen
components of the image model, such as the VAE encoder used within LDM.
However, this gradient propagation pathway has never been optimized, remaining
uncontrolled during training. We find that the unregulated gradients adversely
affect the 3D model's capacity in acquiring texture-related information from
the image generative model, leading to poor quality appearance synthesis. To
address this overarching challenge, we propose an innovative operation termed
Pixel-wise Gradient Clipping (PGC) designed for seamless integration into
existing 3D generative models, thereby enhancing their synthesis quality.
Specifically, we control the magnitude of stochastic gradients by clipping the
pixel-wise gradients efficiently, while preserving crucial texture-related
gradient directions. Despite its simplicity and minimal extra cost, extensive
experiments demonstrate the efficacy of our PGC in enhancing the performance of
existing 3D generative models for high-resolution object rendering.
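The clipping operation the abstract describes can be sketched as follows. This is a minimal NumPy illustration of per-pixel gradient norm clipping, not the authors' implementation: the function name, the `(C, H, W)` tensor layout, and the `max_norm` threshold are assumptions. Each pixel's channel-wise gradient vector is rescaled so its L2 norm never exceeds the threshold, which bounds the magnitude while leaving the direction unchanged.

```python
import numpy as np

def pixel_wise_gradient_clip(grad, max_norm=1.0, eps=1e-8):
    """Clip per-pixel gradient magnitude while preserving direction.

    grad: array of shape (C, H, W), e.g. the gradient w.r.t. a rendered
    image's pixels after backpropagation through a frozen VAE encoder.
    Each pixel's channel vector is rescaled so its L2 norm is at most
    max_norm; pixels already below the threshold are left untouched.
    """
    # Per-pixel L2 norm over the channel axis -> shape (1, H, W)
    norms = np.linalg.norm(grad, axis=0, keepdims=True)
    # Scale factor is 1 where norm <= max_norm, else max_norm / norm
    scale = np.minimum(1.0, max_norm / (norms + eps))
    return grad * scale

# Usage: clip a synthetic pixel gradient before the 3D model's update step.
g = np.random.randn(3, 64, 64) * 5.0
g_clipped = pixel_wise_gradient_clip(g, max_norm=1.0)
```

Because the scale factor is a nonnegative scalar per pixel, the clipped gradient points in the same direction as the original at every pixel, which is the property the paper credits for preserving texture-related information.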
Related papers
- TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models [69.0220314849478]
TripoSG is a new paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images.
The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images.
To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
arXiv Detail & Related papers (2025-02-10T16:07:54Z) - SuperNeRF-GAN: A Universal 3D-Consistent Super-Resolution Framework for Efficient and Enhanced 3D-Aware Image Synthesis [59.73403876485574]
We propose SuperNeRF-GAN, a universal framework for 3D-consistent super-resolution.
A key highlight of SuperNeRF-GAN is its seamless integration with NeRF-based 3D-aware image synthesis methods.
Experimental results demonstrate the superior efficiency, 3D-consistency, and quality of our approach.
arXiv Detail & Related papers (2025-01-12T10:31:33Z) - Taming Feed-forward Reconstruction Models as Latent Encoders for 3D Generative Models [7.485139478358133]
Recent AI-based 3D content creation has largely evolved along two paths: feed-forward image-to-3D reconstruction approaches and 3D generative models trained with 2D or 3D supervision.
We show that existing feed-forward reconstruction methods can serve as effective latent encoders for training 3D generative models, thereby bridging these two paradigms.
arXiv Detail & Related papers (2024-12-31T21:23:08Z) - Towards Degradation-Robust Reconstruction in Generalizable NeRF [58.33351079982745]
Generalizable Radiance Field (GNeRF) models have been proven to be an effective way to avoid per-scene optimization by generalizing across scenes.
There has been limited research on the robustness of GNeRFs to different types of degradation present in the source images.
arXiv Detail & Related papers (2024-11-18T16:13:47Z) - Direct and Explicit 3D Generation from a Single Image [25.207277983430608]
We introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images.
We incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency.
By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation.
arXiv Detail & Related papers (2024-11-17T03:14:50Z) - IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z) - Learning Versatile 3D Shape Generation with Improved AR Models [91.87115744375052]
Auto-regressive (AR) models have achieved impressive results in 2D image generation by modeling joint distributions in the grid space.
We propose the Improved Auto-regressive Model (ImAM) for 3D shape generation, which applies discrete representation learning based on a latent vector instead of volumetric grids.
arXiv Detail & Related papers (2023-03-26T12:03:18Z) - Flow-based GAN for 3D Point Cloud Generation from a Single Image [16.04710129379503]
We introduce a hybrid explicit-implicit generative modeling scheme, which inherits the flow-based explicit generative models for sampling point clouds with arbitrary resolutions.
We evaluate on the large-scale synthetic dataset ShapeNet, with the experimental results demonstrating the superior performance of the proposed method.
arXiv Detail & Related papers (2022-10-08T17:58:20Z) - AE-NeRF: Auto-Encoding Neural Radiance Fields for 3D-Aware Object
Manipulation [24.65896451569795]
We propose a novel framework for 3D-aware object manipulation, called Auto-Encoding Neural Radiance Fields (AE-NeRF).
Our model is formulated in an auto-encoder architecture and extracts disentangled 3D attributes such as 3D shape, appearance, and camera pose from an image.
A high-quality image is rendered from these attributes through disentangled generative Neural Radiance Fields (NeRF).
arXiv Detail & Related papers (2022-04-28T11:50:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.