CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation
- URL: http://arxiv.org/abs/2501.09433v1
- Date: Thu, 16 Jan 2025 10:03:15 GMT
- Title: CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation
- Authors: Hwan Heo, Jangyeong Kim, Seongyeong Lee, Jeong A Wi, Junyoung Choi, Sangjun Ahn
- Abstract summary: CaPa is a carve-and-paint framework that generates high-fidelity 3D assets efficiently. It excels in texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.
- Score: 2.544527978847722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems. While some studies have addressed some of these issues, a comprehensive solution remains elusive. In this paper, we introduce CaPa, a carve-and-paint framework that generates high-fidelity 3D assets efficiently. CaPa employs a two-stage process, decoupling geometry generation from texture synthesis. Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives. Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry. Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, resulting in cohesive results across the entire model. This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications. Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.
Related papers
- RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis [10.350576861948952]
RomanTex is a multiview-based texture generation framework that integrates a multi-attention network with an underlying 3D representation.
Our method achieves state-of-the-art results in texture quality and consistency.
arXiv Detail & Related papers (2025-03-24T17:56:11Z)
- Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation [56.862552362223425]
This report presents a comprehensive framework for generating high-quality 3D shapes and textures from diverse input prompts.
The framework consists of 3D shape generation and texture generation.
This report details the system architecture, experimental results, and potential future directions to improve and expand the framework.
arXiv Detail & Related papers (2025-02-20T04:22:30Z)
- GraphicsDreamer: Image to 3D Generation with Physical Consistency [32.26851174969898]
We introduce GraphicsDreamer, a method for creating highly usable 3D meshes from single images. In the geometry fusion stage, we continue to enforce the PBR constraints, ensuring that the generated 3D objects possess reliable texture details. Our method incorporates topology optimization and fast UV unwrapping capabilities, allowing the 3D products to be seamlessly imported into graphics engines.
arXiv Detail & Related papers (2024-12-18T10:01:27Z)
- GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space.
Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information.
The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z)
- 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion [86.25111098482537]
We introduce 3DTopia-XL, a scalable native 3D generative model designed to overcome limitations of existing methods.
3DTopia-XL leverages a novel primitive-based 3D representation, PrimX, which encodes detailed shape, albedo, and material field into a compact tensorial format.
On top of the novel representation, we propose a generative framework based on Diffusion Transformer (DiT), which comprises 1) Primitive Patch Compression and 2) Latent Primitive Diffusion.
We conduct extensive qualitative and quantitative experiments to demonstrate that 3DTopia-XL significantly outperforms existing methods in generating high-
arXiv Detail & Related papers (2024-09-19T17:59:06Z)
- Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation.
We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z)
- Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators.
Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields.
We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools.
Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools.
arXiv Detail & Related papers (2023-09-15T16:34:51Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Efficient Geometry-aware 3D Generative Adversarial Networks [50.68436093869381]
Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent.
In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations.
We introduce an expressive hybrid explicit-implicit network architecture that synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry.
arXiv Detail & Related papers (2021-12-15T08:01:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.