Chirpy3D: Creative Fine-grained 3D Object Fabrication via Part Sampling
- URL: http://arxiv.org/abs/2501.04144v2
- Date: Fri, 28 Mar 2025 19:45:00 GMT
- Title: Chirpy3D: Creative Fine-grained 3D Object Fabrication via Part Sampling
- Authors: Kam Woh Ng, Jing Yang, Jia Wei Sii, Jiankang Deng, Chee Seng Chan, Yi-Zhe Song, Tao Xiang, Xiatian Zhu,
- Abstract summary: Chirpy3D is a novel approach for fine-grained 3D object generation in a zero-shot setting. The model must infer plausible 3D structures, capture fine-grained details, and generalize to novel objects. Our experiments demonstrate that Chirpy3D surpasses existing methods in generating creative 3D objects with higher quality and fine-grained details.
- Score: 128.23917788822948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Chirpy3D, a novel approach for fine-grained 3D object generation, tackling the challenging task of synthesizing creative 3D objects in a zero-shot setting, with access only to unposed 2D images of seen categories. Without structured supervision -- such as camera poses, 3D part annotations, or object-specific labels -- the model must infer plausible 3D structures, capture fine-grained details, and generalize to novel objects using only category-level labels from seen categories. To address this, Chirpy3D introduces a multi-view diffusion model that decomposes training objects into anchor parts in an unsupervised manner, representing the latent space of both seen and unseen parts as continuous distributions. This allows smooth interpolation and flexible recombination of parts to generate entirely new objects with species-specific details. A self-supervised feature consistency loss further ensures structural and semantic coherence. The result is the first system capable of generating entirely novel 3D objects with species-specific fine-grained details through flexible part sampling and composition. Our experiments demonstrate that Chirpy3D surpasses existing methods in generating creative 3D objects with higher quality and fine-grained details. Code will be released at https://github.com/kamwoh/chirpy3d.
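The part-sampling idea at the core of the abstract can be illustrated with a minimal sketch. The snippet below is not the released Chirpy3D code; the names (PartLatentBank, interpolate_parts) and the embedding dimension are hypothetical. It only shows how per-part Gaussian latents could be sampled, interpolated between two seen species, and recombined into part tokens that would then condition a multi-view diffusion model.

```python
# Hypothetical sketch of continuous part latents, not the official Chirpy3D implementation.
import torch


class PartLatentBank(torch.nn.Module):
    """Per-species, per-part Gaussian latents (mean and log-variance)."""

    def __init__(self, num_species: int, num_parts: int, dim: int = 768):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.randn(num_species, num_parts, dim) * 0.02)
        self.logvar = torch.nn.Parameter(torch.zeros(num_species, num_parts, dim))

    def sample(self, species: int) -> torch.Tensor:
        """Reparameterised sample of all part embeddings for one seen species."""
        mu, logvar = self.mu[species], self.logvar[species]
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


def interpolate_parts(bank: PartLatentBank, s_a: int, s_b: int, alpha: float) -> torch.Tensor:
    """Blend the part distributions of two seen species to sample an unseen hybrid."""
    mu = (1 - alpha) * bank.mu[s_a] + alpha * bank.mu[s_b]
    logvar = (1 - alpha) * bank.logvar[s_a] + alpha * bank.logvar[s_b]
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)


bank = PartLatentBank(num_species=200, num_parts=4)
hybrid_parts = interpolate_parts(bank, s_a=3, s_b=17, alpha=0.5)  # (num_parts, dim)
# In the paper, such part tokens condition a multi-view diffusion model; here they are
# simply returned as a tensor to keep the sketch self-contained.
```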
Related papers
- PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models [63.1432721793683]
We introduce PartGen, a novel approach that generates 3D objects composed of meaningful parts starting from text, an image, or an unstructured 3D object. We evaluate our method on generated and real 3D assets and show that it outperforms segmentation and part-extraction baselines by a large margin.
arXiv Detail & Related papers (2024-12-24T18:59:43Z) - ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance [76.7746870349809]
We present ComboVerse, a 3D generation framework that produces high-quality 3D assets with complex compositions by learning to combine multiple models.
Our proposed framework emphasizes spatial alignment of objects, compared with standard score distillation sampling.
arXiv Detail & Related papers (2024-03-19T03:39:43Z) - Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - WildFusion: Learning 3D-Aware Latent Diffusion Models in View Space [77.92350895927922]
We propose WildFusion, a new approach to 3D-aware image synthesis based on latent diffusion models (LDMs).
Our 3D-aware LDM is trained without any direct supervision from multiview images or 3D geometry.
This opens up promising research avenues for scalable 3D-aware image synthesis and 3D content creation from in-the-wild image data.
arXiv Detail & Related papers (2023-11-22T18:25:51Z) - Iterative Superquadric Recomposition of 3D Objects from Multiple Views [77.53142165205283]
We propose a framework, ISCO, to recompose an object using 3D superquadrics as semantic parts directly from 2D views.
Our framework iteratively adds new superquadrics wherever the reconstruction error is high.
It provides consistently more accurate 3D reconstructions, even from images in the wild.
arXiv Detail & Related papers (2023-09-05T10:21:37Z) - Creative Birds: Self-Supervised Single-View 3D Style Transfer [23.64817899864608]
We propose a novel method for single-view 3D style transfer that generates a unique 3D object with both shape and texture transfer.
Our focus lies primarily on birds, a popular subject in 3D reconstruction, for which no existing single-view 3D transfer methods have been developed.
arXiv Detail & Related papers (2023-07-26T11:47:44Z) - Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift objects into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z) - Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z) - DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.