ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image
Collections
- URL: http://arxiv.org/abs/2306.04619v1
- Date: Wed, 7 Jun 2023 17:47:50 GMT
- Title: ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image
Collections
- Authors: Chun-Han Yao, Amit Raj, Wei-Chih Hung, Yuanzhen Li, Michael
Rubinstein, Ming-Hsuan Yang, Varun Jampani
- Abstract summary: Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
- Score: 71.46546520120162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating 3D articulated shapes like animal bodies from monocular images is
inherently challenging due to the ambiguities of camera viewpoint, pose,
texture, lighting, etc. We propose ARTIC3D, a self-supervised framework to
reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
Specifically, ARTIC3D is built upon a skeleton-based surface representation and
is further guided by 2D diffusion priors from Stable Diffusion. First, we
enhance the input images with occlusions/truncation via 2D diffusion to obtain
cleaner mask estimates and semantic features. Second, we perform
diffusion-guided 3D optimization to estimate shape and texture that are of
high-fidelity and faithful to input images. We also propose a novel technique
to calculate more stable image-level gradients via diffusion models compared to
existing alternatives. Finally, we produce realistic animations by fine-tuning
the rendered shape and texture under rigid part transformations. Extensive
evaluations on multiple existing datasets as well as newly introduced noisy web
image collections with occlusions and truncation demonstrate that ARTIC3D
outputs are more robust to noisy images, higher quality in terms of shape and
texture details, and more realistic when animated. Project page:
https://chhankyao.github.io/artic3d/
Related papers
- The More You See in 2D, the More You Perceive in 3D [32.578628729549145]
SAP3D is a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.
We show that as the number of input images increases, the performance of our approach improves.
arXiv Detail & Related papers (2024-04-04T17:59:40Z) - Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding [16.50466940644004]
We present Isotropic3D, an image-to-3D generation pipeline that takes only an image CLIP embedding as input.
Isotropic3D allows the optimization to be isotropic w.r.t. the azimuth angle by solely resting on the SDS loss.
arXiv Detail & Related papers (2024-03-15T15:27:58Z) - Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [13.551691697814908]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - Denoising Diffusion via Image-Based Rendering [54.20828696348574]
We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
arXiv Detail & Related papers (2024-02-05T19:00:45Z) - 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with
2D Diffusion Models [102.75875255071246]
3D content creation via text-driven stylization has played a fundamental challenge to multimedia and graphics community.
We propose a new 3DStyle-Diffusion model that triggers fine-grained stylization of 3D meshes with additional controllable appearance and geometric guidance from 2D Diffusion models.
arXiv Detail & Related papers (2023-11-09T15:51:27Z) - Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z) - RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and
Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images.
arXiv Detail & Related papers (2022-11-17T20:17:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.