3D generation on ImageNet
- URL: http://arxiv.org/abs/2303.01416v1
- Date: Thu, 2 Mar 2023 17:06:57 GMT
- Title: 3D generation on ImageNet
- Authors: Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying
Lee, Peter Wonka, Sergey Tulyakov
- Abstract summary: We develop a 3D generator with Generic Priors (3DGP): a 3D synthesis framework with more general assumptions about the training data.
Our model is based on three new ideas.
We explore our model on four datasets: SDIP Dogs 256x256, SDIP Elephants 256x256, LSUN Horses 256x256, and ImageNet 256x256.
- Score: 76.0440752186121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing 3D-from-2D generators are typically designed for well-curated
single-category datasets, where all the objects have (approximately) the same
scale, 3D location, and orientation, and the camera always points to the center
of the scene. This makes them inapplicable to diverse, in-the-wild datasets of
non-alignable scenes rendered from arbitrary camera poses. In this work, we
develop a 3D generator with Generic Priors (3DGP): a 3D synthesis framework
with more general assumptions about the training data, and show that it scales
to very challenging datasets, like ImageNet. Our model is based on three new
ideas. First, we incorporate an inaccurate off-the-shelf depth estimator into
3D GAN training via a special depth adaptation module to handle the
imprecision. Then, we create a flexible camera model and a regularization
strategy for it to learn its distribution parameters during training. Finally,
we extend the recent ideas of transferring knowledge from pre-trained
classifiers into GANs for patch-wise trained models by employing a simple
distillation-based technique on top of the discriminator. It achieves more
stable training than the existing methods and speeds up the convergence by at
least 40%. We explore our model on four datasets: SDIP Dogs 256x256, SDIP
Elephants 256x256, LSUN Horses 256x256, and ImageNet 256x256, and demonstrate
that 3DGP outperforms the recent state-of-the-art in terms of both texture and
geometry quality. Code and visualizations:
https://snap-research.github.io/3dgp.
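The sketch below is a minimal, hypothetical illustration of the first idea: a small learnable module that corrects an imprecise off-the-shelf depth estimate before the discriminator sees it as an extra channel next to the RGB image. The name `DepthAdapter`, its residual design, and the layer sizes are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical depth adaptation module (illustrative, not the 3DGP code).
# A noisy monocular depth estimate is refined by a small learnable network
# so the discriminator can consume it as a 4th input channel.
import torch
import torch.nn as nn

class DepthAdapter(nn.Module):
    """Absorbs the biases of an off-the-shelf depth estimator."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, hidden, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(hidden, 1, 3, padding=1),
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # Residual correction keeps the adapted depth close to the estimate.
        return depth + self.net(depth)

# Real images use estimated-then-adapted depth; generated images would use the
# depth rendered by the 3D generator instead.
adapter = DepthAdapter()
rgb = torch.randn(2, 3, 256, 256)        # batch of real images
est_depth = torch.rand(2, 1, 256, 256)   # off-the-shelf depth estimate
disc_input = torch.cat([rgb, adapter(est_depth)], dim=1)  # shape (2, 4, 256, 256)
```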
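For the second idea, the following sketch assumes the camera pose is summarized by three scalars (yaw, pitch, field of view) drawn from a Gaussian whose mean and spread are learned jointly with the generator; the variance-floor penalty stands in for the paper's regularization strategy and is purely illustrative.

```python
# Hypothetical flexible camera model with learnable distribution parameters.
import torch
import torch.nn as nn

class LearnableCamera(nn.Module):
    def __init__(self):
        super().__init__()
        # Learnable mean and log-std for (yaw, pitch, fov).
        self.mean = nn.Parameter(torch.zeros(3))
        self.log_std = nn.Parameter(torch.log(torch.tensor([1.0, 0.3, 0.1])))

    def sample(self, batch: int) -> torch.Tensor:
        # Reparameterized sampling keeps the parameters differentiable.
        eps = torch.randn(batch, 3)
        return self.mean + eps * self.log_std.exp()

    def regularizer(self, min_std: float = 0.05) -> torch.Tensor:
        # Penalize the distribution collapsing below a minimal spread.
        floor = torch.log(torch.tensor(min_std))
        return torch.relu(floor - self.log_std).sum()

camera = LearnableCamera()
poses = camera.sample(4)          # one camera pose per generated image
reg_loss = camera.regularizer()   # added to the generator objective
```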
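For the third idea, this sketch shows one way to distill knowledge from a frozen pretrained classifier into the discriminator: a lightweight head regresses the classifier's features from the discriminator's intermediate features of real images. The ResNet-50 teacher, the feature dimensions, and the L2 objective are assumptions chosen for illustration, not the authors' exact setup.

```python
# Hypothetical classifier-to-discriminator distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

# Frozen pretrained teacher exposing 2048-d pooled features.
teacher = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
teacher.fc = nn.Identity()
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Assume the discriminator exposes an intermediate feature map with C channels.
C = 512
distill_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(C, 2048)
)

def distillation_loss(disc_features: torch.Tensor, real_images: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss added to the discriminator objective."""
    with torch.no_grad():
        target = teacher(real_images)      # (B, 2048) teacher features
    pred = distill_head(disc_features)     # (B, 2048) predicted from D features
    return F.mse_loss(pred, target)
```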
Related papers
- ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images [47.682942867405224]
ConDense is a framework for 3D pre-training utilizing existing 2D networks and large-scale multi-view datasets.
We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline.
arXiv Detail & Related papers (2024-08-30T05:57:01Z)
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos [15.532504015622159]
Category-level 3D pose estimation is a fundamentally important problem in computer vision and robotics.
We tackle the problem of learning to estimate the category-level 3D pose only from casually taken object-centric videos.
arXiv Detail & Related papers (2024-07-05T09:43:05Z)
- 3D Congealing: 3D-Aware Image Alignment in the Wild [44.254247801001675]
3D Congealing is a problem of 3D-aware alignment for 2D images capturing semantically similar objects.
We introduce a general framework that tackles the task without assuming shape templates, poses, or any camera parameters.
Our framework can be used for various tasks such as correspondence matching, pose estimation, and image editing.
arXiv Detail & Related papers (2024-04-02T17:32:12Z)
- Geometry aware 3D generation from in-the-wild images in ImageNet [18.157263188192434]
We propose a method for reconstructing 3D geometry from the diverse and unstructured ImageNet dataset without camera pose information.
We use an efficient triplane representation to learn 3D models from 2D images and modify the architecture of the generator backbone based on StyleGAN2.
The trained generator can produce class-conditional 3D models as well as renderings from arbitrary viewpoints.
arXiv Detail & Related papers (2024-01-31T23:06:39Z)
- HoloDiffusion: Training a 3D Diffusion Model using 2D Images [71.1144397510333]
We introduce a new diffusion setup that can be trained, end-to-end, with only posed 2D images for supervision.
We show that our diffusion models are scalable, train robustly, and are competitive in terms of sample quality and fidelity to existing approaches for 3D generative modeling.
arXiv Detail & Related papers (2023-03-29T07:35:56Z)
- Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories [80.30216777363057]
We introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets.
At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views.
Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.
arXiv Detail & Related papers (2022-11-07T22:42:42Z)
- VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data [69.64723752430244]
We introduce VirtualPose, a two-stage learning framework to exploit the hidden "free lunch" specific to this task.
The first stage transforms images to abstract geometry representations (AGR), and then the second maps them to 3D poses.
It addresses the generalization issue from two aspects: (1) the first stage can be trained on diverse 2D datasets to reduce the risk of over-fitting to limited appearance; (2) the second stage can be trained on diverse AGR synthesized from a large number of virtual cameras and poses.
arXiv Detail & Related papers (2022-07-20T14:47:28Z)
- EpiGRAF: Rethinking training of 3D GANs [60.38818140637367]
We show that it is possible to obtain a high-resolution 3D generator with SotA image quality by following a completely different route of simply training the model patch-wise.
The resulting model, named EpiGRAF, is an efficient, high-resolution, pure 3D generator.
arXiv Detail & Related papers (2022-06-21T17:08:23Z)
- 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that enables us to leverage 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)