Beyond Flatland: Pre-training with a Strong 3D Inductive Bias
- URL: http://arxiv.org/abs/2112.00113v1
- Date: Tue, 30 Nov 2021 21:30:24 GMT
- Title: Beyond Flatland: Pre-training with a Strong 3D Inductive Bias
- Authors: Shubhaankar Gupta, Thomas P. O'Connell, Bernhard Egger
- Abstract summary: Kataoka et al., 2020 introduced a technique to eliminate the need for natural images in supervised deep learning.
We take inspiration from their work and build on this idea using 3D procedural object renders.
Similar to the previous work, our training corpus will be fully synthetic and derived from simple procedural strategies.
- Score: 5.577231009305908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training on large-scale databases of natural images and then
fine-tuning the resulting models to the application at hand, or transfer learning, is a
popular strategy in computer vision. However, Kataoka et al., 2020 introduced a
technique to eliminate the need for natural images in supervised deep learning
by proposing a novel synthetic, formula-based method to generate 2D fractals as
training corpus. Using one synthetically generated fractal for each class, they
achieved transfer learning results comparable to models pre-trained on natural
images. In this project, we take inspiration from their work and build on this
idea -- using 3D procedural object renders. Since the image formation process
in the natural world is based on its 3D structure, we expect pre-training with
3D mesh renders to provide an implicit bias leading to better generalization
capabilities in a transfer learning setting and that invariances to 3D rotation
and illumination are easier to learn from 3D data. As in the
previous work, our training corpus will be fully synthetic and derived from
simple procedural strategies; we will go beyond classic data augmentation and
also vary illumination and pose, which are controllable in our setting, and study
their effect on transfer learning capabilities in the context of prior work. In
addition, we will compare the 2D fractal and 3D procedural object networks to
human and non-human primate brain data to learn more about the 2D vs. 3D nature
of biological vision.
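The controllable corpus described above amounts to sampling a procedural 3D shape per class and rendering it under a randomly drawn pose and light direction. The following minimal sketch illustrates that idea with a self-contained point-based renderer (orthographic projection, Lambertian shading); all function names and parameters are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of a procedural 3D pre-training sample: a random blobby object,
# a random pose (rotation), and a random illumination direction. Assumptions:
# point-based rendering, orthographic projection, single directional light.
import numpy as np

def random_rotation(rng):
    # Random rotation via QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0
    return q

def procedural_object(rng, n_points=20000, n_harmonics=4):
    # Sample points on a unit sphere and perturb the radius with a few random
    # low-frequency harmonics to obtain a class-specific shape.
    v = rng.normal(size=(n_points, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    radius = np.ones(n_points)
    for k in range(1, n_harmonics + 1):
        axis = rng.normal(size=3)
        axis /= np.linalg.norm(axis)
        phase = rng.uniform(0, 2 * np.pi)
        radius += 0.15 / k * np.sin(k * np.pi * (v @ axis) + phase)
    return v * radius[:, None], v  # points and (approximate) normals

def render(points, normals, rotation, light_dir, size=64):
    # Rotate into the sampled pose, shade with one directional light,
    # and project orthographically along +z.
    p = points @ rotation.T
    n = normals @ rotation.T
    shade = np.clip(n @ light_dir, 0.0, 1.0)            # Lambertian term
    xy = ((p[:, :2] + 1.5) / 3.0 * (size - 1)).astype(int)
    xy = np.clip(xy, 0, size - 1)
    img = np.zeros((size, size))
    order = np.argsort(p[:, 2])                           # far-to-near painter's order
    img[xy[order, 1], xy[order, 0]] = shade[order]
    return img

rng = np.random.default_rng(0)
points, normals = procedural_object(rng)
light = rng.normal(size=3)
light /= np.linalg.norm(light)
image = render(points, normals, random_rotation(rng), light)
print(image.shape, float(image.min()), float(image.max()))
```

Regenerating the same object under different rotations and light directions yields the pose and illumination variation that goes beyond classic 2D data augmentation.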
Related papers
- 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object with varied structure and appearance via diffusion models and ChatGPT-generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
arXiv Detail & Related papers (2024-08-25T09:31:22Z) - The More You See in 2D, the More You Perceive in 3D [32.578628729549145]
SAP3D is a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.
We show that as the number of input images increases, the performance of our approach improves.
arXiv Detail & Related papers (2024-04-04T17:59:40Z) - PonderV2: Pave the Way for 3D Foundation Model with A Universal
Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, indicating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z) - AG3D: Learning to Generate 3D Avatars from 2D Image Collections [96.28021214088746]
We propose a new adversarial generative model of realistic 3D people from 2D images.
Our method captures shape and deformation of the body and loose clothing by adopting a holistic 3D generator.
We experimentally find that our method outperforms previous 3D- and articulation-aware methods in terms of geometry and appearance.
arXiv Detail & Related papers (2023-05-03T17:56:24Z) - Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion [67.71624118802411]
We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects.
We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data.
Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.
arXiv Detail & Related papers (2023-04-20T17:59:34Z) - Unsupervised Learning of Efficient Geometry-Aware Neural Articulated
Representations [89.1388369229542]
We propose an unsupervised method for 3D geometry-aware representation learning of articulated objects.
We obviate the need for pose supervision by learning the representations with GAN training.
Experiments demonstrate the efficiency of our method and show that GAN-based training enables learning of controllable 3D representations without supervision.
arXiv Detail & Related papers (2022-04-19T12:10:18Z) - Learning from 2D: Pixel-to-Point Knowledge Transfer for 3D Pretraining [21.878815180924832]
We present a novel 3D pretraining method that leverages 2D networks learned from rich 2D datasets (see the sketch after this list).
Our experiments show that the 3D models pretrained with 2D knowledge boost the performances across various real-world 3D downstream tasks.
arXiv Detail & Related papers (2021-04-10T05:40:42Z) - Learning Neural Light Transport [28.9247002210861]
We present an approach for learning light transport in static and dynamic 3D scenes using a neural network.
We find that our model is able to produce photorealistic renderings of static and dynamic scenes.
arXiv Detail & Related papers (2020-06-05T13:26:05Z) - Leveraging 2D Data to Learn Textured 3D Mesh Generation [33.32377849866736]
We present the first generative model of textured 3D meshes.
We train our model to explain a distribution of images by modelling each image as a 3D foreground object.
It learns to generate meshes that when rendered, produce images similar to those in its training set.
arXiv Detail & Related papers (2020-04-08T18:00:37Z) - Chained Representation Cycling: Learning to Estimate 3D Human Pose and
Shape by Cycling Between Representations [73.11883464562895]
We propose a new architecture that facilitates unsupervised, or lightly supervised, learning.
We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images.
While we present results for modeling humans, our formulation is general and can be applied to other vision problems.
arXiv Detail & Related papers (2020-01-06T14:54:00Z)
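As referenced in the Pixel-to-Point entry above, a common way to transfer knowledge from a pretrained 2D network to a 3D network is to align each point feature with the feature of the pixel it corresponds to. The sketch below is a generic illustration of such a pixel-to-point distillation loss, assuming correspondences are given; it is not claimed to be that paper's exact formulation.

```python
# Generic sketch of pixel-to-point feature distillation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precomputed features: a frozen 2D network yields per-pixel features,
# a trainable 3D network yields per-point features; `pix_idx` maps each 3D point
# to the pixel it projects onto.
n_points, feat_dim, h, w = 1024, 64, 32, 32
feat_2d = rng.normal(size=(h * w, feat_dim))      # teacher (2D) features, flattened
feat_3d = rng.normal(size=(n_points, feat_dim))   # student (3D) features
pix_idx = rng.integers(0, h * w, size=n_points)   # point-to-pixel correspondences

def l2_normalize(x):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

def pixel_to_point_loss(feat_2d, feat_3d, pix_idx):
    # Pull each point feature toward its corresponding pixel feature
    # (cosine distance); in training, this loss would update the 3D network.
    t = l2_normalize(feat_2d[pix_idx])
    s = l2_normalize(feat_3d)
    return float(np.mean(1.0 - np.sum(t * s, axis=1)))

print("distillation loss:", pixel_to_point_loss(feat_2d, feat_3d, pix_idx))
```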
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.