3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing
- URL: http://arxiv.org/abs/2408.13788v1
- Date: Sun, 25 Aug 2024 09:31:22 GMT
- Title: 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing
- Authors: Shichao Dong, Ze Yang, Guosheng Lin,
- Abstract summary: We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object in various structure and appearance via diffusion models and chatGPT generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
- Score: 52.68314936128752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation plays a crucial role in deep learning, enhancing the generalization and robustness of learning-based models. Standard approaches involve simple transformations like rotations and flips for generating extra data. However, these augmentations are limited by their initial dataset, lacking high-level diversity. Recently, large models such as language models and diffusion models have shown exceptional capabilities in perception and content generation. In this work, we propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models. For each target semantic class, we first generate 2D images of a single object in various structure and appearance via diffusion models and chatGPT generated text prompts. Beyond texture augmentation, we propose a method to automatically alter the shape of objects within 2D images. Subsequently, we transform these augmented images into 3D objects and construct virtual scenes by random composition. This method can automatically produce a substantial amount of 3D scene data without the need of real data, providing significant benefits in addressing few-shot learning challenges and mitigating long-tailed class imbalances. By providing a flexible augmentation approach, our work contributes to enhancing 3D data diversity and advancing model capabilities in scene understanding tasks.
Related papers
- Diffusion Models in 3D Vision: A Survey [11.116658321394755]
We review the state-of-the-art approaches that leverage diffusion models for 3D visual tasks.
These approaches include 3D object generation, shape completion, point cloud reconstruction, and scene understanding.
We discuss potential solutions, including improving computational efficiency, enhancing multimodal fusion, and exploring the use of large-scale pretraining.
arXiv Detail & Related papers (2024-10-07T04:12:23Z) - VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models [20.084928490309313]
This paper presents a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models.
By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model.
The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds.
arXiv Detail & Related papers (2024-03-18T17:59:12Z) - Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation [12.693847842218604]
We introduce a novel 3D customization method, dubbed Make-Your-3D, that can personalize high-fidelity and consistent 3D content within 5 minutes.
Our key insight is to harmonize the distributions of a multi-view diffusion model and an identity-specific 2D generative model, aligning them with the distribution of the desired 3D subject.
Our method can produce high-quality, consistent, and subject-specific 3D content with text-driven modifications that are unseen in subject image.
arXiv Detail & Related papers (2024-03-14T17:57:04Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability [118.26563926533517]
Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space.
We extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.
arXiv Detail & Related papers (2024-02-19T15:33:09Z) - VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
The 3D volumes are then trained on a diffusion model for text-to-3D generation using a 3D U-Net.
arXiv Detail & Related papers (2023-12-18T18:59:05Z) - Breathing New Life into 3D Assets with Generative Repainting [74.80184575267106]
Diffusion-based text-to-image models ignited immense attention from the vision community, artists, and content creators.
Recent works have proposed various pipelines powered by the entanglement of diffusion models and neural fields.
We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools.
Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, and orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools.
arXiv Detail & Related papers (2023-09-15T16:34:51Z) - IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z) - Leveraging 2D Data to Learn Textured 3D Mesh Generation [33.32377849866736]
We present the first generative model of textured 3D meshes.
We train our model to explain a distribution of images by modelling each image as a 3D foreground object.
It learns to generate meshes that when rendered, produce images similar to those in its training set.
arXiv Detail & Related papers (2020-04-08T18:00:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.