Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors
- URL: http://arxiv.org/abs/2412.09625v1
- Date: Thu, 12 Dec 2024 18:59:59 GMT
- Title: Illusion3D: 3D Multiview Illusion with 2D Diffusion Priors
- Authors: Yue Feng, Vaibhav Sanjay, Spencer Lutz, Badour AlBahar, Songwei Ge, Jia-Bin Huang
- Abstract summary: We present a simple yet effective approach for creating 3D multiview illusions based on user-provided text prompts or images.
Our method leverages a pre-trained text-to-image diffusion model to optimize the textures and geometry of neural 3D representations.
We develop several techniques to improve the quality of the generated 3D multiview illusions.
- Score: 19.58299058678772
- Abstract: Automatically generating multiview illusions is a compelling challenge, where a single piece of visual content offers distinct interpretations from different viewing perspectives. Traditional methods, such as shadow art and wire art, create interesting 3D illusions but are limited to simple visual outputs (i.e., figure-ground or line drawing), restricting their artistic expressiveness and practical versatility. Recent diffusion-based illusion generation methods can generate more intricate designs but are confined to 2D images. In this work, we present a simple yet effective approach for creating 3D multiview illusions based on user-provided text prompts or images. Our method leverages a pre-trained text-to-image diffusion model to optimize the textures and geometry of neural 3D representations through differentiable rendering. When viewed from multiple angles, this produces different interpretations. We develop several techniques to improve the quality of the generated 3D multiview illusions. We demonstrate the effectiveness of our approach through extensive experiments and showcase illusion generation with diverse 3D forms.
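The abstract describes a core recipe: one neural 3D representation, a frozen 2D diffusion prior, and differentiable rendering from several viewpoints, each tied to its own prompt. The following is a minimal sketch of that idea, not the authors' implementation; `TexturedField`, `render`, and `sds_grad` are hypothetical placeholders, and the score-distillation gradient is simulated with random noise rather than a real diffusion model.

```python
# Minimal sketch of multi-view illusion optimization (assumptions throughout):
# a single 3D representation is optimized so that renders from different
# camera poses each satisfy a different text prompt via a score-distillation-
# style pseudo-gradient from a frozen 2D diffusion prior.
import torch

class TexturedField(torch.nn.Module):
    """Toy stand-in for a neural 3D representation (geometry + texture)."""
    def __init__(self, res: int = 64):
        super().__init__()
        self.voxels = torch.nn.Parameter(0.01 * torch.randn(1, 4, res, res, res))

def render(field: TexturedField, pose: torch.Tensor) -> torch.Tensor:
    """Hypothetical differentiable renderer: camera pose -> RGB image."""
    # Placeholder: take a pose-dependent slice of the voxel grid.
    idx = int(pose.item()) % field.voxels.shape[-1]
    return field.voxels[:, :3, :, :, idx]

def sds_grad(image: torch.Tensor, prompt: str) -> torch.Tensor:
    """Stand-in for a Score Distillation Sampling pseudo-gradient from a
    frozen text-to-image diffusion model; random noise here, not a real prior."""
    return torch.randn_like(image)

field = TexturedField()
opt = torch.optim.Adam(field.parameters(), lr=1e-2)

# The illusion: each viewing direction is paired with a different prompt.
views = [(torch.tensor(0.0), "a portrait of a cat"),
         (torch.tensor(32.0), "a steaming cup of coffee")]

for _ in range(1000):
    opt.zero_grad()
    for pose, prompt in views:
        img = render(field, pose)
        img.backward(gradient=sds_grad(img, prompt))  # inject the 2D prior's signal
    opt.step()
```

In a real pipeline, `sds_grad` would come from a pre-trained text-to-image diffusion model via Score Distillation Sampling, and `render` from a differentiable renderer over a NeRF- or mesh-based representation.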
Related papers
- Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors [24.478875248825563]
We propose a novel image editing technique that enables 3D manipulations on single images.
Our method directly leverages powerful image diffusion models trained on a broad spectrum of text-image pairs.
Our method can generate high-quality 3D-aware image edits with large viewpoint transformations and high appearance and shape consistency with the input image.
arXiv Detail & Related papers (2024-03-18T06:18:59Z) - UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures [80.047065473698]
We propose a novel 3D avatar generation approach, termed UltrAvatar, with enhanced geometry fidelity and superior-quality physically based rendering (PBR) textures free of unwanted lighting.
We demonstrate the effectiveness and robustness of the proposed method, outperforming the state-of-the-art methods by a large margin in the experiments.
arXiv Detail & Related papers (2024-01-20T01:55:17Z) - Towards 4D Human Video Stylization [56.33756124829298]
We present a first step towards 4D (3D and time) human video stylization, which addresses style transfer, novel view synthesis and human animation.
We leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space.
Our framework uniquely extends its capabilities to accommodate novel poses and viewpoints, making it a versatile tool for creative human video stylization.
arXiv Detail & Related papers (2023-12-07T08:58:33Z) - Diffusion Illusions: Hiding Images in Plain Sight [37.87050866208039]
Diffusion Illusions is the first comprehensive pipeline designed to automatically generate a wide range of illusions.
We study three types of illusions, each arranging the prime images in a different way.
We conduct comprehensive experiments on these illusions and verify the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-12-06T18:59:18Z) - Wired Perspectives: Multi-View Wire Art Embraces Generative AI [89.99145586890103]
We present DreamWire, an AI system enabling everyone to craft multi-view wire art (MVWA) easily.
Users express their vision through text prompts or scribbles, freeing them from intricate 3D wire organisation.
arXiv Detail & Related papers (2023-11-26T21:09:00Z) - Single-Image 3D Human Digitization with Shape-Guided Diffusion [31.99621159464388]
NeRF and its variants typically require videos or images from different viewpoints.
We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image.
arXiv Detail & Related papers (2023-11-15T18:59:56Z) - Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training [51.632418297156605]
We introduce MixCon3D, a method aiming to sculpt holistic 3D representation in contrastive language-image-3D pre-training.
We develop the 3D object-level representation from complementary perspectives, e.g., by combining multi-view rendered images with the point cloud.
MixCon3D then performs language-3D contrastive learning, comprehensively depicting real-world 3D objects and bolstering text alignment (see the second code sketch after this list).
arXiv Detail & Related papers (2023-11-03T06:05:36Z) - IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to improve text-to-3D generation.
Our approach uses image-to-image pipelines, powered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are treated as real data, while renderings of the optimized 3D models serve as fake data (see the first code sketch after this list).
arXiv Detail & Related papers (2023-08-22T14:39:17Z) - Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior [36.40582157854088]
In this work, we investigate the problem of creating high-fidelity 3D content from only a single image.
We leverage prior knowledge from a well-trained 2D diffusion model to act as 3D-aware supervision for 3D creation.
Our method presents the first attempt to achieve high-quality 3D creation from a single image for general objects and enables various applications such as text-to-3D creation and texture editing.
arXiv Detail & Related papers (2023-03-24T17:54:22Z) - 3D Concept Learning and Reasoning from Multi-View Images [96.3088005719963]
We introduce a new large-scale benchmark for 3D multi-view visual question answering (3DMV-VQA).
This dataset consists of approximately 5k scenes and 600k images, paired with 50k questions.
We propose a novel 3D concept learning and reasoning framework that seamlessly combines neural fields, 2D pre-trained vision-language models, and neural reasoning operators.
arXiv Detail & Related papers (2023-03-20T17:59:49Z) - 3D-GIF: 3D-Controllable Object Generation via Implicit Factorized Representations [31.095503715696722]
We propose factorized representations that are view-independent and light-disentangled, together with training schemes that use randomly sampled lighting conditions.
We demonstrate the superiority of our method by visualizing factorized representations, re-lighted images, and albedo-textured meshes.
This is the first work to extract albedo-textured meshes from unposed 2D images without any additional labels or assumptions.
arXiv Detail & Related papers (2022-03-12T15:23:17Z)
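First code sketch, for the IT3D entry above: a hedged illustration of the described discriminator setup, in which diffusion-synthesized multi-view images are labeled real and renderings of the 3D model being optimized are labeled fake. The tiny network and loss functions below are illustrative assumptions, not IT3D's code.

```python
# Minimal sketch (assumptions, not IT3D's implementation) of a GAN-style
# discriminator where LDM-synthesized multi-view images are "real" and
# renders of the 3D model under optimization are "fake".
import torch
import torch.nn.functional as F

disc = torch.nn.Sequential(          # tiny image discriminator (placeholder)
    torch.nn.Conv2d(3, 32, 4, stride=2, padding=1), torch.nn.LeakyReLU(0.2),
    torch.nn.Conv2d(32, 64, 4, stride=2, padding=1), torch.nn.LeakyReLU(0.2),
    torch.nn.Flatten(), torch.nn.LazyLinear(1),
)

def disc_loss(synth: torch.Tensor, render: torch.Tensor) -> torch.Tensor:
    """Discriminator step: synthesized views are real, renders are fake."""
    real = F.binary_cross_entropy_with_logits(
        disc(synth), torch.ones(synth.shape[0], 1))
    fake = F.binary_cross_entropy_with_logits(
        disc(render.detach()), torch.zeros(render.shape[0], 1))
    return real + fake

def gen_loss(render: torch.Tensor) -> torch.Tensor:
    """Generator step: push renders of the 3D model toward 'real'."""
    return F.binary_cross_entropy_with_logits(
        disc(render), torch.ones(render.shape[0], 1))

# Usage with dummy tensors standing in for LDM outputs and 3D renders:
synth = torch.rand(4, 3, 64, 64)     # posed images from an image-to-image LDM
render = torch.rand(4, 3, 64, 64, requires_grad=True)
print(disc_loss(synth, render).item(), gen_loss(render).item())
```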
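Second code sketch, for the MixCon3D entry: a minimal, assumption-laden take on language-3D contrastive learning, fusing multi-view image features with point-cloud features into one object embedding and aligning it with text via a symmetric CLIP-style loss. The fusion scheme and feature dimensions are placeholders, not MixCon3D's actual design.

```python
# Minimal sketch (assumptions, not MixCon3D's code) of language-3D
# contrastive learning over complementary 3D perspectives.
import torch
import torch.nn.functional as F

def object_embedding(img_feats: torch.Tensor, pc_feats: torch.Tensor) -> torch.Tensor:
    """Fuse per-view image features (B, V, D) with point-cloud features (B, D)."""
    fused = img_feats.mean(dim=1) + pc_feats        # simple fusion placeholder
    return F.normalize(fused, dim=-1)

def contrastive_loss(obj: torch.Tensor, txt: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric cross-entropy over the (B, B) similarity matrix, as in CLIP."""
    logits = obj @ F.normalize(txt, dim=-1).T / tau
    targets = torch.arange(obj.shape[0])
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Dummy batch: 8 objects, 4 rendered views each, 512-d features.
img_feats = torch.randn(8, 4, 512)
pc_feats = torch.randn(8, 512)
txt_feats = torch.randn(8, 512)
print(contrastive_loss(object_embedding(img_feats, pc_feats), txt_feats).item())
```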
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.