Related papers: CAT3D: Create Anything in 3D with Multi-View Diffusion Models

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

URL: http://arxiv.org/abs/2405.10314v1
Date: Thu, 16 May 2024 17:59:05 GMT
Title: CAT3D: Create Anything in 3D with Multi-View Diffusion Models
Authors: Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, Ben Poole,
Abstract summary: We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation.
Score: 87.80820708758317
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real-time. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation. See our project page for results and interactive demos at https://cat3d.github.io .

Related papers

Bolt3D: Generating 3D Scenes in Seconds [77.592919825037]
Given one or more images, our model Bolt3D directly samples a 3D scene representation in less than seven seconds on a single GPU. Compared to prior multiview generative models that require per-scene optimization for 3D reconstruction, Bolt3D reduces the inference cost by a factor of up to 300 times.
arXiv Detail & Related papers (2025-03-18T17:24:19Z)
Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation [15.215597253086612]
We bridge the quality gap between methods that directly generate 3D representations and ones that reconstruct 3D objects from multi-view images. We introduce a multi-view to multi-view diffusion model called Sharp-It, which takes a 3D consistent set of multi-view images. We demonstrate that Sharp-It enables various 3D applications, such as fast synthesis, editing, and controlled generation, while attaining high-quality assets.
arXiv Detail & Related papers (2024-12-03T17:58:07Z)
Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion [3.545941891218148]
We investigate whether it is necessary to explicitly enforce multiview consistency over time, as current approaches do, or if it is sufficient for a model to generate 3D representations of each timestep independently. We propose a model, Vid3D, that leverages 2D video diffusion to generate 3D videos by first generating a 2D "seed" of the video's temporal dynamics and then independently generating a 3D representation for each timestep in the seed video.
arXiv Detail & Related papers (2024-06-17T04:09:04Z)
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new contents with higher quality by exploiting the natural image prior to 2D diffusion model and the global 3D information of the current scene. Our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
Uni3D: Exploring Unified 3D Representation at Scale [66.26710717073372]
We present Uni3D, a 3D foundation model to explore the unified 3D representation at scale. Uni3D uses a 2D ViT end-to-end pretrained to align the 3D point cloud features with the image-text aligned features. We show that the strong Uni3D representation also enables applications such as 3D painting and retrieval in the wild.
arXiv Detail & Related papers (2023-10-10T16:49:21Z)
3D-LLM: Injecting the 3D World into Large Language Models [60.43823088804661]
Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. We propose to inject the 3D world into large language models and introduce a new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks.
arXiv Detail & Related papers (2023-07-24T17:59:02Z)
CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts. Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z)
3inGAN: Learning a 3D Generative Model from Images of a Self-similar Scene [34.2144933185175]
3inGAN is an unconditional 3D generative model trained from 2D images of a single self-similar 3D scene. We show results on semi-stochastic scenes of varying scale and complexity, obtained from real and synthetic sources.
arXiv Detail & Related papers (2022-11-27T18:03:21Z)
GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images [72.15855070133425]
We introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures. GET3D is able to generate high-quality 3D textured meshes, ranging from cars, chairs, animals, motorbikes and human characters to buildings.
arXiv Detail & Related papers (2022-09-22T17:16:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.