Skel3D: Skeleton Guided Novel View Synthesis
- URL: http://arxiv.org/abs/2412.03407v1
- Date: Wed, 04 Dec 2024 15:45:20 GMT
- Title: Skel3D: Skeleton Guided Novel View Synthesis
- Authors: Aron Fóthi, Bence Fazekas, Natabara Máté Gyöngyössy, Kristian Fenech
- Abstract summary: We present an approach for monocular open-set novel view synthesis (NVS) that leverages object skeletons to guide the underlying diffusion model.
Our method outperforms existing state-of-the-art NVS techniques both quantitatively and qualitatively, without relying on explicit 3D representations.
- Abstract: In this paper, we present an approach for monocular open-set novel view synthesis (NVS) that leverages object skeletons to guide the underlying diffusion model. Building upon a baseline that utilizes a pre-trained 2D image generator, our method takes advantage of the Objaverse dataset, which includes animated objects with bone structures. By introducing a skeleton guide layer following the existing ray conditioning normalization (RCN) layer, our approach enhances pose accuracy and multi-view consistency. The skeleton guide layer provides detailed structural information for the generative model, improving the quality of synthesized views. Experimental results demonstrate that our skeleton-guided method significantly enhances consistency and accuracy across diverse object categories within the Objaverse dataset. Our method outperforms existing state-of-the-art NVS techniques both quantitatively and qualitatively, without relying on explicit 3D representations.
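No code accompanies this summary, so the following is only a minimal sketch of how a skeleton guide layer might modulate diffusion U-Net features after a ray conditioning normalization (RCN) step, assuming a FiLM-style scale/shift driven by a rasterized 2D skeleton map; all class, function, and parameter names here are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SkeletonGuideLayer(nn.Module):
    """Hypothetical skeleton-conditioning block: modulates diffusion U-Net
    features with a rasterized skeleton map (FiLM-style scale and shift)."""
    def __init__(self, feat_channels: int, skel_channels: int = 1):
        super().__init__()
        # Small conv encoder for the skeleton raster (bones drawn in 2D).
        self.encoder = nn.Sequential(
            nn.Conv2d(skel_channels, feat_channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(feat_channels, 2 * feat_channels, 3, padding=1),
        )

    def forward(self, feats: torch.Tensor, skel_map: torch.Tensor) -> torch.Tensor:
        # Resize the skeleton raster to the feature resolution, then predict
        # a per-channel scale and shift from it.
        skel = nn.functional.interpolate(
            skel_map, size=feats.shape[-2:], mode="bilinear", align_corners=False)
        scale, shift = self.encoder(skel).chunk(2, dim=1)
        return feats * (1 + scale) + shift

# Usage: apply to activations coming out of the RCN layer.
feats = torch.randn(2, 64, 32, 32)     # U-Net activations after RCN
skel_map = torch.rand(2, 1, 256, 256)  # rasterized 2D skeleton for the target view
guided = SkeletonGuideLayer(64)(feats, skel_map)
print(guided.shape)  # torch.Size([2, 64, 32, 32])
```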
Related papers
- LiftRefine: Progressively Refined View Synthesis from 3D Lifting with Volume-Triplane Representations [21.183524347952762]
We propose a new view synthesis method via a 3D neural field from single or few-view input images.
Our reconstruction model first lifts one or more input images into a 3D volume as the coarse-scale 3D representation.
Our diffusion model then hallucinates missing details in the images rendered from tri-planes.
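As a rough, speculative illustration of that lift-then-triplane idea (not the authors' implementation), the sketch below replicates 2D features along a depth axis in place of true ray-based unprojection and pools the volume onto three orthogonal planes; every name is hypothetical.

```python
import torch

def lift_to_volume(feat2d: torch.Tensor, depth_bins: int) -> torch.Tensor:
    """Crude lifting: replicate 2D features along a depth axis -> (B, C, D, H, W).
    A real method would unproject along camera rays; this is a placeholder."""
    return feat2d.unsqueeze(2).expand(-1, -1, depth_bins, -1, -1)

def volume_to_triplanes(vol: torch.Tensor):
    """Collapse the volume onto three orthogonal planes by mean pooling."""
    xy = vol.mean(dim=2)  # collapse depth  -> (B, C, H, W)
    xz = vol.mean(dim=3)  # collapse height -> (B, C, D, W)
    yz = vol.mean(dim=4)  # collapse width  -> (B, C, D, H)
    return xy, xz, yz

feat2d = torch.randn(1, 32, 64, 64)
planes = volume_to_triplanes(lift_to_volume(feat2d, depth_bins=48))
print([p.shape for p in planes])
```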
arXiv Detail & Related papers (2024-12-19T02:23:55Z)
- MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes [35.16430027877207]
MOVIS aims to enhance the structural awareness of the view-conditioned diffusion model for multi-object NVS.
We introduce an auxiliary task requiring the model to simultaneously predict novel view object masks.
To evaluate the plausibility of synthesized images, we propose to assess cross-view consistency and novel view object placement.
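A minimal sketch of how such an auxiliary mask task could be attached to a diffusion objective, assuming a simple weighted sum of the denoising loss and a binary cross-entropy mask loss; the weighting and all names are guesses, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def movis_style_loss(noise_pred, noise_gt, mask_logits, mask_gt, mask_weight=0.5):
    """Joint objective: standard diffusion denoising loss plus an auxiliary
    novel-view object-mask prediction term (the weight is a guess)."""
    denoise = F.mse_loss(noise_pred, noise_gt)
    mask = F.binary_cross_entropy_with_logits(mask_logits, mask_gt)
    return denoise + mask_weight * mask

loss = movis_style_loss(
    torch.randn(2, 4, 32, 32), torch.randn(2, 4, 32, 32),   # latent noise
    torch.randn(2, 3, 64, 64), torch.rand(2, 3, 64, 64),    # masks for 3 objects
)
print(loss.item())
```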
arXiv Detail & Related papers (2024-12-16T05:23:45Z)
- MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction [4.457326808146675]
This paper investigates the task of reconstructing a 3D clothed human body from a monocular image.
Existing approaches leverage pre-trained SMPL(-X) estimation models or generative models to provide auxiliary information for human reconstruction.
We propose a multi-level geometry learning framework. Technically, we design three key components: skeleton-level enhancement, joint-level augmentation, and wrinkle-level refinement modules.
arXiv Detail & Related papers (2024-12-04T08:06:06Z)
- A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images.
Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
- Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields [50.12118098874321]
We introduce a latent 3D diffusion process for neural voxel fields, enabling generation at significantly higher resolutions.
A part-aware shape decoder is introduced to integrate the part codes into the neural voxel fields, guiding accurate part decomposition.
The results demonstrate the superior generative capabilities of our proposed method in part-aware shape generation, outperforming existing state-of-the-art methods.
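As an illustration only, here is a toy decoder that fuses a learned per-part code with each voxel feature before predicting occupancy; the structure and all names are speculative readings of the summary, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PartAwareDecoder(nn.Module):
    """Toy part-aware decoder: each voxel feature is concatenated with a
    learned per-part code before predicting an occupancy logit."""
    def __init__(self, feat_dim=32, code_dim=16, num_parts=8):
        super().__init__()
        self.part_codes = nn.Embedding(num_parts, code_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + code_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, voxel_feats, part_ids):
        # voxel_feats: (N, feat_dim); part_ids: (N,) part assignment per voxel
        x = torch.cat([voxel_feats, self.part_codes(part_ids)], dim=-1)
        return self.mlp(x)  # per-voxel occupancy logit

dec = PartAwareDecoder()
logits = dec(torch.randn(100, 32), torch.randint(0, 8, (100,)))
print(logits.shape)  # torch.Size([100, 1])
```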
arXiv Detail & Related papers (2024-05-02T04:31:17Z)
- Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level.
The method reduces data requirements, removes the need for depth information in the zero-shot category-level 6D pose estimation task, and improves performance, as demonstrated quantitatively on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z)
- Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and per-frame skeleton deformations.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from a sequence of 2D observations.
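That factorization can be stated compactly in code. The sketch below reconstructs per-frame joints as a shared reference skeleton plus per-frame deformations under a simple orthographic projection; it omits the per-frame rigid alignment a full NRSfM treatment would include, and all names are illustrative.

```python
import torch

def reconstruct_sequence(ref_skeleton, deformations):
    """Per-frame 3D joints as a shared reference skeleton plus a per-frame
    deformation (simplified: no per-frame rotation/translation)."""
    # ref_skeleton: (J, 3); deformations: (T, J, 3)
    return ref_skeleton.unsqueeze(0) + deformations

def project_orthographic(joints3d):
    """Orthographic projection to 2D observations (drop the z axis)."""
    return joints3d[..., :2]

ref = torch.randn(17, 3)               # e.g. a 17-joint human skeleton
deform = 0.1 * torch.randn(50, 17, 3)  # 50 frames of small deformations
obs2d = project_orthographic(reconstruct_sequence(ref, deform))
print(obs2d.shape)  # torch.Size([50, 17, 2])
```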
arXiv Detail & Related papers (2023-08-18T16:41:57Z)
- Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method can autoregressively synthesize 3D-consistent sequences.
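A schematic of such autoregressive synthesis, with a stub in place of the diffusion model: each new view is generated conditioned on the source image plus the most recent outputs. The interface is invented for illustration and is not the paper's actual API.

```python
import torch

def synthesize_sequence(model, src_image, poses, max_context=4):
    """Sketch of autoregressive novel-view synthesis: each new frame is
    conditioned on the source image plus the most recently generated views,
    which is what keeps a long camera trajectory 3D-consistent."""
    outputs = []
    for pose in poses:
        context = [src_image] + outputs[-(max_context - 1):]
        outputs.append(model(context, pose))
    return torch.stack(outputs)

# Stub model so the sketch runs: returns a random image per requested pose.
stub = lambda context, pose: torch.randn(3, 64, 64)
views = synthesize_sequence(stub, torch.randn(3, 64, 64), poses=range(8))
print(views.shape)  # torch.Size([8, 3, 64, 64])
```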
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
- Mix Dimension in Poincaré Geometry for 3D Skeleton-based Action Recognition [57.98278794950759]
Graph Convolutional Networks (GCNs) have already demonstrated their powerful ability to model irregular data.
We present a novel spatial-temporal GCN architecture defined via Poincaré geometry.
We evaluate our method on two of the current largest-scale 3D datasets.
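One building block such a hyperbolic GCN needs is a map from Euclidean features into the Poincaré ball. Below is the standard exponential map at the origin as a self-contained sketch; how the paper actually mixes dimensions across layers is not reproduced here.

```python
import torch

def expmap0(v: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Exponential map at the origin of the Poincaré ball with curvature -c:
    maps Euclidean (tangent-space) features into hyperbolic space."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(1e-7)
    sqrt_c = c ** 0.5
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

feats = torch.randn(16, 25, 64)       # (batch, joints, channels)
hyp = expmap0(feats)
print(hyp.norm(dim=-1).max() < 1.0)   # all points lie inside the unit ball
```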
arXiv Detail & Related papers (2020-07-30T18:23:18Z)
- What and Where: Modeling Skeletons from Semantic and Spatial Perspectives for Action Recognition [46.836815779215456]
We propose to model skeletons from a novel spatial perspective, in which the model takes spatial location as prior knowledge to group human joints.
From the semantic perspective, we propose a Transformer-like network that excels at modeling joint correlations.
From the spatial perspective, we transform the skeleton data into a sparse format for efficient feature extraction.
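One plausible reading of that "sparse format" is quantizing joint positions into occupied voxel cells and storing only those; the sketch below is a guess at such a transform, not the authors' code.

```python
import torch

def skeleton_to_sparse(joints, grid_size=32):
    """Quantize 3D joint positions into sparse (voxel index, joint id) pairs,
    so only occupied cells are stored instead of a dense grid."""
    lo = joints.min(dim=0).values
    hi = joints.max(dim=0).values
    cells = ((joints - lo) / (hi - lo + 1e-6) * (grid_size - 1)).long()  # (J, 3)
    joint_ids = torch.arange(joints.shape[0])
    return cells, joint_ids

joints = torch.randn(25, 3)  # e.g. a 25-joint NTU-style skeleton
cells, ids = skeleton_to_sparse(joints)
print(cells.shape, ids.shape)  # torch.Size([25, 3]) torch.Size([25])
```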
arXiv Detail & Related papers (2020-04-07T10:53:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.