Skel3D: Skeleton Guided Novel View Synthesis
- URL: http://arxiv.org/abs/2412.03407v1
- Date: Wed, 04 Dec 2024 15:45:20 GMT
- Title: Skel3D: Skeleton Guided Novel View Synthesis
- Authors: Aron Fóthi, Bence Fazekas, Natabara Máté Gyöngyössy, Kristian Fenech,
- Abstract summary: We present an approach for monocular open-set novel view synthesis (NVS) that leverages object skeletons to guide the underlying diffusion model. Our method outperforms existing state-of-the-art NVS techniques both quantitatively and qualitatively, without relying on explicit 3D representations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we present an approach for monocular open-set novel view synthesis (NVS) that leverages object skeletons to guide the underlying diffusion model. Building upon a baseline that utilizes a pre-trained 2D image generator, our method takes advantage of the Objaverse dataset, which includes animated objects with bone structures. By introducing a skeleton guide layer following the existing ray conditioning normalization (RCN) layer, our approach enhances pose accuracy and multi-view consistency. The skeleton guide layer provides detailed structural information for the generative model, improving the quality of synthesized views. Experimental results demonstrate that our skeleton-guided method significantly enhances consistency and accuracy across diverse object categories within the Objaverse dataset. Our method outperforms existing state-of-the-art NVS techniques both quantitatively and qualitatively, without relying on explicit 3D representations.
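The abstract does not spell out the internal form of the skeleton guide layer; purely as an illustration, a FiLM-style modulation inserted after the ray conditioning normalization (RCN) step of a diffusion U-Net block might look like the sketch below. The module names, tensor shapes, and scale/shift scheme are assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SkeletonGuideLayer(nn.Module):
    """Illustrative sketch only: injects a rasterized 2D skeleton map of the
    target pose into U-Net features right after ray conditioning normalization.
    The FiLM-style scale/shift modulation is an assumption, not the paper's code."""

    def __init__(self, feat_channels: int, skel_channels: int = 1):
        super().__init__()
        # Small conv encoder mapping the skeleton map to per-pixel scale/shift.
        self.encoder = nn.Sequential(
            nn.Conv2d(skel_channels, feat_channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(feat_channels, 2 * feat_channels, kernel_size=3, padding=1),
        )

    def forward(self, feats: torch.Tensor, skel_map: torch.Tensor) -> torch.Tensor:
        # feats:    (B, C, H, W) activations after the RCN layer
        # skel_map: (B, 1, H, W) target-pose skeleton, resized to match feats
        scale, shift = self.encoder(skel_map).chunk(2, dim=1)
        return feats * (1.0 + scale) + shift

# Hypothetical placement inside one U-Net block:
#   h = rcn(h, ray_embedding)          # existing ray conditioning normalization
#   h = skeleton_guide(h, skel_map)    # added skeleton guidance
```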
Related papers
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos [50.37136267234771]
RigGS is a new paradigm that leverages 3D Gaussian representation and skeleton-based motion representation to model dynamic objects.
Our method can generate realistic new actions easily for objects and achieve high-quality rendering.
arXiv Detail & Related papers (2025-03-21T03:27:07Z)
- RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets [47.81216915952291]
We present RigAnything, a novel autoregressive transformer-based model.
It makes 3D assets rig-ready by probabilistically generating joints and skeleton topologies and assigning skinning weights, all in a template-free manner.
RigAnything demonstrates state-of-the-art performance across diverse object types, including humanoids, quadrupeds, marine creatures, insects, and many more.
arXiv Detail & Related papers (2025-02-13T18:59:13Z)
- LiftRefine: Progressively Refined View Synthesis from 3D Lifting with Volume-Triplane Representations [21.183524347952762]
We propose a new view synthesis method that builds a 3D neural field from single or few-view input images.
Our reconstruction model first lifts one or more input images into 3D space as a volume, which serves as the coarse-scale 3D representation.
Our diffusion model then hallucinates missing details in the images rendered from tri-planes.
arXiv Detail & Related papers (2024-12-19T02:23:55Z)
- MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction [4.457326808146675]
This paper investigates the research task of reconstructing the 3D clothed body from a monocular image. Existing approaches leverage pre-trained SMPL(-X) estimation models or generative models to provide auxiliary information for human reconstruction. We propose a multi-level geometry learning framework. Technically, we design three key components: skeleton-level enhancement, joint-level augmentation, and wrinkle-level refinement modules.
arXiv Detail & Related papers (2024-12-04T08:06:06Z)
- A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
- GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction [52.04103235260539]
We present a diffusion model approach based on Gaussian Splatting representation for 3D object reconstruction from a single view.
The model learns to generate 3D objects represented by sets of GS ellipsoids.
The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views.
arXiv Detail & Related papers (2024-07-05T03:43:08Z)
- Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first to demonstrate the utility of diffusion-model-based novel-view synthesizers for enhancing RGB category-level 6D pose estimation.
The method reduces data requirements, removes the need for depth information in the zero-shot category-level 6D pose estimation task, and improves performance, as demonstrated quantitatively through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z)
- Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and a per-frame skeleton deformation.
A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from a sequence of 2D observations.
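As a rough way to write the decomposition this summary describes (the paper's exact parameterization may differ), each frame's skeleton is a shared reference plus a per-frame deformation, and the 2D observations are its camera projection:

$$ S_t = \bar{S} + \Delta S_t, \qquad W_t \approx \Pi_t S_t, \qquad t = 1, \dots, T $$

where $\bar{S}$ is the 3D reference skeleton, $\Delta S_t$ the deformation of frame $t$, $W_t$ the observed 2D joints, and $\Pi_t$ the camera projection for that frame.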
arXiv Detail & Related papers (2023-08-18T16:41:57Z)
- Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
- Dynamical Deep Generative Latent Modeling of 3D Skeletal Motion [15.359134407309726]
Our model decomposes highly correlated skeleton data into a small set of spatial bases of switching temporal processes.
This results in a dynamical deep generative latent model that parses the meaningful intrinsic states in the dynamics of 3D pose data.
arXiv Detail & Related papers (2021-06-18T23:58:49Z)
- Mix Dimension in Poincaré Geometry for 3D Skeleton-based Action Recognition [57.98278794950759]
Graph Convolutional Networks (GCNs) have already demonstrated their powerful ability to model irregular data.
We present a novel spatial-temporal GCN architecture defined via Poincaré geometry.
We evaluate our method on the two largest-scale 3D datasets currently available.
arXiv Detail & Related papers (2020-07-30T18:23:18Z)
- What and Where: Modeling Skeletons from Semantic and Spatial Perspectives for Action Recognition [46.836815779215456]
We propose to model skeletons from a novel spatial perspective, in which the model takes spatial location as prior knowledge to group human joints.
From the semantic perspective, we propose a Transformer-like network that is expert in modeling joint correlations.
From the spatial perspective, we transform the skeleton data into a sparse format for efficient feature extraction.
arXiv Detail & Related papers (2020-04-07T10:53:45Z)