SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models
- URL: http://arxiv.org/abs/2504.10716v1
- Date: Mon, 14 Apr 2025 21:16:20 GMT
- Title: SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models
- Authors: Stathis Galanakis, Alexandros Lattas, Stylianos Moschoglou, Bernhard Kainz, Stefanos Zafeiriou
- Abstract summary: SpinMeRound is a diffusion-based approach designed to generate consistent and accurate head portraits from novel viewpoints. By leveraging a number of input views alongside an identity embedding, our method effectively synthesizes diverse viewpoints of a subject.
- Score: 80.33151028528563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent progress in diffusion models, generating realistic head portraits from novel viewpoints remains a significant challenge. Most current approaches are constrained to limited angular ranges, predominantly focusing on frontal or near-frontal views. Moreover, although recently emerging large-scale diffusion models have proven robust in handling 3D scenes, they underperform on facial data, given its complex structure and the pitfalls of the uncanny valley. In this paper, we propose SpinMeRound, a diffusion-based approach designed to generate consistent and accurate head portraits from novel viewpoints. By leveraging a number of input views alongside an identity embedding, our method effectively synthesizes diverse viewpoints of a subject while robustly preserving its unique identity features. Through experimentation, we showcase our model's generation capabilities in 360° head synthesis, outperforming current state-of-the-art multi-view diffusion models.
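The abstract describes conditioning a diffusion model on a set of input views plus an identity embedding. As a minimal, hypothetical sketch of what such view-and-identity-conditioned sampling could look like (the module names, tensor shapes, and noise schedule below are illustrative assumptions, not SpinMeRound's actual architecture):

```python
# Sketch only: a DDPM-style sampling loop conditioned on an identity
# embedding and encoded reference views. All names are hypothetical.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a multi-view diffusion UNet."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, x_t, t, identity_emb, ref_views):
        # A real model would cross-attend to ref_views and identity_emb;
        # here we just mix them in to keep the sketch runnable.
        cond = identity_emb + ref_views.mean(dim=1)
        return self.net(x_t + cond)

@torch.no_grad()
def sample_novel_view(denoiser, identity_emb, ref_views, steps=50, dim=64):
    """DDPM ancestral sampling, conditioned on identity and input views."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, dim)                            # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, t, identity_emb, ref_views)  # predicted noise
        a, ab = alphas[t], alpha_bars[t]
        # Posterior mean of x_{t-1} given the predicted noise.
        x = (x - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

denoiser = ToyDenoiser()
identity_emb = torch.randn(1, 64)   # e.g. a face-recognition embedding
ref_views = torch.randn(1, 3, 64)   # three encoded input views
novel_view = sample_novel_view(denoiser, identity_emb, ref_views)
```

In the real model the conditioning would enter through cross-attention in a UNet operating on image latents; the toy linear denoiser above only keeps the control flow visible.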
Related papers
- CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation [59.257513664564996]
We introduce a novel method for generating 360° panoramas from text prompts or images.
We employ multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Our model allows for fine-grained text control, generates high resolution panorama images and generalizes well beyond its training set.
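To make the cubemap idea concrete, here is a small geometric sketch (conventions are illustrative, not CubeDiff's code): six 90°-FOV faces whose ray directions jointly cover the full sphere, which is why denoising them together can yield a seamless panorama.

```python
# Sketch of cubemap geometry: (outward axis, u tangent, v tangent) per face.
import numpy as np

FACES = {
    "front": (np.array([0., 0., 1.]),  np.array([1., 0., 0.]),  np.array([0., 1., 0.])),
    "back":  (np.array([0., 0., -1.]), np.array([-1., 0., 0.]), np.array([0., 1., 0.])),
    "right": (np.array([1., 0., 0.]),  np.array([0., 0., -1.]), np.array([0., 1., 0.])),
    "left":  (np.array([-1., 0., 0.]), np.array([0., 0., 1.]),  np.array([0., 1., 0.])),
    "up":    (np.array([0., 1., 0.]),  np.array([1., 0., 0.]),  np.array([0., 0., -1.])),
    "down":  (np.array([0., -1., 0.]), np.array([1., 0., 0.]),  np.array([0., 0., 1.])),
}

def face_rays(face, res=4):
    """Unit ray directions for one 90°-FOV cube face at res x res pixels."""
    axis, tu, tv = FACES[face]
    u = np.linspace(-1, 1, res)
    uu, vv = np.meshgrid(u, u)
    d = axis + uu[..., None] * tu + vv[..., None] * tv
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

# The six faces together cover every direction on the sphere.
all_rays = np.concatenate([face_rays(f).reshape(-1, 3) for f in FACES])
```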
arXiv Detail & Related papers (2025-01-28T18:59:49Z)
- Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations [7.448124739584319]
We propose an innovative framework that leverages transferred body and facial representations for multi-view human synthesis.
Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation.
Our approach outperforms the current state-of-the-art methods, achieving superior performance in multi-view human synthesis.
arXiv Detail & Related papers (2024-12-04T04:02:17Z)
- Identity Preserving 3D Head Stylization with Multiview Score Distillation [7.8340104876025105]
3D head stylization transforms realistic facial features into artistic representations, enhancing user engagement across gaming and virtual reality applications.
This paper addresses the challenges of this task by leveraging the PanoHead model, synthesizing images from a comprehensive 360-degree perspective.
We propose a novel framework that employs negative log-likelihood distillation (LD) to enhance identity preservation and improve stylization quality.
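For context on score distillation: the widely used Score Distillation Sampling (SDS) gradient diffuses a rendered view, asks the diffusion model to denoise it, and pushes the residual back into the 3D representation. The sketch below shows plain SDS, not the paper's negative log-likelihood variant, and all names are illustrative:

```python
# Generic SDS gradient for a differentiably rendered image x; sketch only.
import torch

def sds_grad(denoiser, x, t, alpha_bar):
    with torch.no_grad():                    # SDS treats this as a fixed target
        eps = torch.randn_like(x)                                  # injected noise
        x_t = alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * eps  # forward diffuse
        eps_pred = denoiser(x_t, t)                                # denoising guess
    return eps_pred - eps                    # residual acts as per-pixel gradient

toy_denoiser = lambda x_t, t: 0.1 * x_t         # stand-in for a diffusion UNet
x = torch.randn(3, 64, 64, requires_grad=True)  # e.g. a rendered head view
g = sds_grad(toy_denoiser, x, t=500, alpha_bar=torch.tensor(0.5))
x.backward(gradient=g)   # injects the distillation signal into the render graph
```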
arXiv Detail & Related papers (2024-11-20T18:37:58Z)
- Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion [63.81544586407943]
Single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations.
We propose a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the status consistency of the generated multi-view portraits.
Experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image.
arXiv Detail & Related papers (2024-11-15T17:19:18Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering [16.382098950820822]
We propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps.
We modify the self-attention mechanism to integrate information from the source view, reducing shape distortions.
Results demonstrate substantial improvements in fidelity and consistency, validated on a diverse set of out-of-distribution objects.
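A minimal sketch of the test-time attention-map manipulation described above, assuming a simple blend-with-earlier-steps filter (the blend rule and names are illustrative, not the paper's exact procedure):

```python
# Sketch: smooth per-step attention maps by blending with earlier-step maps.
import torch

def filtered_attention(q, k, prev_maps=None, blend=0.5):
    """Attention probabilities, optionally smoothed with earlier-step maps."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    maps = scores.softmax(dim=-1)
    if prev_maps is not None:
        # Filtering: damp the single-step noise that causes shape distortion.
        maps = blend * maps + (1 - blend) * prev_maps
    return maps

q = torch.randn(8, 16, 32)   # (heads, queries, channel dim)
k = torch.randn(8, 16, 32)
step1 = filtered_attention(q, k)
step2 = filtered_attention(q, k, prev_maps=step1)  # reuses earlier maps
```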
arXiv Detail & Related papers (2024-05-29T00:58:22Z)
- EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion [60.30030562932703]
EpiDiff is a localized interactive multiview diffusion model.
It generates 16 multiview images in just 12 seconds.
It surpasses previous methods in quality evaluation metrics.
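A rough illustration of what an epipolar constraint on cross-view attention means, assuming a known fundamental matrix F between two views (a geometric sketch, not EpiDiff's implementation): each pixel may only attend to pixels near its epipolar line in the other view.

```python
# Sketch: boolean epipolar attention mask between two h x w views.
import numpy as np

def epipolar_mask(F, h, w, tol=1.5):
    """M[i, j]: may pixel i of view A attend to pixel j of view B?"""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous, 3 x N
    lines = F @ pts                             # epipolar lines in view B, 3 x N
    # Distance of every view-B pixel j to every view-A pixel i's line.
    d = np.abs(lines.T @ pts) / (np.linalg.norm(lines[:2], axis=0)[:, None] + 1e-9)
    return d < tol                              # (N, N) attention mask

F = np.random.randn(3, 3)   # placeholder fundamental matrix
mask = epipolar_mask(F, 8, 8)
```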
arXiv Detail & Related papers (2023-12-11T05:20:52Z)
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
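The progressive loop described above can be summarized in a few lines. The helpers below are trivial stubs standing in for a NeRF/3DGS trainer and a diffusion enhancer, so this is a control-flow sketch only:

```python
# Sketch: progressive densification via diffusion-generated pseudo-observations.
def fit_reconstruction(images):      # stub: a NeRF/3DGS training run
    return {"n_train": len(images)}

def render_novel_view(recon):        # stub: noisy render from the sparse model
    return f"render_from_{recon['n_train']}_views"

def enhance_with_diffusion(image):   # stub: diffusion-based cleanup
    return f"enhanced({image})"

def progressive_densify(train_images, rounds=3):
    """Each round renders novel views, enhances them, and adds them back."""
    for _ in range(rounds):
        recon = fit_reconstruction(train_images)
        noisy = [render_novel_view(recon) for _ in range(len(train_images))]
        train_images = train_images + [enhance_with_diffusion(v) for v in noisy]
    return fit_reconstruction(train_images)   # final model on the densified set

final = progressive_densify(["img0", "img1", "img2"])  # 3 -> 24 images, i.e. 8x
```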