Synthesizing Anyone, Anywhere, in Any Pose
- URL: http://arxiv.org/abs/2304.03164v2
- Date: Sun, 5 Nov 2023 12:09:54 GMT
- Title: Synthesizing Anyone, Anywhere, in Any Pose
- Authors: Håkon Hukkelås, Frank Lindseth
- Abstract summary: TriA-GAN is a keypoint-guided GAN that can synthesize Anyone, Anywhere, in Any given pose.
We show that TriA-GAN significantly improves over previous in-the-wild full-body synthesis methods.
We also show that the latent space of TriA-GAN is compatible with standard unconditional editing techniques.
- Score: 0.7252027234425334
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We address the task of in-the-wild human figure synthesis, where the primary
goal is to synthesize a full body given any region in any image. In-the-wild
human figure synthesis has long been a challenging and under-explored task,
where current methods struggle to handle extreme poses, occluding objects, and
complex backgrounds.
Our main contribution is TriA-GAN, a keypoint-guided GAN that can synthesize
Anyone, Anywhere, in Any given pose. Key to our method is projected GANs
combined with a well-crafted training strategy, where our simple generator
architecture can successfully handle the challenges of in-the-wild full-body
synthesis. We show that TriA-GAN significantly improves over previous
in-the-wild full-body synthesis methods, all while requiring less conditional
information for synthesis (keypoints vs. DensePose). Finally, we show that the
latent space of TriA-GAN is compatible with standard unconditional editing
techniques, enabling text-guided editing of generated human figures.
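The abstract names two key ingredients: keypoint conditioning and a projected-GAN discriminator. As a rough, non-authoritative sketch of the first ingredient only (keypoint-heatmap conditioning), here is a toy PyTorch generator; `NUM_KEYPOINTS`, `KeypointGenerator`, and all shapes are invented for illustration and do not reproduce TriA-GAN's actual architecture:

```python
# Toy sketch of keypoint-heatmap conditioning for a GAN generator.
# Assumption: COCO-style body keypoints; none of these names or shapes
# come from the TriA-GAN paper.
import torch
import torch.nn as nn

NUM_KEYPOINTS = 17  # assumption: COCO-style body keypoints


def keypoints_to_heatmaps(keypoints, size=64, sigma=2.0):
    """Rasterize (N, K, 2) keypoints in [0, 1] into (N, K, size, size) Gaussian heatmaps."""
    n, k, _ = keypoints.shape
    ys = torch.arange(size, dtype=torch.float32).view(1, 1, size, 1)
    xs = torch.arange(size, dtype=torch.float32).view(1, 1, 1, size)
    cx = keypoints[..., 0].view(n, k, 1, 1) * (size - 1)
    cy = keypoints[..., 1].view(n, k, 1, 1) * (size - 1)
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return torch.exp(-d2 / (2 * sigma**2))


class KeypointGenerator(nn.Module):
    """Toy generator: broadcasts the latent code spatially and concatenates keypoint heatmaps."""

    def __init__(self, z_dim=128, size=64):
        super().__init__()
        self.size = size
        self.net = nn.Sequential(
            nn.Conv2d(z_dim + NUM_KEYPOINTS, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z, heatmaps):
        z_map = z.view(z.size(0), -1, 1, 1).expand(-1, -1, self.size, self.size)
        return self.net(torch.cat([z_map, heatmaps], dim=1))


z = torch.randn(1, 128)
keypoints = torch.rand(1, NUM_KEYPOINTS, 2)  # normalized (x, y) positions
image = KeypointGenerator()(z, keypoints_to_heatmaps(keypoints))
print(image.shape)  # torch.Size([1, 3, 64, 64])
```

The second ingredient, the projected-GAN discriminator (which judges images in the feature space of a frozen pretrained network rather than in pixel space), is not shown here.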
Related papers
- GAS: Generative Avatar Synthesis from a Single Image [54.95198111659466]
We introduce a generalizable and unified framework to synthesize view-consistent and temporally coherent avatars from a single image.
Our approach bridges this gap by combining the reconstruction power of regression-based 3D human reconstruction with the generative capabilities of a diffusion model.
arXiv Detail & Related papers (2025-02-10T19:00:39Z)
- Towards Affordance-Aware Articulation Synthesis for Rigged Objects [82.08199697616917]
A3Syn synthesizes articulation parameters for arbitrary and open-domain rigged objects obtained from the Internet.
A3Syn has stable convergence, completes in minutes, and synthesizes plausible affordance on different combinations of in-the-wild object rigs and scenes.
arXiv Detail & Related papers (2025-01-21T18:59:59Z)
- CFSynthesis: Controllable and Free-view 3D Human Video Synthesis [57.561237409603066]
CFSynthesis is a novel framework for generating high-quality human videos with customizable attributes.
Our method leverages a texture-SMPL-based representation to ensure consistent and stable character appearances across free viewpoints.
Results on multiple datasets show that CFSynthesis achieves state-of-the-art performance in complex human animations.
arXiv Detail & Related papers (2024-12-15T05:57:36Z)
- Survey on Controllable Image Synthesis with Deep Learning [15.29961293132048]
We present a survey of some recent works on 3D controllable image synthesis using deep learning.
We first introduce the datasets and evaluation indicators for 3D controllable image synthesis.
Photometrically controllable image synthesis approaches are also reviewed for 3D relighting research.
arXiv Detail & Related papers (2023-07-18T07:02:51Z)
- Novel View Synthesis of Humans using Differentiable Rendering [50.57718384229912]
We present a new approach for synthesizing novel views of people in new poses.
Our synthesis makes use of diffuse Gaussian primitives that represent the underlying skeletal structure of a human.
Rendering these primitives yields a high-dimensional latent image, which is then transformed into an RGB image by a decoder network (see the sketch after this entry).
arXiv Detail & Related papers (2023-03-28T10:48:33Z)
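A hedged sketch of the pipeline summarized in the entry above: Gaussian primitives splatted into a latent feature image, then decoded to RGB. All shapes, parameterizations, and the decoder below are invented for illustration and do not reproduce the paper's formulation:

```python
# Sketch only: splat per-primitive feature vectors as 2D Gaussians into a
# latent feature image, then decode it to RGB with a small CNN.
import torch
import torch.nn as nn


def splat_gaussians(centers, features, size=64, sigma=3.0):
    """Accumulate N Gaussian primitives with per-primitive feature vectors
    into a (C, size, size) latent feature image."""
    n, c = features.shape
    ys = torch.arange(size, dtype=torch.float32).view(size, 1)
    xs = torch.arange(size, dtype=torch.float32).view(1, size)
    out = torch.zeros(c, size, size)
    for i in range(n):
        cx, cy = centers[i] * (size - 1)
        weight = torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma**2))
        out += features[i].view(c, 1, 1) * weight
    return out


decoder = nn.Sequential(  # latent feature image -> RGB
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)

centers = torch.rand(24, 2)     # e.g., projected 2D joint positions in [0, 1]
features = torch.randn(24, 16)  # learned per-primitive latent features
latent_image = splat_gaussians(centers, features)
rgb = decoder(latent_image.unsqueeze(0))
print(rgb.shape)  # torch.Size([1, 3, 64, 64])
```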
- InsetGAN for Full-Body Image Generation [90.71033704904629]
We propose a novel method to combine multiple pretrained GANs.
One GAN generates a global canvas (e.g., a human body), and a set of specialized GANs, or insets, focuses on different parts.
We demonstrate the setup by combining a full-body GAN with a dedicated high-quality face GAN to produce plausible-looking humans (see the compositing sketch after this entry).
arXiv Detail & Related papers (2022-03-14T17:01:46Z)
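A toy illustration of the canvas-plus-inset idea from the InsetGAN entry above. The actual method jointly optimizes the latent codes of both pretrained GANs for a seamless result; this sketch, with made-up shapes and helpers, shows only a naive compositing step:

```python
# Sketch only: blend a specialized "inset" (e.g., a face sample) into a
# global "canvas" (e.g., a full-body sample) at a bounding box, feathering
# the edges to hide the seam. Not the InsetGAN optimization pipeline.
import torch
import torch.nn.functional as F


def soft_mask(h, w, feather=4):
    """Soft rectangular mask that ramps up over `feather` pixels at the border."""
    ramp_h = torch.minimum(torch.arange(h) + 1, torch.arange(h - 1, -1, -1) + 1)
    ramp_w = torch.minimum(torch.arange(w) + 1, torch.arange(w - 1, -1, -1) + 1)
    mh = (ramp_h.float() / feather).clamp(max=1.0)
    mw = (ramp_w.float() / feather).clamp(max=1.0)
    return (mh.view(h, 1) * mw.view(1, w)).unsqueeze(0)  # (1, h, w)


def composite_inset(canvas, inset, box, feather=4):
    """Blend `inset` (3, hi, wi) into `canvas` (3, H, W) at box = (top, left, h, w)."""
    t, l, h, w = box
    patch = F.interpolate(inset.unsqueeze(0), size=(h, w),
                          mode="bilinear", align_corners=False)[0]
    m = soft_mask(h, w, feather)
    out = canvas.clone()
    out[:, t:t + h, l:l + w] = m * patch + (1 - m) * canvas[:, t:t + h, l:l + w]
    return out


body = torch.rand(3, 256, 128)  # stand-in for a full-body GAN sample
face = torch.rand(3, 64, 64)    # stand-in for a face GAN sample
result = composite_inset(body, face, box=(20, 40, 48, 48))
print(result.shape)  # torch.Size([3, 256, 128])
```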
- Human View Synthesis using a Single Sparse RGB-D Input [16.764379184593256]
We present a novel view synthesis framework to generate realistic renders from unseen views of any human captured from a single-view sensor with sparse RGB-D.
An enhancer network improves the overall fidelity, even in areas occluded from the original view, producing crisp renders with fine details.
arXiv Detail & Related papers (2021-12-27T20:13:53Z)
- Hierarchy Composition GAN for High-fidelity Image Synthesis [57.32311953820988]
This paper presents an innovative Hierarchical Composition GAN (HIC-GAN).
HIC-GAN incorporates image synthesis in geometry and appearance domains into an end-to-end trainable network.
Experiments on scene text image synthesis, portrait editing and indoor rendering tasks show that the proposed HIC-GAN achieves superior synthesis performance qualitatively and quantitatively.
arXiv Detail & Related papers (2019-05-12T11:11:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences arising from its use.