FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization
- URL: http://arxiv.org/abs/2507.13311v1
- Date: Thu, 17 Jul 2025 17:30:29 GMT
- Title: FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization
- Authors: Chuancheng Shi, Yixiang Chen, Burong Lei, Jichao Chen,
- Abstract summary: We introduce FashionPose, the first unified text-to-pose-to-relighting generation framework. By replacing explicit pose annotations with text-driven conditioning, FashionPose enables accurate pose alignment, faithful garment rendering, and flexible lighting control. Experiments demonstrate fine-grained pose synthesis and efficient, consistent relighting, providing a practical solution for personalized virtual fashion display.
- Score: 0.29998889086656577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Realistic and controllable garment visualization is critical for fashion e-commerce, where users expect personalized previews under diverse poses and lighting conditions. Existing methods often rely on predefined poses, limiting semantic flexibility and illumination adaptability. To address this, we introduce FashionPose, the first unified text-to-pose-to-relighting generation framework. Given a natural language description, our method first predicts a 2D human pose, then employs a diffusion model to generate high-fidelity person images, and finally applies a lightweight relighting module, all guided by the same textual input. By replacing explicit pose annotations with text-driven conditioning, FashionPose enables accurate pose alignment, faithful garment rendering, and flexible lighting control. Experiments demonstrate fine-grained pose synthesis and efficient, consistent relighting, providing a practical solution for personalized virtual fashion display.
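The three stages named in the abstract (text to 2D pose, pose-conditioned image generation, then relighting, all guided by the same text) can be read as a simple function composition. The sketch below is only an illustration of that data flow, not the authors' implementation: the class `TextToPoseToRelight`, the `Pose` and `Image` type aliases, and the toy stage functions are all hypothetical stand-ins, and the real method would use learned models where the toy functions appear.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed toy types: a pose is a list of normalized (x, y) keypoints,
# an image is a small grid of grayscale values. Both are illustrative only.
Pose = List[Tuple[float, float]]
Image = List[List[float]]


@dataclass
class TextToPoseToRelight:
    """Hypothetical wrapper that chains three text-guided stages."""
    predict_pose: Callable[[str], Pose]
    generate_image: Callable[[str, Pose], Image]
    relight: Callable[[str, Image], Image]

    def __call__(self, prompt: str) -> Image:
        # Every stage receives the same prompt, mirroring the abstract's
        # "all guided by the same textual input".
        pose = self.predict_pose(prompt)
        image = self.generate_image(prompt, pose)
        return self.relight(prompt, image)


# Toy stand-ins for the learned components (assumptions, not real models).
def toy_pose(prompt: str) -> Pose:
    if "standing" in prompt:
        return [(0.5, 0.2), (0.5, 0.8)]   # vertical keypoint pair
    return [(0.3, 0.5), (0.7, 0.5)]       # horizontal keypoint pair


def toy_generate(prompt: str, pose: Pose, size: int = 4) -> Image:
    # Mark one pixel per keypoint on an otherwise black canvas.
    img = [[0.0] * size for _ in range(size)]
    for x, y in pose:
        row = min(int(y * size), size - 1)
        col = min(int(x * size), size - 1)
        img[row][col] = 1.0
    return img


def toy_relight(prompt: str, image: Image) -> Image:
    # Scale brightness by a gain chosen from the prompt.
    gain = 0.5 if "dim" in prompt else 1.0
    return [[p * gain for p in row] for row in image]


pipeline = TextToPoseToRelight(toy_pose, toy_generate, toy_relight)
result = pipeline("a model standing in dim light")
```

Keeping each stage as a separate callable means any one of them could be swapped for a real pose predictor, diffusion model, or relighting module without changing the other two.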
Related papers
- One Model For All: Partial Diffusion for Unified Try-On and Try-Off in Any Pose [99.056324701764]
We introduce OMFA (One Model For All), a unified diffusion framework for both virtual try-on and try-off. The framework is entirely mask-free and requires only a single portrait and a target pose as input. It achieves state-of-the-art results on both try-on and try-off tasks, providing a practical and generalizable solution for virtual garment synthesis.
arXiv Detail & Related papers (2025-08-06T15:46:01Z)
- Fine-Grained Controllable Apparel Showcase Image Generation via Garment-Centric Outpainting [39.50293003775675]
We propose a novel garment-centric outpainting (GCO) framework based on the latent diffusion model (LDM). The proposed framework aims at customizing a fashion model wearing a given garment via text prompts and facial images.
arXiv Detail & Related papers (2025-03-03T08:30:37Z)
- TEDRA: Text-based Editing of Dynamic and Photoreal Actors [59.480513384611804]
TEDRA is the first method allowing text-based edits of an avatar.
We train a model to create a controllable and high-fidelity digital replica of the real actor.
We modify the dynamic avatar based on a provided text prompt.
arXiv Detail & Related papers (2024-08-28T17:59:02Z)
- URHand: Universal Relightable Hands [64.25893653236912]
We present URHand, the first universal relightable hand model that generalizes across viewpoints, poses, illuminations, and identities.
Our model allows few-shot personalization using images captured with a mobile phone, and is ready to be photorealistically rendered under novel illuminations.
arXiv Detail & Related papers (2024-01-10T18:59:51Z)
- MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion [22.62170098534097]
We propose MagicPose, a diffusion-based model for 2D human pose and facial expression retargeting.
By leveraging the prior knowledge of image diffusion models, MagicPose generalizes well to unseen human identities and complex poses.
The proposed model is easy to use and can be considered as a plug-in module/extension to Stable Diffusion.
arXiv Detail & Related papers (2023-11-18T10:22:44Z)
- FashionTex: Controllable Virtual Try-on with Text and Texture [29.7855591607239]
We propose a multi-modal interactive setting by combining the advantages of both text and texture for multi-level fashion manipulation.
The FashionTex framework can semantically control cloth types and local texture patterns without annotated pairwise training data.
arXiv Detail & Related papers (2023-05-08T04:10:36Z)
- LightPainter: Interactive Portrait Relighting with Freehand Scribble [79.95574780974103]
We introduce LightPainter, a scribble-based relighting system that allows users to interactively manipulate portrait lighting effect with ease.
To train the relighting module, we propose a novel scribble simulation procedure to mimic real user scribbles.
We demonstrate high-quality and flexible portrait lighting editing capability with both quantitative and qualitative experiments.
arXiv Detail & Related papers (2023-03-22T23:17:11Z)
- Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion [34.662798793560995]
We present a simple yet highly effective approach to personalization using highly personalized (PerHi) text embedding.
Our method does not require model fine-tuning or identifiers, yet still enables manipulation of background, texture, and motion with just a single image and target text.
arXiv Detail & Related papers (2023-03-15T17:07:45Z)
- FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion [16.583537785874604]
We propose a novel text-conditioned editing model, called FICE, capable of handling a wide variety of diverse text descriptions.
FICE generates highly realistic fashion images and leads to stronger editing performance than existing competing approaches.
arXiv Detail & Related papers (2023-01-05T15:33:23Z)
- HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation [73.3790833537313]
Controllable person image generation promotes a wide range of applications such as digital human interaction and virtual try-on.
We propose HumanDiffusion, a coarse-to-fine alignment diffusion framework, for text-driven person image generation.
arXiv Detail & Related papers (2022-11-11T14:30:34Z)
- Pixel Sampling for Style Preserving Face Pose Editing [53.14006941396712]
We present a novel two-stage approach to solve the dilemma, where the task of face pose manipulation is cast into face inpainting.
By selectively sampling pixels from the input face and slightly adjusting their relative locations, the face editing result faithfully preserves the identity information and the image style.
With the 3D facial landmarks as guidance, our method is able to manipulate face pose in three degrees of freedom, i.e., yaw, pitch, and roll, resulting in more flexible face pose editing.
arXiv Detail & Related papers (2021-06-14T11:29:29Z)
- Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.