Virtual Fashion Photo-Shoots: Building a Large-Scale Garment-Lookbook Dataset
- URL: http://arxiv.org/abs/2510.00633v1
- Date: Wed, 01 Oct 2025 08:05:05 GMT
- Title: Virtual Fashion Photo-Shoots: Building a Large-Scale Garment-Lookbook Dataset
- Authors: Yannick Hauri, Luca A. Lanzendörfer, Till Aczel,
- Abstract summary: We introduce the task of virtual fashion photo-shoot, which seeks to transform standardized garment images into contextually grounded editorial imagery.<n>We construct the first large-scale dataset of garment-lookbook pairs, bridging the gap between e-commerce and fashion media.<n>This dataset offers a foundation for models that move beyond catalog-style generation and toward fashion imagery that reflects creativity, atmosphere, and storytelling.
- Score: 6.155604731137829
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fashion image generation has so far focused on narrow tasks such as virtual try-on, where garments appear in clean studio environments. In contrast, editorial fashion presents garments through dynamic poses, diverse locations, and carefully crafted visual narratives. We introduce the task of virtual fashion photo-shoot, which seeks to capture this richness by transforming standardized garment images into contextually grounded editorial imagery. To enable this new direction, we construct the first large-scale dataset of garment-lookbook pairs, bridging the gap between e-commerce and fashion media. Because such pairs are not readily available, we design an automated retrieval pipeline that aligns garments across domains, combining visual-language reasoning with object-level localization. We construct a dataset with three garment-lookbook pair accuracy levels: high quality (10,000 pairs), medium quality (50,000 pairs), and low quality (300,000 pairs). This dataset offers a foundation for models that move beyond catalog-style generation and toward fashion imagery that reflects creativity, atmosphere, and storytelling.
Related papers
- Learning to Synthesize Compatible Fashion Items Using Semantic Alignment and Collocation Classification: An Outfit Generation Framework [59.09707044733695]
We propose a novel outfit generation framework, i.e., OutfitGAN, with the aim of synthesizing an entire outfit.<n> OutfitGAN includes a semantic alignment module, which is responsible for characterizing the mapping correspondence between the existing fashion items and the synthesized ones.<n>In order to evaluate the performance of our proposed models, we built a large-scale dataset consisting of 20,000 fashion outfits.
arXiv Detail & Related papers (2025-02-05T12:13:53Z) - IMAGDressing-v1: Customizable Virtual Dressing [58.44155202253754]
IMAGDressing-v1 is a virtual dressing task that generates freely editable human images with fixed garments and optional conditions.
IMAGDressing-v1 incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE.
We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet.
arXiv Detail & Related papers (2024-07-17T16:26:30Z) - Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design [14.588884182004277]
We present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort.
The dataset comprises over a million high-quality fashion images, paired with detailed text descriptions.
To foster standardization in the T2I-based fashion design field, we propose a new benchmark for evaluating the performance of fashion design models.
arXiv Detail & Related papers (2023-11-19T06:43:11Z) - Multimodal Garment Designer: Human-Centric Latent Diffusion Models for
Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
arXiv Detail & Related papers (2023-04-04T18:03:04Z) - FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified
Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
arXiv Detail & Related papers (2022-10-26T21:01:19Z) - Dressing in the Wild by Watching Dance Videos [69.7692630502019]
This paper attends to virtual try-on in real-world scenes and brings improvements in authenticity and naturalness.
We propose a novel generative network called wFlow that can effectively push up garment transfer to in-the-wild context.
arXiv Detail & Related papers (2022-03-29T08:05:45Z) - Per Garment Capture and Synthesis for Real-time Virtual Try-on [15.128477359632262]
Existing image-based works try to synthesize a try-on image from a single image of a target garment.
It is difficult to reproduce the change of wrinkles caused by pose and body size change, as well as pulling and stretching of the garment by hand.
We propose an alternative per garment capture and synthesis workflow to handle such rich interactions by training the model with many systematically captured images.
arXiv Detail & Related papers (2021-09-10T03:49:37Z) - From Culture to Clothing: Discovering the World Events Behind A Century
of Fashion Images [100.20851232528925]
We propose a data-driven approach to identify specific cultural factors affecting the clothes people wear.
Our work is a first step towards a computational, scalable, and easily refreshable approach to link culture to clothing.
arXiv Detail & Related papers (2021-02-02T18:58:21Z) - Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction
from Single Images [50.34202789543989]
Deep Fashion3D is the largest collection to date of 3D garment models.
It provides rich annotations including 3D feature lines, 3D body pose and the corresponded multi-view real images.
A novel adaptable template is proposed to enable the learning of all types of clothing in a single network.
arXiv Detail & Related papers (2020-03-28T09:20:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.