FIRST: A Million-Entry Dataset for Text-Driven Fashion Synthesis and
Design
- URL: http://arxiv.org/abs/2311.07414v1
- Date: Mon, 13 Nov 2023 15:50:25 GMT
- Title: FIRST: A Million-Entry Dataset for Text-Driven Fashion Synthesis and
Design
- Authors: Zhen Huang, Yihao Li, Dong Pei, Jiapeng Zhou, Xuliang Ning, Jianlin
Han, Xiaoguang Han, Xuejun Chen
- Abstract summary: We introduce a new dataset comprising a million high-resolution fashion images with rich structured textual (FIRST) descriptions.
Experiments with prevalent generative models trained on FIRST demonstrate the necessity of the dataset.
We invite the community to further develop more intelligent fashion synthesis and design systems.
- Score: 10.556799226837535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-driven fashion synthesis and design is an extremely valuable part of
artificial intelligence generative content (AIGC), which has the potential to
drive a profound transformation of the traditional fashion industry. To advance
research on text-driven fashion synthesis and design, we introduce a new
dataset comprising a million high-resolution fashion images with rich
structured textual (FIRST) descriptions. FIRST spans a wide range of attire
categories, and each image-paired textual description is organized at multiple
hierarchical levels. Experiments with prevalent generative models trained on
FIRST demonstrate the necessity of the dataset. We invite the community to
further develop more intelligent fashion synthesis and design systems that make
fashion design more creative and imaginative based on our dataset. The dataset
will be released soon.
Related papers
- FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion [11.646594594565098]
This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models.
We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data.
arXiv Detail & Related papers (2024-04-26T14:59:42Z) - Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
This paper tackles the task of multimodal-conditioned fashion image editing.
Our approach aims to generate human-centric fashion images guided by multimodal prompts, including text, human body poses, garment sketches, and fabric textures.
arXiv Detail & Related papers (2024-03-21T20:43:10Z) - FashionReGen: LLM-Empowered Fashion Report Generation [61.84580616045145]
We propose an intelligent Fashion Analyzing and Reporting system based on advanced Large Language Models (LLMs).
Specifically, it delivers FashionReGen through effective catwalk analysis, which comprises several key procedures.
It also inspires the explorations of more high-level tasks with industrial significance in other domains.
arXiv Detail & Related papers (2024-03-11T12:29:35Z) - DressCode: Autoregressively Sewing and Generating Garments from Text Guidance [61.48120090970027]
DressCode aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation.
We first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns.
We then tailor a pre-trained Stable Diffusion to generate tile-based Physically-based Rendering (PBR) textures for the garments.
arXiv Detail & Related papers (2024-01-29T16:24:21Z) - Hierarchical Fashion Design with Multi-stage Diffusion Models [17.848891542772446]
Cross-modal fashion synthesis and editing offer intelligent support to fashion designers.
Current diffusion models demonstrate commendable stability and controllability in image synthesis.
We propose HieraFashDiff, a novel fashion design method using a shared multi-stage diffusion model.
arXiv Detail & Related papers (2024-01-15T03:38:57Z) - Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design [14.588884182004277]
We present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort.
The dataset comprises over a million high-quality fashion images, paired with detailed text descriptions.
To foster standardization in the T2I-based fashion design field, we propose a new benchmark for evaluating the performance of fashion design models.
arXiv Detail & Related papers (2023-11-19T06:43:11Z) - Social Media Fashion Knowledge Extraction as Captioning [61.41631195195498]
We study the task of social media fashion knowledge extraction.
We transform the fashion knowledge into a natural language caption with a sentence transformation method.
Our framework then aims to generate the sentence-based fashion knowledge directly from the social media post.
arXiv Detail & Related papers (2023-09-28T09:07:48Z) - TextileNet: A Material Taxonomy-based Fashion Textile Dataset [18.178308615950026]
Textile material identification and categorization play a crucial role in the fashion textile sector.
We build TextileNet, the first fashion textile dataset, based on textile material taxonomies: a fibre taxonomy and a fabric taxonomy.
TextileNet can be used to train and evaluate the state-of-the-art Deep Learning models for textile materials.
arXiv Detail & Related papers (2023-01-15T19:02:18Z) - FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified
Retrieval and Captioning [66.38951790650887]
Multimodal tasks in the fashion domain have significant potential for e-commerce.
We propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs.
We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks.
arXiv Detail & Related papers (2022-10-26T21:01:19Z) - ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal
Fashion Design [66.68194916359309]
Cross-modal fashion image synthesis has emerged as one of the most promising directions in the generation domain.
MaskCLIP decomposes the garments into semantic parts, ensuring fine-grained and semantically accurate alignment between the visual and text information.
ARMANI discretizes an image into uniform tokens based on a learned cross-modal codebook in its first stage and uses a Transformer to model the distribution of image tokens for a real image.
arXiv Detail & Related papers (2022-08-11T03:44:02Z) - Knowledge Enhanced Neural Fashion Trend Forecasting [81.2083786318119]
This work focuses on investigating fine-grained fashion element trends for specific user groups.
We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram with extracted time series fashion element records and user information.
We propose a Knowledge Enhanced Recurrent Network model (KERN) which takes advantage of the capability of deep recurrent neural networks in modeling time-series data.
arXiv Detail & Related papers (2020-05-07T07:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.