Dressing the Imagination: A Dataset for AI-Powered Translation of Text into Fashion Outfits and A Novel KAN Adapter for Enhanced Feature Adaptation
- URL: http://arxiv.org/abs/2411.13901v1
- Date: Thu, 21 Nov 2024 07:27:45 GMT
- Title: Dressing the Imagination: A Dataset for AI-Powered Translation of Text into Fashion Outfits and A Novel KAN Adapter for Enhanced Feature Adaptation
- Authors: Gayatri Deshmukh, Somsubhra De, Chirag Sehgal, Jishu Sen Gupta, Sparsh Mittal
- Abstract summary: We present FLORA, the first comprehensive dataset containing 4,330 curated pairs of fashion outfits and corresponding textual descriptions.
As a second contribution, we introduce KAN Adapters, which leverage Kolmogorov-Arnold Networks (KAN) as adaptive modules.
To foster further research and collaboration, we will open-source both the FLORA dataset and our implementation code.
- Abstract: Specialized datasets that capture the fashion industry's rich language and styling elements can boost progress in AI-driven fashion design. We present FLORA (Fashion Language Outfit Representation for Apparel Generation), the first comprehensive dataset containing 4,330 curated pairs of fashion outfits and corresponding textual descriptions. Each description utilizes industry-specific terminology and jargon commonly used by professional fashion designers, providing precise and detailed insights into the outfits. Hence, the dataset captures the delicate features and subtle stylistic elements necessary to create high-fidelity fashion designs. We demonstrate that fine-tuning generative models on the FLORA dataset significantly enhances their capability to generate accurate and stylistically rich images from textual descriptions of fashion sketches. FLORA will catalyze the creation of advanced AI models capable of comprehending and producing subtle, stylistically rich fashion designs. It will also help fashion designers and end-users to bring their ideas to life. As a second orthogonal contribution, we introduce KAN Adapters, which leverage Kolmogorov-Arnold Networks (KAN) as adaptive modules. They serve as replacements for traditional MLP-based LoRA adapters. With learnable spline-based activations, KAN Adapters excel in modeling complex, non-linear relationships, achieving superior fidelity, faster convergence and semantic alignment. Extensive experiments and ablation studies on our proposed FLORA dataset validate the superiority of KAN Adapters over LoRA adapters. To foster further research and collaboration, we will open-source both the FLORA dataset and our implementation code.
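To make the KAN Adapter idea concrete: where a LoRA adapter applies a fixed-form low-rank linear map, a KAN layer places a learnable one-dimensional spline on every input-output edge and sums the edge functions. The sketch below is a minimal, hypothetical illustration of that principle using order-1 (piecewise-linear) splines for simplicity; the paper's actual implementation is not specified here, and practical KANs typically use cubic B-splines plus a base activation.

```python
import numpy as np

def hat_basis(x, grid):
    """Order-1 B-spline (hat function) basis values at scalar x.

    grid is a strictly increasing 1-D array of knots; the returned
    vector has one entry per knot and sums to 1 inside the grid range.
    """
    B = np.zeros(len(grid))
    if x <= grid[0]:
        B[0] = 1.0
        return B
    if x >= grid[-1]:
        B[-1] = 1.0
        return B
    # locate the interval grid[j] <= x < grid[j+1]
    j = np.searchsorted(grid, x) - 1
    t = (x - grid[j]) / (grid[j + 1] - grid[j])
    B[j] = 1.0 - t
    B[j + 1] = t
    return B

class KANAdapter:
    """Toy KAN-style layer: one learnable spline per input-output edge."""

    def __init__(self, d_in, d_out, grid, rng):
        self.grid = grid
        # coef[o, i, k]: spline coefficients of the edge (i -> o)
        self.coef = 0.01 * rng.standard_normal((d_out, d_in, len(grid)))

    def forward(self, x):
        """x: (d_in,) -> (d_out,); y_o = sum_i spline_{o,i}(x_i)."""
        B = np.stack([hat_basis(xi, self.grid) for xi in x])  # (d_in, K)
        return np.einsum("oik,ik->o", self.coef, B)

# Sanity check: setting coef[o, i, k] = grid[k] makes every edge spline
# the identity on the grid range, so each output is just sum_i x_i.
grid = np.linspace(-1.0, 1.0, 5)
adapter = KANAdapter(d_in=3, d_out=2, grid=grid, rng=np.random.default_rng(0))
adapter.coef[:] = grid
y = adapter.forward(np.array([0.3, -0.5, 0.1]))  # each entry ~= -0.1
```

The point of the construction is that the non-linearity itself (the spline coefficients) is what gets learned, rather than fixed activations around a learned linear map as in an MLP-based LoRA adapter.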
Related papers
- Learning to Synthesize Compatible Fashion Items Using Semantic Alignment and Collocation Classification: An Outfit Generation Framework [59.09707044733695]
We propose a novel outfit generation framework, i.e., OutfitGAN, with the aim of synthesizing an entire outfit.
OutfitGAN includes a semantic alignment module, which is responsible for characterizing the mapping correspondence between the existing fashion items and the synthesized ones.
In order to evaluate the performance of our proposed models, we built a large-scale dataset consisting of 20,000 fashion outfits.
arXiv Detail & Related papers (2025-02-05T12:13:53Z)
- Towards Intelligent Design: A Self-driven Framework for Collocated Clothing Synthesis Leveraging Fashion Styles and Textures [17.35328594773488]
Collocated clothing synthesis (CCS) has emerged as a pivotal topic in fashion technology.
Previous investigations have relied on using paired outfits, such as a pair of matching upper and lower clothing, to train a generative model for achieving this task.
We introduce a new self-driven framework, named style- and texture-guided generative network (ST-Net), to synthesize collocated clothing without the necessity for paired outfits.
arXiv Detail & Related papers (2025-01-23T05:46:08Z)
- Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference [4.667044856219814]
This paper presents a novel framework that harnesses the expressive power of large language models (LLMs) for personalized outfit recommendations.
We bridge the visual-textual gap in item descriptions by employing image captioning with a Multimodal Large Language Model (MLLM).
The framework is evaluated on the Polyvore dataset, demonstrating its effectiveness in two key tasks: fill-in-the-blank, and complementary item retrieval.
arXiv Detail & Related papers (2024-09-18T17:15:06Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- FashionReGen: LLM-Empowered Fashion Report Generation [61.84580616045145]
We propose an intelligent Fashion Analyzing and Reporting system based on advanced Large Language Models (LLMs).
Specifically, it tries to deliver FashionReGen based on effective catwalk analysis, which is equipped with several key procedures.
It also inspires the explorations of more high-level tasks with industrial significance in other domains.
arXiv Detail & Related papers (2024-03-11T12:29:35Z)
- HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models [17.74292177764933]
We propose a novel hierarchical diffusion-based framework tailored for fashion design, coined as HieraFashDiff.
Our model is designed to mimic the practical fashion design workflow, by unraveling the denoising process into two successive stages.
Our model supports fashion design generation and fine-grained local editing in a single framework.
arXiv Detail & Related papers (2024-01-15T03:38:57Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image/video.
We propose HiCAST, which is capable of explicitly customizing the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
- FIRST: A Million-Entry Dataset for Text-Driven Fashion Synthesis and Design [10.556799226837535]
We introduce a new dataset comprising a million high-resolution fashion images with rich structured textual (FIRST) descriptions.
Experiments on prevalent generative models trained over FIRST show the necessity of FIRST.
We invite the community to further develop more intelligent fashion synthesis and design systems.
arXiv Detail & Related papers (2023-11-13T15:50:25Z)
- Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition [80.74495836502919]
In this work, we focus on joint human fashion segmentation and attribute recognition.
We introduce the object query for segmentation and the attribute query for attribute prediction.
For the attribute stream, we design a novel Multi-Layer Rendering module to explore more fine-grained features.
arXiv Detail & Related papers (2022-04-10T11:11:10Z)
- Learning Diverse Fashion Collocation by Neural Graph Filtering [78.9188246136867]
We propose a novel fashion collocation framework, Neural Graph Filtering, that models a flexible set of fashion items via a graph neural network.
By applying symmetric operations on the edge vectors, this framework allows varying numbers of inputs/outputs and is invariant to their ordering.
We evaluate the proposed approach on three popular benchmarks, the Polyvore dataset, the Polyvore-D dataset, and our reorganized Amazon Fashion dataset.
arXiv Detail & Related papers (2020-03-11T16:17:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.