Prompt2Fashion: An automatically generated fashion dataset
- URL: http://arxiv.org/abs/2409.06442v2
- Date: Thu, 12 Sep 2024 18:22:51 GMT
- Title: Prompt2Fashion: An automatically generated fashion dataset
- Authors: Georgia Argyrou, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
- Abstract summary: We leverage generative models to automatically construct a fashion image dataset tailored to various occasions, styles, and body types as instructed by users.
We use different Large Language Models (LLMs) and prompting strategies to offer personalized outfits of high aesthetic quality, detail, and relevance to both expert and non-expert users' requirements.
- Score: 1.8817715864806608
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the rapid evolution and increasing efficacy of language and vision generative models, there remains a lack of comprehensive datasets that bridge the gap between personalized fashion needs and AI-driven design, limiting the potential for truly inclusive and customized fashion solutions. In this work, we leverage generative models to automatically construct a fashion image dataset tailored to various occasions, styles, and body types as instructed by users. We use different Large Language Models (LLMs) and prompting strategies to offer personalized outfits of high aesthetic quality, detail, and relevance to both expert and non-expert users' requirements, as demonstrated by qualitative analysis. So far, the generated outfits have been evaluated by non-expert human subjects. Although their feedback provides fine-grained insights into the quality and relevance of the generations, we extend the discussion on the importance of expert knowledge for evaluating artistic AI-generated datasets such as this one. Our dataset is publicly available on GitHub at https://github.com/georgiarg/Prompt2Fashion.
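As a rough illustration of the pipeline the abstract describes, the sketch below chains an instruction-tuned LLM (to expand a user's occasion, style, and body-type request into a detailed outfit description) into a text-to-image model (to render it). The specific models, the prompt template, and the `outfit_image` helper are assumptions for illustration, not the authors' actual configuration.

```python
# Hedged sketch of the two-stage pipeline: stage 1 prompts an LLM for a
# detailed outfit description, stage 2 renders it with a text-to-image
# model. Model choices and the prompt template are illustrative
# assumptions, not the authors' exact setup.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

llm = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
t2i = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def outfit_image(occasion: str, style: str, body_type: str):
    # Stage 1: turn the user's request into a concrete outfit description.
    request = (
        f"Describe in one detailed sentence a complete outfit for a "
        f"{body_type} person attending a {occasion}, in a {style} style."
    )
    description = llm(request, max_new_tokens=80,
                      return_full_text=False)[0]["generated_text"]
    # Stage 2: render the description as a fashion image.
    return t2i(description).images[0]

outfit_image("summer wedding", "bohemian", "plus-size").save("outfit.png")
```

The different prompting strategies the abstract mentions would plug in at stage 1, e.g., by swapping the plain request for a few-shot or persona-conditioned template.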
Related papers
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms [91.19304518033144]
We aim to align vision models with human aesthetic standards in a retrieval system.
We propose a preference-based reinforcement learning method that fine-tunes vision models to better align them with human aesthetics; a generic sketch of such a preference objective follows this entry.
arXiv Detail & Related papers (2024-06-13T17:59:20Z)
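The summary above gives no algorithmic detail; one common form of preference-based fine-tuning is a pairwise Bradley-Terry-style loss that pushes human-preferred images to score higher. The sketch below shows that generic objective only; `score_model` is a hypothetical stand-in for the paper's vision model, and the loss form is an assumption, not the authors' confirmed method.

```python
import torch.nn.functional as F

def preference_loss(score_model, preferred, rejected):
    """Pairwise preference objective (assumed Bradley-Terry form):
    images humans preferred should score above rejected alternatives."""
    s_pos = score_model(preferred)  # (batch,) aesthetic scores
    s_neg = score_model(rejected)
    # Maximize the log-probability that the preferred image wins the pair.
    return -F.logsigmoid(s_pos - s_neg).mean()
```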
- Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic and personalized generative framework that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
- User Modeling and User Profiling: A Comprehensive Survey [0.0]
This paper presents a survey of the current state, evolution, and future directions of user modeling and profiling research.
We provide a historical overview, tracing the development from early stereotype models to the latest deep learning techniques.
We also address the critical need for privacy-preserving techniques and the push towards explainability and fairness in user modeling approaches.
arXiv Detail & Related papers (2024-02-15T02:06:06Z)
- Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design [14.588884182004277]
We present the Fashion-Diffusion dataset, the product of multiple years of rigorous effort.
The dataset comprises over a million high-quality fashion images, paired with detailed text descriptions.
To foster standardization in the T2I-based fashion design field, we propose a new benchmark for evaluating the performance of fashion design models.
arXiv Detail & Related papers (2023-11-19T06:43:11Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework that updates only 0.3% of model parameters to learn style-specific attributes for response generation.
We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset (a rough sketch of the adapter idea follows this entry).
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
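As a rough sketch of the adapter recipe from the entry above: freeze a pretrained backbone and train only small bottleneck modules, so the trainable fraction of parameters stays tiny (the paper reports 0.3%). The backbone, layer sizes, and bottleneck width below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck module trained while the backbone stays frozen."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual adapter

# Hypothetical frozen backbone; only the adapters receive gradients.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=6,
)
for p in backbone.parameters():
    p.requires_grad = False
adapters = nn.ModuleList(Adapter(512) for _ in range(6))

model = nn.ModuleDict({"backbone": backbone, "adapters": adapters})
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4%}")  # well under 1%
```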
- FashionVQA: A Domain-Specific Visual Question Answering System [2.6924405243296134]
We train a visual question answering (VQA) system to answer complex natural language questions about apparel in fashion photoshoot images.
The accuracy of the best model surpasses the human expert level, even when answering human-generated questions.
Our approach for generating a large-scale multimodal domain-specific dataset provides a path for training specialized models capable of communicating in natural language (a template-based sketch of such dataset generation follows this entry).
arXiv Detail & Related papers (2022-08-24T01:18:13Z)
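The summary above does not spell out the generation approach; a common recipe for large-scale VQA datasets is expanding structured garment annotations into natural-language QA pairs via templates. The sketch below assumes that strategy, with an illustrative attribute schema rather than FashionVQA's actual templates.

```python
import json

# Illustrative question templates keyed by garment attribute; the schema
# and wording are assumptions, not the paper's actual templates.
TEMPLATES = [
    ("What color is the {category} in this image?", "color"),
    ("What pattern does the {category} have?", "pattern"),
    ("What sleeve length does the {category} have?", "sleeve_length"),
]

def generate_qa(annotation: dict) -> list:
    """Expand one annotated garment into several (image, Q, A) triples."""
    qa_pairs = []
    for question, attr in TEMPLATES:
        if attr in annotation:
            qa_pairs.append({
                "image": annotation["image"],
                "question": question.format(category=annotation["category"]),
                "answer": annotation[attr],
            })
    return qa_pairs

sample = {"image": "img_001.jpg", "category": "dress",
          "color": "red", "pattern": "floral"}
print(json.dumps(generate_qa(sample), indent=2))
```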
- StyleGAN-Human: A Data-Centric Odyssey of Human Generation [96.7080874757475]
This work takes a data-centric perspective and investigates multiple critical aspects of "data engineering".
We collect and annotate a large-scale human image dataset with over 230K samples capturing diverse poses and textures.
We rigorously investigate three essential factors in data engineering for StyleGAN-based human generation, namely data size, data distribution, and data alignment.
arXiv Detail & Related papers (2022-04-25T17:55:08Z)
- Aesthetics, Personalization and Recommendation: A survey on Deep Learning in Fashion [3.202857828083949]
Aesthetics play a vital role in clothing recommendation, as users' decisions depend largely on whether the clothing matches their aesthetics; however, conventional image features cannot capture this directly.
The survey covers notable approaches to this goal, delving into how visual data can be interpreted and leveraged.
It also highlights models such as the tensor factorization model and the conditional random field model, among others, that acknowledge aesthetics as an important factor in apparel recommendation (a toy sketch of the former follows this entry).
arXiv Detail & Related papers (2021-01-20T19:57:13Z)
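For concreteness, here is a toy sketch of an aesthetics-aware tensor factorization scorer of the general kind the survey highlights: a user × item × aesthetic-feature tensor factorized CP-style. The dimensions, rank, and scoring function are illustrative assumptions, not any surveyed model's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
rank, n_users, n_items, n_aesthetics = 8, 100, 500, 20
U = rng.normal(size=(n_users, rank))       # user latent factors
V = rng.normal(size=(n_items, rank))       # item latent factors
A = rng.normal(size=(n_aesthetics, rank))  # aesthetic-feature factors

def score(user: int, item: int, aesthetic: int) -> float:
    # CP decomposition: the predicted affinity sums over a shared latent
    # rank, coupling personal taste, the item, and the aesthetic context.
    return float(np.sum(U[user] * V[item] * A[aesthetic]))

print(score(user=0, item=42, aesthetic=3))
```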
- Using Artificial Intelligence to Analyze Fashion Trends [0.76146285961466]
This study proposes a data-driven quantitative abstracting approach using an artificial intelligence (A.I.) algorithm.
An A.I. model was trained on fashion images from a large-scale dataset under different scenarios.
It was found that the A.I. model can generate rich descriptions of detected regions and accurately bind the garments in the images.
arXiv Detail & Related papers (2020-05-03T04:46:12Z)