Automatic Generation of Fashion Images using Prompting in Generative Machine Learning Models
- URL: http://arxiv.org/abs/2407.14944v1
- Date: Sat, 20 Jul 2024 17:37:51 GMT
- Title: Automatic Generation of Fashion Images using Prompting in Generative Machine Learning Models
- Authors: Georgia Argyrou, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
- Abstract summary: This work investigates methodologies for generating tailored fashion descriptions using two distinct Large Language Models and a Stable Diffusion model for fashion image creation.
Emphasizing adaptability in AI-driven fashion creativity, we focus on prompting techniques, such as zero-shot and few-shot learning.
Evaluation combines quantitative metrics such as CLIPscore with qualitative human judgment, highlighting strengths in creativity, coherence, and aesthetic appeal across diverse styles.
- Score: 1.8817715864806608
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The advent of artificial intelligence has contributed to a groundbreaking transformation of the fashion industry, redefining creativity and innovation in unprecedented ways. This work investigates methodologies for generating tailored fashion descriptions using two distinct Large Language Models and a Stable Diffusion model for fashion image creation. Emphasizing adaptability in AI-driven fashion creativity, we depart from traditional approaches and focus on prompting techniques, such as zero-shot and few-shot learning, as well as Chain-of-Thought (CoT), which yield a variety of colors and textures, enhancing the diversity of the outputs. Central to our methodology is Retrieval-Augmented Generation (RAG), enriching models with insights from fashion sources to ensure contemporary representations. Evaluation combines quantitative metrics such as CLIPscore with qualitative human judgment, highlighting strengths in creativity, coherence, and aesthetic appeal across diverse styles. Among the participants, RAG and few-shot learning techniques are preferred for their ability to produce more relevant and appealing fashion descriptions. Our code is provided at https://github.com/georgiarg/AutoFashion.
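The pipeline the abstract describes (prompt an LLM for a fashion description, render it with Stable Diffusion, score the result with CLIPscore) can be sketched end to end. The model checkpoints and few-shot examples below are illustrative assumptions, not the authors' exact configuration; see the linked repository for that.

```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor, pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Few-shot prompting: two worked examples steer the LLM's output style.
few_shot = (
    "Write a vivid one-sentence fashion description for the theme.\n"
    "Theme: summer beach -> A breezy ivory linen sundress with seashell buttons.\n"
    "Theme: urban winter -> A charcoal wool overcoat over a ribbed turtleneck.\n"
    "Theme: evening gala ->"
)
llm = pipeline("text-generation", model="gpt2", device=device)
completion = llm(few_shot, max_new_tokens=40, do_sample=True)[0]["generated_text"]
description = completion[len(few_shot):].split("\n")[0].strip()

# 2. Render the generated description with Stable Diffusion.
sd = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
image = sd(description).images[0]

# 3. CLIPscore: rescaled cosine similarity of image and text embeddings,
#    2.5 * max(cos, 0), following Hessel et al. (2021).
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
batch = proc(text=[description], images=image, return_tensors="pt",
             padding=True, truncation=True).to(device)
with torch.no_grad():
    out = clip(**batch)
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print(description, max(0.0, 2.5 * float(img @ txt.T)))
```

A zero-shot variant would simply drop the two worked examples from the prompt; a CoT variant would instead ask the model to reason about fabric, color, and occasion before committing to a final description.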
Related papers
- Cross-Cultural Fashion Design via Interactive Large Language Models and Diffusion Models [0.0]
Fashion content generation is an emerging area at the intersection of artificial intelligence and creative design.
Existing methods struggle with cultural bias, limited scalability, and poor alignment between textual prompts and generated visuals.
We propose a novel framework that integrates Large Language Models (LLMs) with Latent Diffusion Models (LDMs) to address these challenges.
arXiv Detail & Related papers (2025-01-26T15:57:16Z)
- IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features [89.95303251220734]
We present a training-free framework to solve the style attribution problem, using the features produced by a diffusion model alone.
This is denoted as introspective style attribution (IntroStyle) and demonstrates superior performance to state-of-the-art models for style retrieval.
We also introduce a synthetic dataset of Style Hacks (SHacks) to isolate artistic style and evaluate fine-grained style attribution performance.
arXiv Detail & Related papers (2024-12-19T01:21:23Z)
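The IntroStyle entry above attributes style using features from a diffusion model alone. Its exact feature extraction is not given here, so the following is only a toy illustration of the general recipe, assuming Stable Diffusion v1.5 and an arbitrary mid-schedule timestep: noise an image, run the U-Net, pool its mid-block activations, and compare images by cosine similarity.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
null_ctx = pipe.text_encoder(pipe.tokenizer("", return_tensors="pt").input_ids)[0]

feats = {}
pipe.unet.mid_block.register_forward_hook(
    lambda module, inputs, output: feats.__setitem__("mid", output)
)

def style_embedding(img: Image.Image, t: int = 261) -> torch.Tensor:
    """Noise an image to timestep t and pool the U-Net's mid-block features."""
    x = pipe.image_processor.preprocess(img.convert("RGB").resize((512, 512)))
    with torch.no_grad():
        z = pipe.vae.encode(x).latent_dist.mean * pipe.vae.config.scaling_factor
        zt = pipe.scheduler.add_noise(z, torch.randn_like(z), torch.tensor([t]))
        pipe.unet(zt, t, encoder_hidden_states=null_ctx)  # fills feats["mid"]
    f = feats["mid"].mean(dim=(2, 3)).flatten()  # average over spatial positions
    return f / f.norm()

query = style_embedding(Image.open("query.png"))  # hypothetical file names
ref = style_embedding(Image.open("reference.png"))
print("style similarity:", float(query @ ref))
```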
- Using Multimodal Foundation Models and Clustering for Improved Style Ambiguity Loss [0.0]
We explore a new form of the style ambiguity training objective, used to approximate creativity, that does not require training a classifier or even a labeled dataset.
Based on automated metrics for human judgment, we find our new methods improve upon the traditional method while still maintaining creativity and novelty.
arXiv Detail & Related papers (2024-06-20T15:43:13Z)
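The style-ambiguity entry above avoids training a style classifier; one way to realize such a classifier-free objective is to score images zero-shot with CLIP against a list of style names and push the resulting distribution toward uniform. The style list and prompt template below are illustrative assumptions, not the paper's setup.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

STYLES = ["impressionism", "cubism", "baroque", "pop art", "minimalism"]  # assumed list
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def style_ambiguity_loss(image) -> torch.Tensor:
    """Cross-entropy to the uniform distribution over styles: minimized when the
    image sits ambiguously between styles rather than squarely in one."""
    # For generator training the image must reach CLIP as a differentiable
    # tensor; a PIL image (as here) only works for scoring.
    batch = proc(text=[f"a painting in the style of {s}" for s in STYLES],
                 images=image, return_tensors="pt", padding=True)
    log_probs = clip(**batch).logits_per_image.log_softmax(dim=-1)  # (1, |STYLES|)
    return -(log_probs / len(STYLES)).sum()

print(style_ambiguity_loss(Image.open("generated.png")))  # hypothetical file
```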
- Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms [91.19304518033144]
We aim to align vision models with human aesthetic standards in a retrieval system.
We propose a preference-based reinforcement learning method that fine-tunes the vision models to better align them with human aesthetics.
arXiv Detail & Related papers (2024-06-13T17:59:20Z)
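The aesthetics-alignment entry above gives no algorithmic detail, so the sketch below shows only a common building block for preference-based fine-tuning: a Bradley-Terry loss that trains a scalar scoring head to rank human-preferred images above rejected ones. The head, its dimension, and the use of frozen CLIP embeddings are all assumptions.

```python
import torch
import torch.nn.functional as F

class AestheticHead(torch.nn.Module):
    """Scalar aesthetic score over frozen image embeddings (e.g. CLIP's, dim 512)."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)

def preference_loss(head, emb_preferred, emb_rejected):
    """Bradley-Terry: -log sigmoid(score(winner) - score(loser))."""
    return -F.logsigmoid(head(emb_preferred) - head(emb_rejected)).mean()

head = AestheticHead()
opt = torch.optim.Adam(head.parameters(), lr=1e-4)
emb_w, emb_l = torch.randn(8, 512), torch.randn(8, 512)  # stand-ins for CLIP embeddings
loss = preference_loss(head, emb_w, emb_l)
opt.zero_grad(); loss.backward(); opt.step()
```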
- FashionSD-X: Multimodal Fashion Garment Synthesis using Latent Diffusion [11.646594594565098]
This study introduces a novel generative pipeline designed to transform the fashion design process by employing latent diffusion models.
We leverage and enhance state-of-the-art virtual try-on datasets, including Multimodal Dress Code and VITON-HD, by integrating sketch data.
arXiv Detail & Related papers (2024-04-26T14:59:42Z)
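FashionSD-X integrates sketch data into a latent-diffusion garment pipeline. Its actual architecture is its own; a rough off-the-shelf analogue of sketch conditioning is ControlNet's scribble model, sketched below with a hypothetical input file name.

```python
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-scribble")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)
sketch = load_image("garment_sketch.png")  # hypothetical white-on-black line drawing
image = pipe(
    "a flowing midi dress in emerald silk, studio product photo",
    image=sketch,               # the sketch constrains the garment's structure
    num_inference_steps=30,
).images[0]
image.save("garment.png")
```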
- CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion [74.44273919041912]
Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images.
However, adapting these models for artistic image editing presents two significant challenges.
We build CreativeSynth, a unified framework based on a diffusion model with the ability to coordinate multimodal inputs.
arXiv Detail & Related papers (2024-01-25T10:42:09Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image or video.
We propose HiCAST, which can explicitly customize the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
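The HiCAST summary above mentions a learning objective that improves cross-frame temporal consistency without spelling it out. The sketch below is a generic consistency term of the kind such objectives build on (stylized frame-to-frame changes should track source frame-to-frame changes); it is not HiCAST's actual loss.

```python
import torch

def temporal_consistency_loss(stylized: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
    """stylized, source: (T, C, H, W) videos; penalize stylized frame-to-frame
    changes that are absent from the source frames."""
    d_sty = stylized[1:] - stylized[:-1]
    d_src = source[1:] - source[:-1]
    return (d_sty - d_src).pow(2).mean()

source = torch.rand(8, 3, 64, 64)                       # 8-frame toy clip
stylized = torch.rand(8, 3, 64, 64, requires_grad=True)
temporal_consistency_loss(stylized, source).backward()
```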
- Generative AI Model for Artistic Style Transfer Using Convolutional Neural Networks [0.0]
Artistic style transfer involves fusing the content of one image with the artistic style of another to create unique visual compositions.
This paper presents a comprehensive overview of a novel technique for style transfer using Convolutional Neural Networks (CNNs).
arXiv Detail & Related papers (2023-10-27T16:21:17Z)
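CNN-based style transfer of the kind covered above typically follows the Gatys et al. recipe: match Gram-matrix statistics of fixed VGG features for style, and raw feature activations for content. A minimal sketch of the two losses, with random tensors standing in for VGG features:

```python
import torch

def gram(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (C, H, W) feature map: channel-to-channel correlations,
    which capture texture/style while discarding spatial layout."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def style_loss(feat_gen, feat_style):
    return (gram(feat_gen) - gram(feat_style)).pow(2).sum()

def content_loss(feat_gen, feat_content):
    return (feat_gen - feat_content).pow(2).mean()

# Features would come from a fixed network such as torchvision.models.vgg19;
# random stand-ins keep the sketch self-contained.
f_gen, f_style, f_content = (torch.rand(64, 32, 32) for _ in range(3))
total = content_loss(f_gen, f_content) + 1e3 * style_loss(f_gen, f_style)
```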
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to models that process text input and generate high-fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model, producing images through the systematic introduction of noise over repeated steps.
In the era of large models, scaling up model size and integrating large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
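The RenAIssance summary describes diffusion models as introducing noise over repeated steps. That forward process has a standard closed form, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, which lets any noising step be sampled in one shot:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # linear DDPM noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Draw x_t from q(x_t | x_0) directly, instead of t sequential noising steps."""
    eps = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * eps

x0 = torch.rand(3, 64, 64) * 2 - 1             # toy image scaled to [-1, 1]
print(q_sample(x0, T - 1).std())               # ~1.0: almost pure Gaussian noise
```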
- Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models [0.0]
Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe.
We propose a method exploiting the equivalence between diffusion models and energy-based models (EBMs).
Our results indicate that using an LLM to refine the prompts to the latent diffusion model assists in generating globally creative and culturally diversified fashion styles.
arXiv Detail & Related papers (2023-05-15T18:38:25Z)
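The entry above credits an LLM prompt-refinement step for more creative, culturally diversified diffusion outputs. A minimal sketch of that step follows; the instruction template and the small instruction-tuned model are illustrative assumptions, not the paper's setup.

```python
from transformers import pipeline

refiner = pipeline("text2text-generation", model="google/flan-t5-base")

def refine_prompt(user_prompt: str) -> str:
    instruction = (
        "Rewrite this fashion prompt for an image generator, adding specific "
        f"fabrics, colours, and a cultural style influence: {user_prompt}"
    )
    return refiner(instruction, max_new_tokens=60)[0]["generated_text"]

# The refined text would then be fed to the latent diffusion pipeline.
print(refine_prompt("a wedding outfit"))
```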
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
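The UCAST summary mentions an input-dependent temperature for its adaptive contrastive scheme. The sketch below grafts a per-sample learned temperature onto a standard InfoNCE loss; the temperature network and its range are illustrative assumptions, not UCAST's design.

```python
import torch
import torch.nn.functional as F

class AdaptiveTemperature(torch.nn.Module):
    """Predicts a per-sample temperature from the anchor embedding."""
    def __init__(self, dim: int = 128, t_min: float = 0.05, t_max: float = 0.5):
        super().__init__()
        self.net = torch.nn.Linear(dim, 1)
        self.t_min, self.t_max = t_min, t_max

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.net(z))                 # (B, 1) in (0, 1)
        return self.t_min + (self.t_max - self.t_min) * gate

def adaptive_info_nce(anchor, positive, temp_net):
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = (a @ p.T) / temp_net(a)                      # per-row temperature
    targets = torch.arange(a.size(0))                     # positives on the diagonal
    return F.cross_entropy(logits, targets)

temp_net = AdaptiveTemperature()
loss = adaptive_info_nce(torch.randn(16, 128), torch.randn(16, 128), temp_net)
loss.backward()
```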