Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models
- URL: http://arxiv.org/abs/2412.18421v1
- Date: Tue, 24 Dec 2024 13:27:25 GMT
- Title: Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models
- Authors: Qice Qin, Yuki Hirakawa, Ryotaro Shimizu, Takuya Furusawa, Edgar Simo-Serra,
- Abstract summary: This paper presents a novel diffusion model-based approach that generates fashion images with improved fashionability while maintaining control over key attributes.
Key components of our method include: 1) fashionability enhancement, which ensures that the generated images are more fashionable than the input; 2) preservation of body characteristics, encouraging the generated images to maintain the original shape and proportions of the input; and 3) automatic fashion optimization, which does not rely on manual input or external prompts.
- Score: 8.632093933229678
- License:
- Abstract: Image generation in the fashion domain has predominantly focused on preserving body characteristics or following input prompts, but little attention has been paid to improving the inherent fashionability of the output images. This paper presents a novel diffusion model-based approach that generates fashion images with improved fashionability while maintaining control over key attributes. Key components of our method include: 1) fashionability enhancement, which ensures that the generated images are more fashionable than the input; 2) preservation of body characteristics, encouraging the generated images to maintain the original shape and proportions of the input; and 3) automatic fashion optimization, which does not rely on manual input or external prompts. We also employ two methods to collect training data for guidance while generating and evaluating the images. In particular, we rate outfit images using fashionability scores annotated by multiple fashion experts through OpenSkill-based and five critical aspect-based pairwise comparisons. These methods provide complementary perspectives for assessing and improving the fashionability of the generated images. The experimental results show that our approach outperforms the baseline Fashion++ in generating images with superior fashionability, demonstrating its effectiveness in producing more stylish and appealing fashion images.
Related papers
- Personalized Fashion Recommendation with Image Attributes and Aesthetics Assessment [15.423307815155534]
We aim to provide more accurate personalized fashion recommendations by converting available information, especially images, into two graphs attribute.
Compared with previous methods that separate image and text as two components, the proposed method combines image and text information to create a richer attributes graph.
Preliminary experiments on the IQON3000 dataset have shown that the proposed method achieves competitive accuracy compared with baselines.
arXiv Detail & Related papers (2025-01-06T15:31:10Z) - Illustrious: an Open Advanced Illustration Model [7.428509329724737]
We develop a text-to-image anime image generative model, called Illustrious, to achieve high resolution, dynamic color range images, and high restoration ability.
We focus on three critical approaches for model improvement. First, we delve into the significance of the batch size and dropout control, which enables faster learning of controllable token based concept activations.
Second, we increase the training resolution of images, affecting the accurate depiction of character anatomy in much higher resolution, extending its generation capability over 20MP with proper methods.
arXiv Detail & Related papers (2024-09-30T04:59:12Z) - FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process [120.91393949012014]
FreeEnhance is a framework for content-consistent image enhancement using off-the-shelf image diffusion models.
In the noising stage, FreeEnhance is devised to add lighter noise to the region with higher frequency to preserve the high-frequent patterns in the original image.
In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality.
arXiv Detail & Related papers (2024-09-11T17:58:50Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - Automatic Generation of Fashion Images using Prompting in Generative Machine Learning Models [1.8817715864806608]
This work investigates methodologies for generating tailored fashion descriptions using two distinct Large Language Models and a Stable Diffusion model for fashion image creation.
Emphasizing adaptability in AI-driven fashion creativity, we focus on prompting techniques, such as zero-shot and few-shot learning.
Evaluation combines quantitative metrics such as CLIPscore with qualitative human judgment, highlighting strengths in creativity, coherence, and aesthetic appeal across diverse styles.
arXiv Detail & Related papers (2024-07-20T17:37:51Z) - Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment.
We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images.
We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z) - Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high-quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z) - Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z) - FashionTex: Controllable Virtual Try-on with Text and Texture [29.7855591607239]
We propose a multi-modal interactive setting by combining the advantages of both text and texture for multi-level fashion manipulation.
FashionTex framework can semantically control cloth types and local texture patterns without annotated pairwise training data.
arXiv Detail & Related papers (2023-05-08T04:10:36Z) - M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing [51.033376763225675]
We adapt style prior knowledge and flexibility of multi-modal control into one unified two-stage framework, M6-Fashion, focusing on the practical AI-aided Fashion design.
M6-Fashion utilizes self-correction for the non-autoregressive generation to improve inference speed, enhance holistic consistency, and support various signal controls.
arXiv Detail & Related papers (2022-05-24T01:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.