Using Style Ambiguity Loss to Improve Aesthetics of Diffusion Models
- URL: http://arxiv.org/abs/2410.02055v1
- Date: Wed, 2 Oct 2024 22:05:30 GMT
- Title: Using Style Ambiguity Loss to Improve Aesthetics of Diffusion Models
- Authors: James Baker
- Abstract summary: Teaching text-to-image models to be creative involves using style ambiguity loss.
In this work, we explore using the style ambiguity training objective, used to approximate creativity, on a diffusion model.
We find that the models trained with style ambiguity loss can generate better images than the baseline diffusion models and GANs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Teaching text-to-image models to be creative involves using style ambiguity loss. In this work, we explore using the style ambiguity training objective, used to approximate creativity, on a diffusion model. We then experiment with forms of style ambiguity loss that do not require training a classifier or a labeled dataset, and find that the models trained with style ambiguity loss can generate better images than the baseline diffusion models and GANs. Code is available at https://github.com/jamesBaker361/clipcreate.
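For context, the style ambiguity objective (introduced for creative adversarial networks and applied here to a diffusion model) rewards images whose predicted style distribution is as close to uniform as possible. Below is a minimal PyTorch sketch of that loss under this framing; the classifier, pipeline, and variable names are illustrative assumptions, not code from the linked repository.

```python
import torch
import torch.nn.functional as F

def style_ambiguity_loss(style_logits: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the classifier's predicted style distribution and
    the uniform distribution over K style classes; minimized when the generated
    image is maximally ambiguous across styles."""
    k = style_logits.shape[-1]
    log_probs = F.log_softmax(style_logits, dim=-1)   # (batch, K)
    # CE(uniform, p) = -(1/K) * sum_k log p_k
    return -(log_probs.sum(dim=-1) / k).mean()

# Illustrative usage (all names are assumptions):
# images = pipeline(prompts).images        # images sampled from the diffusion model
# logits = style_classifier(images)        # frozen style classifier, (batch, K)
# reward = -style_ambiguity_loss(logits)   # lower loss = more style-ambiguous image
```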
Related papers
- Using Multimodal Foundation Models and Clustering for Improved Style Ambiguity Loss [0.0]
We explore a new form of the style ambiguity training objective, used to approximate creativity, that does not require training a classifier or even a labeled dataset.
We find our new methods improve upon the traditional classifier-based method, as measured by automated proxies for human judgment, while still maintaining creativity and novelty (a minimal sketch of the clustering idea follows this entry).
arXiv Detail & Related papers (2024-06-20T15:43:13Z)
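A classifier-free style ambiguity loss of this kind can be approximated by clustering embeddings from a multimodal encoder (e.g. CLIP) and treating the cluster centroids as pseudo style classes. The sketch below is one plausible formulation under that assumption; the temperature value and all names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def clustered_style_ambiguity_loss(image_embeds: torch.Tensor,
                                   centroids: torch.Tensor,
                                   temperature: float = 0.07) -> torch.Tensor:
    """Style ambiguity without a trained classifier: the "style distribution" is
    a softmax over cosine similarities to k-means centroids computed from
    embeddings of an unlabeled reference corpus."""
    image_embeds = F.normalize(image_embeds, dim=-1)    # (batch, d)
    centroids = F.normalize(centroids, dim=-1)          # (K, d)
    logits = image_embeds @ centroids.T / temperature   # (batch, K)
    log_probs = F.log_softmax(logits, dim=-1)
    return -log_probs.mean(dim=-1).mean()               # cross-entropy against uniform
```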
- MuseumMaker: Continual Style Customization without Catastrophic Forgetting [50.12727620780213]
We propose MuseumMaker, a method that enables the synthesis of images following a set of customized styles in a never-ending manner.
When faced with a new customization style, we develop a style distillation loss module to extract and learn the styles of the training data for new image generation.
It minimizes the learning biases caused by the content of new training images and addresses the catastrophic overfitting induced by few-shot images.
arXiv Detail & Related papers (2024-04-25T13:51:38Z)
- Measuring Style Similarity in Diffusion Models [118.22433042873136]
We present a framework for understanding and extracting style descriptors from images.
Our framework comprises a new dataset curated using the insight that style is a subjective property of an image.
We also propose a method to extract style descriptors that can be used to attribute the style of a generated image to the images used in the training dataset of a text-to-image model (a toy retrieval sketch follows this entry).
arXiv Detail & Related papers (2024-04-01T17:58:30Z)
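As a toy illustration of style attribution, a style descriptor of a generated image can be compared against descriptors of training images by cosine similarity, returning the closest matches. This is only a schematic sketch; the paper's actual descriptor model and retrieval pipeline are not reproduced here.

```python
import torch
import torch.nn.functional as F

def attribute_style(query_desc: torch.Tensor,
                    train_descs: torch.Tensor,
                    top_k: int = 5):
    """Rank training images by style similarity to a generated image.
    query_desc:  (d,)   style descriptor of the generated image.
    train_descs: (N, d) style descriptors of the training images."""
    query = F.normalize(query_desc, dim=-1)
    bank = F.normalize(train_descs, dim=-1)
    sims = bank @ query                       # (N,) cosine similarities
    scores, indices = sims.topk(top_k)
    return indices.tolist(), scores.tolist()  # most style-similar training images
```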
- DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations [64.43387739794531]
Current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles.
We introduce DEADiff to address this issue using the following two strategies.
DEADiff attains the best visual stylization results and an optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image.
arXiv Detail & Related papers (2024-03-11T17:35:23Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms (a simplified sketch follows this entry).
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
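This framing treats each reverse-diffusion step as an action in a multi-step decision process, so per-step log-likelihoods can feed a policy gradient estimator. The sketch below is a deliberately simplified REINFORCE-style update under that framing; the paper's estimators are more sophisticated, and all tensor shapes and names here are assumptions.

```python
import torch

def policy_gradient_update(log_probs: torch.Tensor,
                           rewards: torch.Tensor,
                           optimizer: torch.optim.Optimizer) -> float:
    """One REINFORCE-style step over sampled denoising trajectories.
    log_probs: (batch, T) log p_theta(x_{t-1} | x_t, c) for each denoising step,
               computed with gradients enabled for the current parameters.
    rewards:   (batch,)   reward of each final image (e.g. an aesthetic score)."""
    advantages = rewards - rewards.mean()                          # simple baseline
    loss = -(advantages.unsqueeze(1) * log_probs).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```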
- Text-to-Image Diffusion Models are Zero-Shot Classifiers [8.26990105697146]
We investigate text-to-image diffusion models by proposing a method for evaluating them as zero-shot classifiers.
We apply our method to Stable Diffusion and Imagen, using it to probe fine-grained aspects of the models' knowledge.
These diffusion models perform competitively with CLIP on a wide range of zero-shot image classification datasets (the scoring procedure is sketched after this entry).
arXiv Detail & Related papers (2023-03-27T14:15:17Z)
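One way to read "diffusion model as zero-shot classifier" is: condition the denoiser on a prompt for each candidate class, measure how well it predicts the added noise, and pick the class with the lowest error. The sketch below assumes a diffusers-style Stable Diffusion interface; the exact scoring used in the paper (e.g. timestep weighting) is not reproduced, and `encode_text` is a placeholder.

```python
import torch

@torch.no_grad()
def diffusion_zero_shot_classify(unet, scheduler, encode_text,
                                 image_latents, class_prompts, n_trials=32):
    """Score each class prompt by the conditional denoising error; lower is better."""
    device = image_latents.device
    errors = []
    for prompt in class_prompts:
        text_emb = encode_text(prompt)                 # (1, seq_len, dim) placeholder
        trial_errs = []
        for _ in range(n_trials):
            t = torch.randint(0, scheduler.config.num_train_timesteps, (1,), device=device)
            noise = torch.randn_like(image_latents)
            noisy = scheduler.add_noise(image_latents, noise, t)
            noise_pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
            trial_errs.append(torch.mean((noise_pred - noise) ** 2))
        errors.append(torch.stack(trial_errs).mean())
    return int(torch.stack(errors).argmin())           # index of the predicted class
```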
- Ablating Concepts in Text-to-Image Diffusion Models [57.9371041022838]
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful compositional ability.
These models are typically trained on an enormous amount of Internet data, often containing copyrighted material, licensed images, and personal photos.
We propose an efficient method of ablating concepts in the pretrained model, preventing the generation of a target concept.
arXiv Detail & Related papers (2023-03-23T17:59:42Z)
- Style-Agnostic Reinforcement Learning [9.338454092492901]
We present a novel method of learning style-agnostic representation using both style transfer and adversarial learning.
Our method trains the actor with diverse image styles generated from an inherent adversarial style generator.
We verify that our method achieves competitive or better performance than state-of-the-art approaches on the Procgen and Distracting Control Suite benchmarks.
arXiv Detail & Related papers (2022-08-31T13:45:00Z)
- Adversarial Style Augmentation for Domain Generalized Urban-Scene Segmentation [120.96012935286913]
We propose a novel adversarial style augmentation approach, which can generate hard stylized images during training.
Experiments on two synthetic-to-real semantic segmentation benchmarks demonstrate that AdvStyle can significantly improve the model performance on unseen real domains.
arXiv Detail & Related papers (2022-07-11T14:01:25Z)