Using Multimodal Foundation Models and Clustering for Improved Style Ambiguity Loss
- URL: http://arxiv.org/abs/2407.12009v1
- Date: Thu, 20 Jun 2024 15:43:13 GMT
- Title: Using Multimodal Foundation Models and Clustering for Improved Style Ambiguity Loss
- Authors: James Baker
- Abstract summary: We explore a new form of the style ambiguity training objective, used to approximate creativity, that does not require training a classifier or even a labeled dataset.
We find that our new methods improve upon the traditional method, as measured by automated metrics of human judgment, while still maintaining creativity and novelty.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Teaching text-to-image models to be creative involves using style ambiguity loss, which requires a pretrained classifier. In this work, we explore a new form of the style ambiguity training objective, used to approximate creativity, that requires neither training a classifier nor a labeled dataset. We then train a diffusion model to maximize style ambiguity, imbuing it with creativity, and find that our new methods improve upon the traditional method, as measured by automated metrics of human judgment, while still maintaining creativity and novelty.
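As a rough illustration of the classifier-free objective described above (not the paper's reference implementation), the sketch below scores a batch of images by the cross-entropy between their soft cluster assignment and a uniform distribution over style clusters. It assumes CLIP-style image embeddings and k-means centroids fit offline on a style corpus; the function name, temperature, and shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def style_ambiguity_loss(image_embeds: torch.Tensor,
                         centroids: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """Cross-entropy between each image's soft cluster assignment and a
    uniform distribution over K style clusters (a hypothetical sketch).

    image_embeds: (B, D) embeddings from a multimodal foundation model
        such as CLIP.
    centroids: (K, D) k-means centroids fit offline on a style corpus,
        standing in for the pretrained style classifier of the
        traditional loss.
    """
    image_embeds = F.normalize(image_embeds, dim=-1)
    centroids = F.normalize(centroids, dim=-1)
    # Soft assignment: cosine similarity to each centroid, softmax over K.
    logits = image_embeds @ centroids.T / temperature        # (B, K)
    log_probs = F.log_softmax(logits, dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / logits.shape[-1])
    # H(uniform, p) = KL(uniform || p) + const; minimal when p is uniform,
    # i.e., when an image does not fall cleanly into any one style cluster.
    return -(uniform * log_probs).sum(dim=-1).mean()
```

Minimizing this scalar (or using its negative as a reward) pushes generations toward maximal style ambiguity without any labeled style dataset.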
Related papers
- Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning [42.03016266965012]
We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models.
We propose a novel regularization loss function that encourages models to focus more on challenging knowledge areas.
arXiv Detail & Related papers (2024-10-06T01:30:40Z)
- Using Style Ambiguity Loss to Improve Aesthetics of Diffusion Models [0.0]
Teaching text-to-image models to be creative involves using style ambiguity loss.
In this work, we explore using the style ambiguity training objective, used to approximate creativity, on a diffusion model.
We find that the models trained with style ambiguity loss can generate better images than the baseline diffusion models and GANs.
arXiv Detail & Related papers (2024-10-02T22:05:30Z)
- Automatic Generation of Fashion Images using Prompting in Generative Machine Learning Models [1.8817715864806608]
This work investigates methodologies for generating tailored fashion descriptions using two distinct Large Language Models and a Stable Diffusion model for fashion image creation.
Emphasizing adaptability in AI-driven fashion creativity, we focus on prompting techniques, such as zero-shot and few-shot learning.
Evaluation combines quantitative metrics such as CLIPscore with qualitative human judgment, highlighting strengths in creativity, coherence, and aesthetic appeal across diverse styles.
arXiv Detail & Related papers (2024-07-20T17:37:51Z)
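Since the entry above evaluates generations with CLIPscore, here is a minimal sketch of that metric using a Hugging Face `transformers` CLIP checkpoint; the 2.5 rescaling factor follows Hessel et al. (2021), and the specific model variant used by the paper is an assumption.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_score(image: Image.Image, caption: str) -> float:
    """CLIPScore: 2.5 * max(cosine(image, text), 0)."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    out = model(**inputs)
    # Normalize the projected embeddings before taking the cosine.
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    cos = (img * txt).sum(dim=-1).item()
    return 2.5 * max(cos, 0.0)
```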
- An Improved Method for Personalizing Diffusion Models [23.20529652769131]
Diffusion models have demonstrated impressive image generation capabilities.
Personalized approaches, such as textual inversion and Dreambooth, enhance model individualization using specific images.
Our proposed approach aims to retain the model's original knowledge during new information integration.
arXiv Detail & Related papers (2024-07-07T09:52:04Z)
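The entry above does not spell out how the original knowledge is retained; one standard mechanism (a DreamBooth-style prior-preservation term, assumed here rather than taken from the paper) adds a second denoising loss on generic class images. The `unet(x, t, cond)` call signature below is hypothetical.

```python
import torch
import torch.nn.functional as F

def personalization_loss(unet, noisy_subject, noisy_prior, t,
                         noise_subject, noise_prior,
                         cond_subject, cond_prior,
                         prior_weight: float = 1.0) -> torch.Tensor:
    """Fit the new subject while matching the model's behavior on generic
    "class prior" images so existing knowledge is not overwritten
    (a DreamBooth-style sketch; unet(x, t, cond) predicts the noise)."""
    # Learn the new concept from the subject images.
    subject_loss = F.mse_loss(unet(noisy_subject, t, cond_subject),
                              noise_subject)
    # Preserve prior knowledge on generic class images.
    prior_loss = F.mse_loss(unet(noisy_prior, t, cond_prior), noise_prior)
    return subject_loss + prior_weight * prior_loss
```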
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
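As a minimal sketch of the augment-and-fine-tune step described above (model and dataset objects are placeholders, not the paper's code), the counterfactual images that exposed the model's weaknesses are simply concatenated with the original training data:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader

def finetune_with_counterfactuals(model, original_ds, counterfactual_ds,
                                  epochs: int = 3, lr: float = 1e-4):
    """Fine-tune a classifier on the original data plus the
    language-guided counterfactual images (a placeholder sketch)."""
    loader = DataLoader(ConcatDataset([original_ds, counterfactual_ds]),
                        batch_size=64, shuffle=True)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model
```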
"Presidifussion" is a novel approach to learning and replicating the unique style of calligraphy of President Xu.
We introduce innovative techniques of font image conditioning and stroke information conditioning, enabling the model to capture the intricate structural elements of Chinese characters.
This work not only presents a breakthrough in the digital preservation of calligraphic art but also sets a new standard for data-efficient generative modeling in the domain of cultural heritage digitization.
arXiv Detail & Related papers (2024-04-26T07:17:09Z)
- HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion Models [84.12784265734238]
The goal of Arbitrary Style Transfer (AST) is to inject the artistic features of a style reference into a given image or video.
We propose HiCAST, which can explicitly customize the stylization results according to various sources of semantic clues.
A novel learning objective is leveraged for video diffusion model training, which significantly improves cross-frame temporal consistency.
arXiv Detail & Related papers (2024-01-11T12:26:23Z)
- Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption [73.98706049140098]
We propose a novel phasic content fusing few-shot diffusion model with directional distribution consistency loss.
Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when the diffusion timestep t is large.
Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation.
arXiv Detail & Related papers (2023-09-07T14:14:11Z)
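One plausible reading of the phasic strategy above is a smooth, timestep-dependent blend of two loss terms: content/style learning dominates at large t, target-domain detail at small t. The gate shape and the constant k below are assumptions, not the paper's actual schedule.

```python
import torch

def phasic_weight(t: torch.Tensor, T: int, k: float = 10.0) -> torch.Tensor:
    """Smooth gate in [0, 1]: close to 1 when t is large (noisy phase),
    close to 0 when t is small (detail phase)."""
    return torch.sigmoid(k * (t.float() / T - 0.5))

def phasic_loss(content_style_loss: torch.Tensor,
                domain_loss: torch.Tensor,
                t: torch.Tensor, T: int) -> torch.Tensor:
    """Emphasize content/style at large t and target-domain detail at
    small t (a hypothetical sketch of the phasic training idea)."""
    w = phasic_weight(t, T)
    return (w * content_style_loss + (1 - w) * domain_loss).mean()
```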
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
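The entry above frames denoising as a multi-step decision problem; below is a compact REINFORCE-style sketch over per-step log-probabilities, in the spirit of that class of policy gradient algorithms. Shapes and the availability of differentiable step log-probabilities are assumptions.

```python
import torch

def policy_gradient_loss(step_log_probs: torch.Tensor,
                         rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE over a denoising trajectory.

    step_log_probs: (B, T) log p(x_{t-1} | x_t, c) for each denoising
        step, differentiable w.r.t. the diffusion model's parameters.
    rewards: (B,) scalar reward per final image (e.g., an aesthetic
        or style-ambiguity score).
    """
    # Batch-normalizing rewards is a common variance-reduction trick.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Every step of a trajectory shares that trajectory's terminal reward.
    return -(adv.unsqueeze(1) * step_log_probs).sum(dim=1).mean()
```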
- DST: Dynamic Substitute Training for Data-free Black-box Attack [79.61601742693713]
We propose a novel dynamic substitute training attack method that encourages the substitute model to learn better and faster from the target model.
We introduce a task-driven, graph-based structural information learning constraint to improve the quality of the generated training data.
arXiv Detail & Related papers (2022-04-03T02:29:11Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
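For reference, the last entry builds on standard adversarial training; a minimal PGD-style inner loop (after Madry et al.) is sketched below. The paper's feature-space attack is stronger and more elaborate; the epsilon, step size, and step count here are conventional choices, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient ascent on the loss within an L-inf ball."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, opt, x, y):
    """One training step on the perturbed batch."""
    x_adv = pgd_attack(model, x, y)
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```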
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.