Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to
Model Evaluation
- URL: http://arxiv.org/abs/2309.14859v2
- Date: Mon, 11 Mar 2024 09:39:33 GMT
- Title: Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to
Model Evaluation
- Authors: Shih-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao, Bernard B W Yang, Giyeong
Oh, Yanmin Gong
- Abstract summary: This paper introduces LyCORIS, an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion.
We also present a framework for the systematic assessment of varied fine-tuning techniques.
Our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.
- Score: 6.7311791228366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image generative models have garnered immense attention for their
ability to produce high-fidelity images from text prompts. Among these, Stable
Diffusion distinguishes itself as a leading open-source model in this
fast-growing field. However, the intricacies of fine-tuning these models pose
multiple challenges from new methodology integration to systematic evaluation.
Addressing these issues, this paper introduces LyCORIS (Lora beYond
Conventional methods, Other Rank adaptation Implementations for Stable
diffusion) [https://github.com/KohakuBlueleaf/LyCORIS], an open-source library
that offers a wide selection of fine-tuning methodologies for Stable Diffusion.
Furthermore, we present a thorough framework for the systematic assessment of
varied fine-tuning techniques. This framework employs a diverse suite of
metrics and delves into multiple facets of fine-tuning, including
hyperparameter adjustments and the evaluation with different prompt types
across various concept categories. Through this comprehensive approach, our
work provides essential insights into the nuanced effects of fine-tuning
parameters, bridging the gap between state-of-the-art research and practical
application.
Related papers
- Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (intextbfPainting vtextbfIa textbfLatent textbfOptextbfTimization) is an optimization approach grounded on a novel textitsemantic centralization and textitbackground preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z) - Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention
Regulation in Diffusion Models [23.786473791344395]
Cross-attention layers in diffusion models tend to disproportionately focus on certain tokens during the generation process.
We introduce attention regulation, an on-the-fly optimization approach at inference time to align attention maps with the input text prompt.
Experiment results show that our method consistently outperforms other baselines.
arXiv Detail & Related papers (2024-03-11T02:18:27Z) - Diffusion Model-Based Image Editing: A Survey [46.244266782108234]
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks.
We provide an exhaustive overview of existing methods using diffusion models for image editing.
To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval.
arXiv Detail & Related papers (2024-02-27T14:07:09Z) - Improving Few-shot Image Generation by Structural Discrimination and
Textural Modulation [10.389698647141296]
Few-shot image generation aims to produce plausible and diverse images for one category given a few images from this category.
Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients.
This paper proposes a novel mechanism to inject external semantic signals into internal local representations.
arXiv Detail & Related papers (2023-08-30T16:10:21Z) - Diffusion-based Visual Counterfactual Explanations -- Towards Systematic
Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality.
It is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies.
We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z) - Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z) - Reduce, Reuse, Recycle: Compositional Generation with Energy-Based
Diffusion Models and MCMC [106.06185677214353]
diffusion models have quickly become the prevailing approach to generative modeling in many domains.
We propose an energy-based parameterization of diffusion models which enables the use of new compositional operators.
We find these samplers lead to notable improvements in compositional generation across a wide set of problems.
arXiv Detail & Related papers (2023-02-22T18:48:46Z) - Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot
Classification via Stable Diffusion [22.237426507711362]
Model-Agnostic Zero-Shot Classification (MA-ZSC) refers to training non-specific classification architectures to classify real images without using any real images during training.
Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC.
We propose modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity.
arXiv Detail & Related papers (2023-02-07T07:13:53Z) - Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC)
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z) - Deep Variational Models for Collaborative Filtering-based Recommender
Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to injectity in the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.