Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to
Model Evaluation
- URL: http://arxiv.org/abs/2309.14859v2
- Date: Mon, 11 Mar 2024 09:39:33 GMT
- Title: Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to
Model Evaluation
- Authors: Shih-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao, Bernard B W Yang, Giyeong
Oh, Yanmin Gong
- Abstract summary: This paper introduces LyCORIS, an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion.
We also present a framework for the systematic assessment of varied fine-tuning techniques.
Our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.
- Score: 6.7311791228366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image generative models have garnered immense attention for their
ability to produce high-fidelity images from text prompts. Among these, Stable
Diffusion distinguishes itself as a leading open-source model in this
fast-growing field. However, the intricacies of fine-tuning these models pose
multiple challenges from new methodology integration to systematic evaluation.
Addressing these issues, this paper introduces LyCORIS (Lora beYond
Conventional methods, Other Rank adaptation Implementations for Stable
diffusion) [https://github.com/KohakuBlueleaf/LyCORIS], an open-source library
that offers a wide selection of fine-tuning methodologies for Stable Diffusion.
Furthermore, we present a thorough framework for the systematic assessment of
varied fine-tuning techniques. This framework employs a diverse suite of
metrics and delves into multiple facets of fine-tuning, including
hyperparameter adjustments and the evaluation with different prompt types
across various concept categories. Through this comprehensive approach, our
work provides essential insights into the nuanced effects of fine-tuning
parameters, bridging the gap between state-of-the-art research and practical
application.
Related papers
- MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification [19.29480118378639]
Whole slide pathology image classification presents challenges due to gigapixel image sizes and limited annotation labels.
This paper introduces a prompt learning method to adapt large vision-language models for few-shot pathology classification.
arXiv Detail & Related papers (2025-02-11T09:42:13Z) - Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models [0.0]
We present a novel method for improving text-to-image generation by combining Large Language Models with diffusion models.
Our approach incorporates semantic understanding from pre-trained LLMs to guide the generation process.
Our method significantly improves both the visual quality and alignment of generated images with text descriptions.
arXiv Detail & Related papers (2025-02-02T15:43:13Z) - Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting [0.17975553762582286]
Current image stitching methods produce noticeable seams in challenging scenarios such as uneven hue and large parallax.
We propose the Reference-Driven Inpainting Stitcher (RDIStitcher) to reformulate the image fusion and rectangling as a reference-based inpainting model.
We present the Multimodal Large Language Models (MLLMs)-based metrics, offering a new perspective on evaluating stitched image quality.
arXiv Detail & Related papers (2024-11-15T16:05:01Z) - Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (intextbfPainting vtextbfIa textbfLatent textbfOptextbfTimization) is an optimization approach grounded on a novel textitsemantic centralization and textitbackground preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z) - Improving Few-shot Image Generation by Structural Discrimination and
Textural Modulation [10.389698647141296]
Few-shot image generation aims to produce plausible and diverse images for one category given a few images from this category.
Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients.
This paper proposes a novel mechanism to inject external semantic signals into internal local representations.
arXiv Detail & Related papers (2023-08-30T16:10:21Z) - Diffusion-based Visual Counterfactual Explanations -- Towards Systematic
Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality.
It is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies.
We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z) - Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z) - Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC [102.64648158034568]
diffusion models have quickly become the prevailing approach to generative modeling in many domains.
We propose an energy-based parameterization of diffusion models which enables the use of new compositional operators.
We find these samplers lead to notable improvements in compositional generation across a wide set of problems.
arXiv Detail & Related papers (2023-02-22T18:48:46Z) - Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot
Classification via Stable Diffusion [22.237426507711362]
Model-Agnostic Zero-Shot Classification (MA-ZSC) refers to training non-specific classification architectures to classify real images without using any real images during training.
Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC.
We propose modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity.
arXiv Detail & Related papers (2023-02-07T07:13:53Z) - Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC)
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z) - Deep Variational Models for Collaborative Filtering-based Recommender
Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to injectity in the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.