Related papers: Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation

Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation

URL: http://arxiv.org/abs/2309.14859v2
Date: Mon, 11 Mar 2024 09:39:33 GMT
Title: Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation
Authors: Shih-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao, Bernard B W Yang, Giyeong Oh, Yanmin Gong
Abstract summary: This paper introduces LyCORIS, an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion. We also present a framework for the systematic assessment of varied fine-tuning techniques. Our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.
Score: 6.7311791228366
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image generative models have garnered immense attention for their ability to produce high-fidelity images from text prompts. Among these, Stable Diffusion distinguishes itself as a leading open-source model in this fast-growing field. However, the intricacies of fine-tuning these models pose multiple challenges from new methodology integration to systematic evaluation. Addressing these issues, this paper introduces LyCORIS (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) [https://github.com/KohakuBlueleaf/LyCORIS], an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion. Furthermore, we present a thorough framework for the systematic assessment of varied fine-tuning techniques. This framework employs a diverse suite of metrics and delves into multiple facets of fine-tuning, including hyperparameter adjustments and the evaluation with different prompt types across various concept categories. Through this comprehensive approach, our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.

Related papers

MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification [19.29480118378639]
Whole slide pathology image classification presents challenges due to gigapixel image sizes and limited annotation labels. This paper introduces a prompt learning method to adapt large vision-language models for few-shot pathology classification.
arXiv Detail & Related papers (2025-02-11T09:42:13Z)
Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models [0.0]
We present a novel method for improving text-to-image generation by combining Large Language Models with diffusion models. Our approach incorporates semantic understanding from pre-trained LLMs to guide the generation process. Our method significantly improves both the visual quality and alignment of generated images with text descriptions.
arXiv Detail & Related papers (2025-02-02T15:43:13Z)
Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting [0.17975553762582286]
Current image stitching methods produce noticeable seams in challenging scenarios such as uneven hue and large parallax. We propose the Reference-Driven Inpainting Stitcher (RDIStitcher) to reformulate the image fusion and rectangling as a reference-based inpainting model. We present the Multimodal Large Language Models (MLLMs)-based metrics, offering a new perspective on evaluating stitched image quality.
arXiv Detail & Related papers (2024-11-15T16:05:01Z)
Fine Tuning Text-to-Image Diffusion Models for Correcting Anomalous Images [0.0]
This study proposes a method to mitigate such issues by fine-tuning the Stable Diffusion 3 model using the DreamBooth technique. Experimental results targeting the prompt "lying on the grass/street" demonstrate that the fine-tuned model shows improved performance in visual evaluation and metrics such as Structural Similarity Index (SSIM), Peak Signal-to-Noise Ratio (PSNR), and Frechet Inception Distance (FID)
arXiv Detail & Related papers (2024-09-23T00:51:47Z)
Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (intextbfPainting vtextbfIa textbfLatent textbfOptextbfTimization) is an optimization approach grounded on a novel textitsemantic centralization and textitbackground preservation loss. Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z)
Diffusion Model-Based Image Editing: A Survey [46.244266782108234]
Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks. We provide an exhaustive overview of existing methods using diffusion models for image editing. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval.
arXiv Detail & Related papers (2024-02-27T14:07:09Z)
Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation [10.389698647141296]
Few-shot image generation aims to produce plausible and diverse images for one category given a few images from this category. Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients. This paper proposes a novel mechanism to inject external semantic signals into internal local representations.
arXiv Detail & Related papers (2023-08-30T16:10:21Z)
Diffusion-based Visual Counterfactual Explanations -- Towards Systematic Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality. It is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies. We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z)
Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC [102.64648158034568]
diffusion models have quickly become the prevailing approach to generative modeling in many domains. We propose an energy-based parameterization of diffusion models which enables the use of new compositional operators. We find these samplers lead to notable improvements in compositional generation across a wide set of problems.
arXiv Detail & Related papers (2023-02-22T18:48:46Z)
Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion [22.237426507711362]
Model-Agnostic Zero-Shot Classification (MA-ZSC) refers to training non-specific classification architectures to classify real images without using any real images during training. Recent research has demonstrated that generating synthetic training images using diffusion models provides a potential solution to address MA-ZSC. We propose modifications to the text-to-image generation process using a pre-trained diffusion model to enhance diversity.
arXiv Detail & Related papers (2023-02-07T07:13:53Z)
Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC) SFSC generates a series of compatible sub-models with different capacities through one training process. SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
Deep Variational Models for Collaborative Filtering-based Recommender Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results. Our proposed models apply the variational concept to injectity in the latent space of the deep architecture. Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.