LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
- URL: http://arxiv.org/abs/2311.05556v1
- Date: Thu, 9 Nov 2023 18:04:15 GMT
- Title: LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
- Authors: Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen,
Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao
- Abstract summary: Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks.
This report further extends LCMs' potential by applying LoRA distillation to larger Stable-Diffusion models.
We identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA.
- Score: 52.8517132452467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent Consistency Models (LCMs) have achieved impressive performance in
accelerating text-to-image generative tasks, producing high-quality images with
minimal inference steps. LCMs are distilled from pre-trained latent diffusion
models (LDMs), requiring only ~32 A100 GPU training hours. This report further
extends LCMs' potential in two aspects: First, by applying LoRA distillation to
Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we have expanded
LCM's scope to larger models with significantly less memory consumption,
achieving superior image generation quality. Second, we identify the LoRA
parameters obtained through LCM distillation as a universal Stable-Diffusion
acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into
various Stable-Diffusion fine-tuned models or LoRAs without training, thus
representing a universally applicable accelerator for diverse image generation
tasks. Compared with previous numerical PF-ODE solvers such as DDIM and
DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that
possesses strong generalization abilities. Project page:
https://github.com/luosiallen/latent-consistency-model.
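To illustrate the plug-and-play claim above, the sketch below loads a standard SDXL pipeline with Hugging Face diffusers, attaches the publicly released LCM-LoRA weights, and samples in a few steps. This is a minimal sketch, not the authors' reference code: the repository ids ("stabilityai/stable-diffusion-xl-base-1.0", "latent-consistency/lcm-lora-sdxl"), the step count, and the guidance scale are assumptions chosen for illustration.

import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load an SDXL-based pipeline (a fine-tuned checkpoint would work the same way).
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Swap in the LCM scheduler and attach LCM-LoRA as the acceleration module.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")  # assumed LoRA repo id

# Few-step sampling: LCM-LoRA is typically run with roughly 2-8 steps and low guidance.
image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("lcm_lora_sample.png")

The same pattern applies to SD-V1.5 or SSD-1B pipelines paired with the corresponding LCM-LoRA weights, which is what makes the module act as a universal accelerator rather than a per-model distillation.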
Related papers
- Cached Multi-Lora Composition for Multi-Concept Image Generation [10.433033595844442]
Low-Rank Adaptation (LoRA) has emerged as a widely adopted technique in text-to-image models.
Current approaches face significant challenges when composing these LoRAs for multi-concept image generation.
We introduce a novel, training-free framework, Cached Multi-LoRA (CMLoRA), designed to efficiently integrate multiple LoRAs; a minimal sketch of the basic LoRA weight composition appears after this list.
arXiv Detail & Related papers (2025-02-07T13:41:51Z)
- One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion method for real-world image super-resolution (Real-ISR), built on flow matching models.
First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR.
Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
arXiv Detail & Related papers (2025-02-04T04:11:29Z)
- Large Language Models for Multimodal Deformable Image Registration [50.91473745610945]
We propose a novel coarse-to-fine MDIR framework, LLM-Morph, for aligning deep features from medical images of different modalities.
Specifically, we first use a CNN encoder to extract deep visual features from cross-modal image pairs, then use the first adapter to adjust these tokens, and use LoRA in the pre-trained LLM to fine-tune its weights.
Finally, for token alignment, we use four other adapters to transform the LLM-encoded tokens into multi-scale visual features, generating multi-scale deformation fields and facilitating the coarse-to-fine MDIR task.
arXiv Detail & Related papers (2024-08-20T09:58:30Z)
- Phased Consistency Model [80.31766777570058]
The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models.
However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a. LCM) remains unsatisfactory.
We propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations.
arXiv Detail & Related papers (2024-05-28T17:47:19Z)
- Latent Modulated Function for Computational Optimal Continuous Image Representation [20.678662838709542]
We propose a novel Latent Modulated Function (LMF) algorithm for continuous image representation.
We show that converting existing INR-based methods to LMF can reduce the computational cost by up to 99.9%.
Experiments further demonstrate that this conversion accelerates inference by up to 57 times and saves up to 76% of the parameters.
arXiv Detail & Related papers (2024-04-25T09:30:38Z)
- Reward Guided Latent Consistency Distillation [86.8911705127924]
Latent Consistency Distillation (LCD) has emerged as a promising paradigm for efficient text-to-image synthesis.
We propose compensating the quality loss by aligning LCD's output with human preference during training.
arXiv Detail & Related papers (2024-03-16T22:14:56Z)
- Boosting Latent Diffusion with Flow Matching [23.043115108005708]
Flow matching (FM) offers faster training and inference but exhibits less diversity in synthesis.
We demonstrate that introducing FM between the diffusion model and the convolutional decoder enables high-resolution image synthesis.
We achieve state-of-the-art high-resolution image synthesis at $1024^2$ with minimal computational cost.
arXiv Detail & Related papers (2023-12-12T15:30:24Z)
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 24-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
arXiv Detail & Related papers (2023-10-06T17:11:58Z)
- CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices [78.16679232748196]
We introduce a Compression-Aware LoRA (CA-LoRA) framework to transfer Large Language Models (LLMs) to other tasks.
Experiment results demonstrate that CA-LoRA outperforms the vanilla LoRA methods applied to a compressed LLM.
The source code of CA-LoRA is available at https://github.com/thunlp/CA-LoRA.
arXiv Detail & Related papers (2023-07-15T04:37:11Z)
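Several entries above (Cached Multi-LoRA, CA-LoRA, LCM-LoRA itself) revolve around adding low-rank adapters to a frozen base model. As a minimal, hypothetical sketch of the underlying operation, each LoRA contributes a low-rank update $B A$ that is scaled and added to the frozen weight, $W' = W_0 + \sum_i \alpha_i B_i A_i$. All names and shapes below are illustrative and not taken from any of the listed papers.

import torch

def merge_loras(w0: torch.Tensor, loras, alphas):
    """Merge several LoRA updates into a frozen base weight.

    w0:     (out_dim, in_dim) frozen base weight
    loras:  list of (B, A) pairs, B: (out_dim, rank), A: (rank, in_dim)
    alphas: per-LoRA scaling factors
    """
    w = w0.clone()
    for (b, a), alpha in zip(loras, alphas):
        w += alpha * (b @ a)  # rank << min(out_dim, in_dim), so each update is cheap to store
    return w

# Toy example: compose two rank-2 adapters on a single linear layer.
out_dim, in_dim, rank = 8, 16, 2
w0 = torch.randn(out_dim, in_dim)
loras = [(torch.randn(out_dim, rank), torch.randn(rank, in_dim)) for _ in range(2)]
merged = merge_loras(w0, loras, alphas=[0.8, 0.5])
print(merged.shape)  # torch.Size([8, 16])

Naive summation like this is the baseline that multi-LoRA composition methods such as CMLoRA aim to improve on.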