Reward Guided Latent Consistency Distillation
- URL: http://arxiv.org/abs/2403.11027v2
- Date: Mon, 07 Oct 2024 18:47:47 GMT
- Title: Reward Guided Latent Consistency Distillation
- Authors: Jiachen Li, Weixi Feng, Wenhu Chen, William Yang Wang
- Abstract summary: Latent Consistency Distillation (LCD) has emerged as a promising paradigm for efficient text-to-image synthesis.
We propose compensating the quality loss by aligning the LCM's output with human preference during training.
- Score: 86.8911705127924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Latent Consistency Distillation (LCD) has emerged as a promising paradigm for efficient text-to-image synthesis. By distilling a latent consistency model (LCM) from a pre-trained teacher latent diffusion model (LDM), LCD facilitates the generation of high-fidelity images within merely 2 to 4 inference steps. However, the LCM's efficient inference is obtained at the cost of the sample quality. In this paper, we propose compensating the quality loss by aligning LCM's output with human preference during training. Specifically, we introduce Reward Guided LCD (RG-LCD), which integrates feedback from a reward model (RM) into the LCD process by augmenting the original LCD loss with the objective of maximizing the reward associated with LCM's single-step generation. As validated through human evaluation, when trained with the feedback of a good RM, the 2-step generations from our RG-LCM are favored by humans over the 50-step DDIM samples from the teacher LDM, representing a 25-fold inference acceleration without quality loss. As directly optimizing towards differentiable RMs can suffer from over-optimization, we take the initial step to overcome this difficulty by proposing the use of a latent proxy RM (LRM). This novel component serves as an intermediary, connecting our LCM with the RM. Empirically, we demonstrate that incorporating the LRM into our RG-LCD successfully avoids high-frequency noise in the generated images, contributing to both improved Fréchet Inception Distance (FID) on MS-COCO and a higher HPSv2.1 score on HPSv2's test set, surpassing those achieved by the baseline LCM.
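The objective described in the abstract, the original LCD consistency loss augmented with a reward-maximization term on the LCM's single-step generation, can be sketched in simplified scalar form. All function names and the weighting `lam` here are illustrative assumptions, not the paper's exact formulation:

```python
def rg_lcd_loss(lcm, teacher_step, reward_model, z_t, t, lam=0.1):
    """Toy scalar sketch of the RG-LCD objective (illustrative only)."""
    # Consistency (LCD) term: the LCM's prediction at (z_t, t) should match
    # its prediction at the teacher solver's next point (z_{t-1}, t-1).
    z_prev = teacher_step(z_t, t)
    lcd = (lcm(z_t, t) - lcm(z_prev, t - 1)) ** 2
    # Reward guidance: score the LCM's single-step generation x0 with the RM
    # (in the paper, a latent proxy RM can score x0 in latent space to
    # mitigate over-optimization toward a pixel-space RM).
    x0 = lcm(z_t, t)
    reward = reward_model(x0)
    # Minimizing this loss maximizes the reward alongside consistency.
    return lcd - lam * reward
```

With toy stand-ins for the models (e.g. linear maps and a reward that peaks at 1.0), the returned value decreases as the single-step sample moves toward high-reward outputs while staying consistent with the teacher trajectory.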
Related papers
- Delta-WKV: A Novel Meta-in-Context Learner for MRI Super-Resolution [0.7864304771129751]
We propose Delta-WKV, a novel MRI super-resolution model that combines Meta-in-Context Learning (MiCL) with the Delta rule to better recognize both local and global patterns in MRI images.
Tests show that Delta-WKV outperforms existing methods, improving PSNR by 0.06 dB and SSIM by 0.001, while reducing training and inference times by over 15%.
arXiv Detail & Related papers (2025-02-28T08:49:46Z) - InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration [106.70903819362402]
Diffusion priors have been used for blind face restoration (BFR) by fine-tuning diffusion models (DMs) on restoration datasets to recover low-quality images.
We propose InterLCM to leverage the latent consistency model (LCM) for its superior semantic consistency and efficiency.
InterLCM outperforms existing approaches in both synthetic and real-world datasets while also achieving faster inference speed.
arXiv Detail & Related papers (2025-02-04T10:51:20Z) - AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation [12.564266865237343]
Latent diffusion models (LDMs) often experience significant structural distortions when directly generating high-resolution (HR) images.
We propose an Attentive and Progressive LDM (AP-LDM) aimed at enhancing HR image quality while accelerating the generation process.
AP-LDM decomposes the denoising process of LDMs into two stages: (i) attentive training-resolution denoising, and (ii) progressive high-resolution denoising.
arXiv Detail & Related papers (2024-10-08T13:56:28Z) - TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps [12.395969703425648]
Distilling latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest.
This paper proposes a novel training-efficient Latent Consistency Model (TLCM) to overcome these challenges.
With just 70 training hours on an A100 GPU, a 3-step TLCM distilled from SDXL achieves an impressive CLIP Score of 33.68 and an Aesthetic Score of 5.97 on the MSCOCO-2017 5K benchmark.
arXiv Detail & Related papers (2024-06-09T12:55:50Z) - Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lift this limitation and improve DMD training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
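The GAN term mentioned in the DMD2 summary can be illustrated with a standard binary cross-entropy formulation, where a discriminator separates real images from the student's generations. This is a generic sketch of that loss shape, not DMD2's actual implementation:

```python
import math

def gan_losses(disc, real_images, fake_images):
    """Generic GAN losses on raw discriminator logits (illustrative only)."""
    def bce_logit(logit, target):
        # Binary cross-entropy: -log p for target 1, -log(1 - p) for target 0.
        p = 1.0 / (1.0 + math.exp(-logit))
        return -math.log(p) if target == 1 else -math.log(1.0 - p)
    # Discriminator: push real images toward 1 and generated samples toward 0.
    d_loss = (sum(bce_logit(disc(x), 1) for x in real_images) / len(real_images)
              + sum(bce_logit(disc(x), 0) for x in fake_images) / len(fake_images))
    # Generator (the distilled student): try to fool the discriminator.
    g_loss = sum(bce_logit(disc(x), 1) for x in fake_images) / len(fake_images)
    return d_loss, g_loss
```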
arXiv Detail & Related papers (2024-05-23T17:59:49Z) - EdgeFusion: On-Device Text-to-Image Generation [3.3345550849564836]
We develop a compact SD variant, BK-SDM, for text-to-image generation.
We achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices.
arXiv Detail & Related papers (2024-04-18T06:02:54Z) - LCM-LoRA: A Universal Stable-Diffusion Acceleration Module [52.8517132452467]
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks.
This report further extends LCMs' potential by applying LoRA distillation to larger Stable-Diffusion models.
We identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA.
arXiv Detail & Related papers (2023-11-09T18:04:15Z) - Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 24-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
arXiv Detail & Related papers (2023-10-06T17:11:58Z) - Cross-Modality Earth Mover's Distance for Visible Thermal Person Re-Identification [82.01051164653583]
Visible thermal person re-identification (VT-ReID) suffers from the inter-modality discrepancy and intra-identity variations.
We propose the Cross-Modality Earth Mover's Distance (CM-EMD) that can alleviate the impact of the intra-identity variations during modality alignment.
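As background for the CM-EMD entry above: for one-dimensional histograms of equal total mass, the earth mover's distance reduces to a sum of absolute cumulative differences. This minimal sketch shows only that generic quantity, not the paper's cross-modality formulation:

```python
def emd_1d(p, q):
    """Earth mover's distance between two 1-D histograms of equal mass."""
    assert abs(sum(p) - sum(q)) < 1e-9
    carry, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        carry += pi - qi   # mass forced across this bin boundary
        total += abs(carry)
    return total
```

For example, moving one unit of mass from the first bin to the third costs two boundary crossings, so `emd_1d([1, 0, 0], [0, 0, 1])` is 2.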
arXiv Detail & Related papers (2022-03-03T12:26:59Z) - Two-Stage Self-Supervised Cycle-Consistency Network for Reconstruction of Thin-Slice MR Images [62.4428833931443]
The thick-slice magnetic resonance (MR) images are often structurally blurred in coronal and sagittal views.
Deep learning has shown great potential to reconstruct the high-resolution (HR) thin-slice MR images from those low-resolution (LR) cases.
We propose a novel Two-stage Self-supervised Cycle-consistency Network (TSCNet) for MR slice reconstruction.
arXiv Detail & Related papers (2021-06-29T13:29:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.