Related papers: UniFL: Improve Stable Diffusion via Unified Feedback Learning

UniFL: Improve Stable Diffusion via Unified Feedback Learning

URL: http://arxiv.org/abs/2404.05595v2
Date: Wed, 22 May 2024 13:52:09 GMT
Title: UniFL: Improve Stable Diffusion via Unified Feedback Learning
Authors: Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Min Zheng, Lean Fu, Guanbin Li,
Abstract summary: We present UniFL, a unified framework that leverages feedback learning to enhance diffusion models comprehensively. UniFL incorporates three key components: perceptual feedback learning, which enhances visual quality; decoupled feedback learning, which improves aesthetic appeal; and adversarial feedback learning, which optimize inference speed. In-depth experiments and extensive user studies validate the superior performance of our proposed method in enhancing both the quality of generated models and their acceleration.
Score: 51.18278664629821
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications. However, despite these significant advancements, the current competitive solutions still suffer from several limitations, including inferior visual quality, a lack of aesthetic appeal, and inefficient inference, without a comprehensive solution in sight. To address these challenges, we present UniFL, a unified framework that leverages feedback learning to enhance diffusion models comprehensively. UniFL stands out as a universal, effective, and generalizable solution applicable to various diffusion models, such as SD1.5 and SDXL. Notably, UniFL incorporates three key components: perceptual feedback learning, which enhances visual quality; decoupled feedback learning, which improves aesthetic appeal; and adversarial feedback learning, which optimizes inference speed. In-depth experiments and extensive user studies validate the superior performance of our proposed method in enhancing both the quality of generated models and their acceleration. For instance, UniFL surpasses ImageReward by 17% user preference in terms of generation quality and outperforms LCM and SDXL Turbo by 57% and 20% in 4-step inference. Moreover, we have verified the efficacy of our approach in downstream tasks, including Lora, ControlNet, and AnimateDiff.

Related papers

One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
GeFL: Model-Agnostic Federated Learning with Generative Models [3.4546761246181696]
Federated learning (FL) is a promising paradigm in distributed learning while preserving the privacy of users. We propose Generative Model-Aided Federated Learning (GeFL), incorporating a generative model that aggregates global knowledge across users of heterogeneous models. We empirically demonstrate the consistent performance gains of GeFL-F, while demonstrating better privacy preservation and robustness to a large number of clients.
arXiv Detail & Related papers (2024-12-24T14:39:47Z)
Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution [81.81748032199813]
We propose a Distillation-Free One-Step Diffusion model. Specifically, we propose a noise-aware discriminator (NAD) to participate in adversarial training. We improve the perceptual loss with edge-aware DISTS (EA-DISTS) to enhance the model's ability to generate fine details.
arXiv Detail & Related papers (2024-10-05T16:41:36Z)
DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder. DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
Data-Free Federated Class Incremental Learning with Diffusion-Based Generative Memory [27.651921957220004]
We introduce a novel data-free federated class incremental learning framework with diffusion-based generative memory (DFedDGM) We design a new balanced sampler to help train the diffusion models to alleviate the common non-IID problem in FL. We also introduce an entropy-based sample filtering technique from an information theory perspective to enhance the quality of generative samples.
arXiv Detail & Related papers (2024-05-22T20:59:18Z)
Navigating Heterogeneity and Privacy in One-Shot Federated Learning with Diffusion Models [6.921070916461661]
Federated learning (FL) enables multiple clients to train models collectively while preserving data privacy. One-shot federated learning has emerged as a solution by reducing communication rounds, improving efficiency, and providing better security against eavesdropping attacks.
arXiv Detail & Related papers (2024-05-02T17:26:52Z)
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI) In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion) Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement [118.83316133601319]
Current deep learning methods for low-light image enhancement (LLIE) typically rely on pixel-wise mapping learned from paired data. We propose a degradation-aware learning scheme for LLIE using diffusion models, which effectively integrates degradation and image priors into the diffusion process.
arXiv Detail & Related papers (2023-07-27T07:22:51Z)
Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming, excessive computational resource consumption, and unstable restoration. We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives. We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
Learning Enhancement From Degradation: A Diffusion Model For Fundus Image Enhancement [21.91300560770087]
We introduce a novel diffusion model based framework, named Learning Enhancement from Degradation (LED) LED learns degradation mappings from unpaired high-quality to low-quality images. LED is able to output enhancement results that maintain clinically important features with better clarity.
arXiv Detail & Related papers (2023-03-08T14:14:49Z)
AnycostFL: Efficient On-Demand Federated Learning over Heterogeneous Edge Devices [20.52519915112099]
We propose a cost-adjustable FL framework, named AnycostFL, that enables diverse edge devices to efficiently perform local updates. Experiment results indicate that, our learning framework can reduce up to 1.9 times of the training latency and energy consumption for realizing a reasonable global testing accuracy.
arXiv Detail & Related papers (2023-01-08T15:25:55Z)
Multi-View Attention Transfer for Efficient Speech Enhancement [1.6932706284468382]
We propose multi-view attention transfer (MV-AT), a feature-based distillation, to obtain efficient speech enhancement models in the time domain. Based on the multi-view features extraction model, MV-AT transfers multi-view knowledge of the teacher network to the student network without additional parameters.
arXiv Detail & Related papers (2022-08-22T14:47:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.