Diversity-Rewarded CFG Distillation
- URL: http://arxiv.org/abs/2410.06084v1
- Date: Tue, 8 Oct 2024 14:40:51 GMT
- Title: Diversity-Rewarded CFG Distillation
- Authors: Geoffrey Cideron, Andrea Agostinelli, Johan Ferret, Sertan Girgin, Romuald Elie, Olivier Bachem, Sarah Perrin, Alexandre Ramé
- Abstract summary: We introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations.
Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt.
- Score: 62.08448835625036
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generative models are transforming creative domains such as music generation, with inference-time strategies like Classifier-Free Guidance (CFG) playing a crucial role. However, CFG doubles inference cost while limiting originality and diversity across generated contents. In this paper, we introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations. Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt. By finetuning, we learn model weights with the ability to generate high-quality and diverse outputs, without any inference overhead. This also unlocks the potential of weight-based model merging strategies: by interpolating between the weights of two models (the first focusing on quality, the second on diversity), we can control the quality-diversity trade-off at deployment time, and even further boost performance. We conduct extensive experiments on the MusicLM (Agostinelli et al., 2023) text-to-music generative model, where our approach surpasses CFG in terms of quality-diversity Pareto optimality. According to human evaluators, our finetuned-then-merged model generates samples with higher quality-diversity than the base model augmented with CFG. Explore our generations at https://google-research.github.io/seanet/musiclm/diverse_music/.
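As a rough illustration of the two training objectives described in the abstract, here is a minimal PyTorch sketch, not the paper's implementation: the function names (`cfg_logits`, `distillation_loss`, `diversity_reward`), the guidance scale `gamma`, and the cosine-similarity-based reward are assumptions; the paper may use a different divergence, guidance scale, and diversity measure.

```python
# Minimal sketch of the two objectives (assumed names; not the paper's code).
import torch
import torch.nn.functional as F


def cfg_logits(cond_logits, uncond_logits, gamma=3.0):
    # Standard classifier-free guidance combination, used here as the
    # frozen teacher's distillation target.
    return uncond_logits + gamma * (cond_logits - uncond_logits)


def distillation_loss(student_logits, teacher_cond_logits, teacher_uncond_logits, gamma=3.0):
    # KL divergence between the student's (no-CFG) next-token distribution
    # and the CFG-augmented teacher distribution. The paper's exact
    # divergence and guidance scale may differ.
    target = F.softmax(cfg_logits(teacher_cond_logits, teacher_uncond_logits, gamma), dim=-1)
    return F.kl_div(F.log_softmax(student_logits, dim=-1), target, reduction="batchmean")


def diversity_reward(embeddings):
    # One plausible diversity reward: mean pairwise (1 - cosine similarity)
    # over embeddings of several generations for the same prompt.
    z = F.normalize(embeddings, dim=-1)                                # (n, d)
    sim = z @ z.t()                                                    # (n, n)
    mask = ~torch.eye(z.shape[0], dtype=torch.bool, device=z.device)   # drop self-similarities
    return (1.0 - sim[mask]).mean()
```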
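The weight-based model merging mentioned in the abstract can be pictured as plain linear interpolation of parameters. The sketch below assumes both checkpoints share the same architecture and state-dict keys; the helper name and variables are illustrative.

```python
# Deployment-time weight merging between a quality-tuned and a
# diversity-tuned checkpoint (illustrative helper, not from the paper).
from typing import Dict
import torch


def merge_weights(quality_sd: Dict[str, torch.Tensor],
                  diversity_sd: Dict[str, torch.Tensor],
                  lam: float) -> Dict[str, torch.Tensor]:
    # theta_merged = (1 - lam) * theta_quality + lam * theta_diversity
    return {k: (1.0 - lam) * v + lam * diversity_sd[k] for k, v in quality_sd.items()}
```

At deployment, `lam` acts as a quality-diversity dial, e.g. `model.load_state_dict(merge_weights(sd_quality, sd_diversity, 0.5))` (checkpoint names are hypothetical).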
Related papers
- Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models [20.70550870149442]
We introduce Annealed Importance Guidance (AIG), an inference-time regularization inspired by Annealed Importance Sampling.
Our experiments demonstrate the benefits of AIG for Stable Diffusion models, striking the optimal balance between reward optimization and image diversity.
arXiv Detail & Related papers (2024-09-09T16:27:26Z) - Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density [70.14884528360199]
We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with enhanced fidelity or increased diversity.
Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density.
arXiv Detail & Related papers (2024-07-11T16:46:04Z) - CoinSeg: Contrast Inter- and Intra-Class Representations for Incremental Segmentation [85.13209973293229]
Class incremental semantic segmentation aims to strike a balance between the model's stability and plasticity.
We propose Contrast inter- and intra-class representations for Incremental Segmentation (CoinSeg).
arXiv Detail & Related papers (2023-10-10T07:08:49Z) - A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy [2.966338139852619]
Generative adversarial networks (GANs) and variational autoencoders (VAEs) are two of the most prominent and widely studied generative models.
We employ a Bayesian non-parametric (BNP) approach to merge GANs and VAEs.
By fusing the discriminative power of GANs with the reconstruction capabilities of VAEs, our novel model achieves superior performance in various generative tasks.
arXiv Detail & Related papers (2023-08-27T08:58:31Z) - VOLTA: Improving Generative Diversity by Variational Mutual Information Maximizing Autoencoder [38.35049378875308]
We introduce VOLTA, a framework that elevates generative diversity by bridging Transformer with VAE.
We perform comprehensive experiments with two types of Transformers on six datasets to show that our approach can significantly improve generative diversity while maintaining generative quality.
arXiv Detail & Related papers (2023-07-03T08:45:42Z) - Stay on topic with Classifier-Free Guidance [57.28934343207042]
We show that CFG can be used broadly as an inference-time technique in pure language modeling.
We show that CFG improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks.
arXiv Detail & Related papers (2023-06-30T17:07:02Z) - A Closer Look at Few-shot Image Generation [38.83570296616384]
When transferring pretrained GANs to small target datasets, the generator tends to replicate the training samples.
Several methods have been proposed to address few-shot image generation, but they have not been analyzed under a unified framework.
We propose a framework to analyze existing methods during adaptation.
A second contribution applies mutual information (MI) maximization to retain the source domain's rich multi-level diversity information in the target-domain generator.
arXiv Detail & Related papers (2022-05-08T07:46:26Z) - One-Shot Adaptation of GAN in Just One CLIP [51.188396199083336]
We present a novel single-shot GAN adaptation method through unified CLIP space manipulations.
Specifically, our model employs a two-step training strategy, beginning with reference image search in the source generator using CLIP-guided latent optimization.
We show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-03-17T13:03:06Z) - AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)