Related papers: RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance

RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance

URL: http://arxiv.org/abs/2408.17095v2
Date: Mon, 2 Sep 2024 20:33:49 GMT
Title: RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance
Authors: Avideep Mukherjee, Soumya Banerjee, Piyush Rai, Vinay P. Namboodiri,
Abstract summary: Block-wise generation can be a promising alternative for designing compact-sized deep generative models. We propose a retrieval-augmented generation (RAG) approach to condition the training and generation stages of a block-wise denoising diffusion model. Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation.
Score: 34.893261410589396
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion-based models demonstrate impressive generation capabilities. However, they also have a massive number of parameters, resulting in enormous model sizes, thus making them unsuitable for deployment on resource-constraint devices. Block-wise generation can be a promising alternative for designing compact-sized (parameter-efficient) deep generative models since the model can generate one block at a time instead of generating the whole image at once. However, block-wise generation is also considerably challenging because ensuring coherence across generated blocks can be non-trivial. To this end, we design a retrieval-augmented generation (RAG) approach and leverage the corresponding blocks of the images retrieved by the RAG module to condition the training and generation stages of a block-wise denoising diffusion model. Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation. While we showcase our approach using the latent diffusion model (LDM) as the base model, it can be used with other variants of denoising diffusion models. We validate the solution of the coherence problem through the proposed approach by reporting substantive experiments to demonstrate our approach's effectiveness in compact model size and excellent generation quality.

Related papers

Hyperspectral Image Generation with Unmixing Guided Diffusion Model [21.078386095749398]
Diffusion models are popular for their ability to generate high-quality samples.<n>We propose a novel diffusion model guided by hyperspectral unmixing.<n>Our model is capable of generating high-quality and diverse hyperspectral images.
arXiv Detail & Related papers (2025-06-03T08:27:10Z)
CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation [7.250878248686215]
Diffusion-based language models have emerged as a compelling alternative due to their powerful parallel generation capabilities and inherent editability.<n>We propose CtrlDiff, a dynamic and controllable semi-autoregressive framework that adaptively determines the size of each generation block based on local semantics.
arXiv Detail & Related papers (2025-05-20T14:52:41Z)
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models [15.853201399662344]
Diffusion language models offer unique benefits over autoregressive models. They lag in likelihood modeling and are limited to fixed-length generation. We introduce a class of block diffusion language models that interpolate between discrete denoising diffusion and autoregressive models.
arXiv Detail & Related papers (2025-03-12T17:43:40Z)
Training-free Heterogeneous Model Merging [40.681362819808136]
We propose an innovative model merging framework designed for heterogeneous models. We show that the merging of structurally heterogeneous models can achieve performance levels comparable to those of homogeneous merging. Our code is publicly available at https://github.com/zju-vipa/training_free_heterogeneous_model_merging.
arXiv Detail & Related papers (2024-12-29T04:49:11Z)
Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step. Our framework offers a 1.3$times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
Towards Model-Agnostic Dataset Condensation by Heterogeneous Models [13.170099297210372]
We develop a novel method to produce universally applicable condensed images through cross-model interactions. By balancing the contribution of each model and maintaining their semantic meaning closely, our approach overcomes the limitations associated with model-specific condensed images.
arXiv Detail & Related papers (2024-09-22T17:13:07Z)
Inverse design with conditional cascaded diffusion models [0.0]
Adjoint-based design optimizations are usually computationally expensive and those costs scale with resolution. We extend the use of diffusion models over traditional generative models by proposing the conditional cascaded diffusion model (cCDM) Our study compares cCDM against a cGAN model with transfer learning. While both models show decreased performance with reduced high-resolution training data, the cCDM loses its superiority to the cGAN model with transfer learning when training data is limited.
arXiv Detail & Related papers (2024-08-16T04:54:09Z)
Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. We aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Our algorithm integrates soft value functions, which looks ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL. We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
A Bayesian Non-parametric Approach to Generative Models: Integrating Variational Autoencoder and Generative Adversarial Networks using Wasserstein and Maximum Mean Discrepancy [2.966338139852619]
Generative adversarial networks (GANs) and variational autoencoders (VAEs) are two of the most prominent and widely studied generative models. We employ a Bayesian non-parametric (BNP) approach to merge GANs and VAEs. By fusing the discriminative power of GANs with the reconstruction capabilities of VAEs, our novel model achieves superior performance in various generative tasks.
arXiv Detail & Related papers (2023-08-27T08:58:31Z)
Semi-Implicit Denoising Diffusion Models (SIDDMs) [50.30163684539586]
Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps. We introduce a novel approach that tackles the problem by matching implicit and explicit factors. We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.
arXiv Detail & Related papers (2023-06-21T18:49:22Z)
Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance. We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring. Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models. E-ARM takes advantage of a well-designed energy-based learning objective. We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
Learning High-Dimensional Distributions with Latent Neural Fokker-Planck Kernels [67.81799703916563]
We introduce new techniques to formulate the problem as solving Fokker-Planck equation in a lower-dimensional latent space. Our proposed model consists of latent-distribution morphing, a generator and a parameterized Fokker-Planck kernel function.
arXiv Detail & Related papers (2021-05-10T17:42:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.