KIND: Knowledge Integration and Diversion in Diffusion Models
- URL: http://arxiv.org/abs/2408.07337v1
- Date: Wed, 14 Aug 2024 07:22:28 GMT
- Title: KIND: Knowledge Integration and Diversion in Diffusion Models
- Authors: Yucheng Xie, Fu Feng, Jing Wang, Xin Geng, Yong Rui,
- Abstract summary: We introduce textbfKIND, which performs textbfKnowledge textbfINtegration and textbfDiversion in diffusion models.
KIND redefines traditional pre-training methods by adjusting training objectives from maximizing model performance on current tasks to condensing transferable common knowledge.
Results indicate that KIND achieves state-of-the-art performance compared to other PEFT and learngene methods.
- Score: 40.442303050947395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained models have become the preferred backbone due to the expansion of model parameters, with techniques like Parameter-Efficient Fine-Tuning (PEFTs) typically fixing the parameters of these models. However, pre-trained models may not always be optimal, especially when there are discrepancies between training tasks and target tasks, potentially resulting in negative transfer. To address this, we introduce \textbf{KIND}, which performs \textbf{K}nowledge \textbf{IN}tegration and \textbf{D}iversion in diffusion models. KIND first integrates knowledge by decomposing parameter matrices of models using $U$, $\Sigma$, and $V$ matrices, formally inspired by singular value decomposition (SVD). Then it explicitly partitions the components of these matrices into \textbf{learngenes} and \textbf{tailors} to condense common and class-specific knowledge, respectively, through a class gate. In this way, KIND redefines traditional pre-training methods by adjusting training objectives from maximizing model performance on current tasks to condensing transferable common knowledge, leveraging the \textit{Learngene} framework. We conduct experiments on ImageNet-1K and compare KIND with PEFT and other learngene methods. Results indicate that KIND achieves state-of-the-art performance compared to other PEFT and learngene methods. Specifically, the images generated by KIND achieves more than 6.54 and 1.07 decrease in FID and sFID on DiT-L/2, utilizing only 45.4M trainable parameters and saving at least 35.4G FLOPs in computational cost.
Related papers
- Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation [10.48108719012248]
We focus on Knowledge Distillation (KD), where a compact student model is trained to mimic a larger teacher model.
In contrast to much of the previous work, we scale up the parameters of the student model during training.
arXiv Detail & Related papers (2024-11-10T12:40:59Z) - FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models [35.40065954148091]
FINE is a method based on the Learngene framework to initializing downstream networks leveraging pre-trained models.
It decomposes pre-trained knowledge into the product of matrices (i.e., $U$, $Sigma$, and $V$), where $U$ and $V$ are shared across network blocks as learngenes''
It consistently outperforms direct pre-training, particularly for smaller models, achieving state-of-the-art results across variable model sizes.
arXiv Detail & Related papers (2024-09-28T08:57:17Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Diffusion Model Patching via Mixture-of-Prompts [17.04227271007777]
Diffusion Model Patching (DMP) is a simple method to boost the performance of pre-trained diffusion models.
DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen.
arXiv Detail & Related papers (2024-05-28T04:47:54Z) - StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention [2.66269503676104]
We introduce a novel fine-tuning method, called cross-attention (StochCA), specific to Transformer architectures.
This method modifies the Transformer's self-attention mechanism to selectively utilize knowledge from pretrained models during fine-tuning.
Our experimental results show the superiority of StochCA over state-of-the-art approaches in both areas.
arXiv Detail & Related papers (2024-02-25T13:53:49Z) - Efficient GPT Model Pre-training using Tensor Train Matrix
Representation [65.96485282393361]
Large-scale transformer models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch.
To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Train Matrix(TTM) structure.
The resulting GPT-based model stores up to 40% fewer parameters, showing the perplexity comparable to the original model.
arXiv Detail & Related papers (2023-06-05T08:38:25Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on matrix product operator (MPO)
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided
Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language
Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.