Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search
- URL: http://arxiv.org/abs/2311.04950v2
- Date: Wed, 15 Nov 2023 07:37:28 GMT
- Title: Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search
- Authors: Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Yansong Tang, Wenwu Zhu
- Abstract summary: We propose to automatically remove structural redundancy in diffusion models with our proposed Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture which can achieve on-par or even better performance than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
- Score: 55.41583104734349
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have recently shown remarkable generation ability, achieving
state-of-the-art performance in many tasks. However, the high computational
cost remains a troubling problem for diffusion models. To tackle this problem,
we propose Diffusion Distillation-based Block-wise Neural Architecture Search
(DiffNAS) to automatically remove structural redundancy in diffusion models.
Specifically, given a larger pretrained teacher,
we leverage DiffNAS to search for the smallest architecture which can achieve
on-par or even better performance than the teacher. Since current diffusion
models are based on a UNet, which naturally has a block-wise structure, we
perform neural architecture search independently within each block, which
greatly reduces the search space. Different from previous block-wise NAS
methods, DiffNAS contains a block-wise local search strategy and a retraining
strategy with a joint dynamic loss. Concretely, during the search process, we
select the best subnet block by block, which avoids the unfairness introduced
by the global search strategy used in previous works. When retraining the searched
architecture, we adopt a dynamic joint loss to maintain the consistency between
supernet training and subnet retraining, which also provides informative
objectives for each block and shortens the paths of gradient propagation. We
demonstrate that this joint loss can effectively improve model performance, and
we also prove the necessity of dynamically adjusting it. The experiments show
that our method achieves a significant computational reduction, especially on
latent diffusion models, with a reduction of about 50% in MACs and parameters.
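
To make the retraining recipe above concrete, here is a minimal, hypothetical PyTorch sketch of block-wise distillation with a dynamically weighted joint loss. It is not the authors' implementation: the names (BlockwiseStudent, joint_loss), the toy convolutional blocks, the MSE distance, and the linear decay schedule for the distillation weight are all illustrative assumptions.

# Hypothetical sketch (not the authors' code): block-wise distillation with a
# dynamically weighted joint loss, assuming paired teacher/student blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockwiseStudent(nn.Module):
    def __init__(self, teacher_blocks, student_blocks):
        super().__init__()
        self.teacher_blocks = nn.ModuleList(teacher_blocks)  # frozen, pretrained
        self.student_blocks = nn.ModuleList(student_blocks)  # searched subnet
        for p in self.teacher_blocks.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        feats_s, feats_t = [], []
        h_s = h_t = x
        for sb, tb in zip(self.student_blocks, self.teacher_blocks):
            h_s = sb(h_s)
            with torch.no_grad():
                h_t = tb(h_t)
            feats_s.append(h_s)
            feats_t.append(h_t)
        return h_s, feats_s, feats_t

def joint_loss(pred, target, feats_s, feats_t, step, total_steps):
    # Task term (e.g. noise prediction) plus per-block distillation terms that
    # give each block its own objective and shorten gradient paths.
    task = F.mse_loss(pred, target)
    distill = sum(F.mse_loss(s, t) for s, t in zip(feats_s, feats_t)) / len(feats_s)
    lam = max(0.0, 1.0 - step / total_steps)  # assumed linear decay of the distillation weight
    return task + lam * distill

# Toy usage: two conv "blocks" stand in for UNet blocks.
teacher = [nn.Conv2d(3, 3, 3, padding=1) for _ in range(2)]
student = [nn.Conv2d(3, 3, 3, padding=1) for _ in range(2)]
model = BlockwiseStudent(teacher, student)
x, noise = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
pred, fs, ft = model(x)
loss = joint_loss(pred, noise, fs, ft, step=100, total_steps=1000)
loss.backward()

In this reading, the per-block distillation terms supply informative objectives for each block and shorten gradient propagation, while the decaying weight hands control back to the task loss later in retraining; the actual schedule and distance used in DiffNAS are whatever the paper specifies.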
Related papers
- Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z)
- Depth-agnostic Single Image Dehazing [12.51359372069387]
We propose a simple yet novel synthetic method to decouple the relationship between haze density and scene depth, by which a depth-agnostic dataset (DA-HAZE) is generated.
Experiments indicate that models trained on DA-HAZE achieve significant improvements on real-world benchmarks, with less discrepancy between SOTS and DA-SOTS.
We revisit the U-Net-based architectures for dehazing, in which dedicatedly designed blocks are incorporated.
arXiv Detail & Related papers (2024-01-14T06:33:11Z)
- MGAS: Multi-Granularity Architecture Search for Trade-Off Between Model Effectiveness and Efficiency [10.641875933652647]
We introduce multi-granularity architecture search (MGAS) to discover both effective and efficient neural networks.
We learn discretization functions specific to each granularity level to adaptively determine the unit remaining ratio according to the evolving architecture.
Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that MGAS outperforms other state-of-the-art methods in achieving a better trade-off between model performance and model size.
arXiv Detail & Related papers (2023-10-23T16:32:18Z)
- Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption [73.98706049140098]
We propose a novel phasic content fusing few-shot diffusion model with directional distribution consistency loss.
Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when t is large.
Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation.
arXiv Detail & Related papers (2023-09-07T14:14:11Z)
- FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
- $\beta$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search [85.84110365657455]
We propose a simple but efficient regularization method, termed Beta-Decay, to regularize the DARTS-based NAS search process.
Experimental results on NAS-Bench-201 show that our proposed method can help stabilize the search process and make the searched network more transferable across different datasets.
arXiv Detail & Related papers (2022-03-03T11:47:14Z)
- D-DARTS: Distributed Differentiable Architecture Search [75.12821786565318]
Differentiable ARchiTecture Search (DARTS) is one of the most trending Neural Architecture Search (NAS) methods.
We propose D-DARTS, a novel solution that addresses this problem by nesting several neural networks at cell-level.
arXiv Detail & Related papers (2021-08-20T09:07:01Z)
- SpaceNet: Make Free Space For Continual Learning [15.914199054779438]
We propose a novel architecture-based method, referred to as SpaceNet, for the class-incremental learning scenario.
SpaceNet trains sparse deep neural networks from scratch in an adaptive way that compresses the sparse connections of each task into a compact number of neurons.
Experimental results show the robustness of our proposed method against catastrophic forgetting of old tasks and the efficiency of SpaceNet in utilizing the available capacity of the model.
arXiv Detail & Related papers (2020-07-15T11:21:31Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with gradient-based optimization.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
arXiv Detail & Related papers (2020-06-18T08:23:02Z)
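
Relating to the DrNAS entry just above, the idea of Dirichlet-distributed architecture mixing weights can be sketched in a few lines of hypothetical PyTorch. This is not the paper's implementation: the mixed-op class, the log-concentration parameterization, the epsilon floor, and the two toy candidate operations are illustrative assumptions; the only point shown is the reparameterized (pathwise) Dirichlet sample.

# Hypothetical sketch (not DrNAS's code): architecture mixing weights drawn
# from a learnable Dirichlet and optimized via pathwise derivatives.
import torch
import torch.nn as nn
from torch.distributions import Dirichlet

class DirichletMixedOp(nn.Module):
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        self.log_conc = nn.Parameter(torch.zeros(len(ops)))  # log concentration parameters

    def forward(self, x):
        conc = self.log_conc.exp() + 1e-4       # keep concentrations strictly positive
        weights = Dirichlet(conc).rsample()     # reparameterized sample -> gradients flow to conc
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Toy usage with two candidate operations on one edge.
mixed = DirichletMixedOp([nn.Conv2d(8, 8, 3, padding=1), nn.Conv2d(8, 8, 1)])
out = mixed(torch.randn(2, 8, 16, 16))
out.mean().backward()  # log_conc.grad is populated through the pathwise derivative

Optimizing log_conc alongside the operation weights plays the role of the architecture-parameter update in a DARTS-style scheme, with the Dirichlet sampling providing the stochastic relaxation described in the summary.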
This list is automatically generated from the titles and abstracts of the papers on this site.