Denoising Task Routing for Diffusion Models
- URL: http://arxiv.org/abs/2310.07138v3
- Date: Wed, 21 Feb 2024 04:00:06 GMT
- Title: Denoising Task Routing for Diffusion Models
- Authors: Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim
- Abstract summary: Diffusion models generate highly realistic images by learning a multi-step denoising process.
Despite the inherent connection between diffusion models and multi-task learning (MTL), neural architectures that explicitly incorporate MTL into diffusion models remain largely unexplored.
We present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures to establish distinct information pathways.
- Score: 19.373733104929325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models generate highly realistic images by learning a multi-step
denoising process, naturally embodying the principles of multi-task learning
(MTL). Despite the inherent connection between diffusion models and MTL, there
remains an unexplored area in designing neural architectures that explicitly
incorporate MTL into the framework of diffusion models. In this paper, we
present Denoising Task Routing (DTR), a simple add-on strategy for existing
diffusion model architectures to establish distinct information pathways for
individual tasks within a single architecture by selectively activating subsets
of channels in the model. What makes DTR particularly compelling is its
seamless integration of prior knowledge of denoising tasks into the framework:
(1) Task Affinity: DTR activates similar channels for tasks at adjacent
timesteps and shifts activated channels as sliding windows through timesteps,
capitalizing on the inherent strong affinity between tasks at adjacent
timesteps. (2) Task Weights: During the early stages (higher timesteps) of the
denoising process, DTR assigns a greater number of task-specific channels,
leveraging the insight that diffusion models prioritize reconstructing global
structure and perceptually rich contents in earlier stages, and focus on simple
noise removal in later stages. Our experiments reveal that DTR not only
consistently boosts diffusion models' performance across different evaluation
protocols without adding extra parameters but also accelerates training
convergence. Finally, we show the complementarity between our architectural
approach and existing MTL optimization techniques, providing a more complete
view of MTL in the context of diffusion training. Significantly, by leveraging
this complementarity, we match the performance of DiT-XL with the smaller
DiT-L while reducing training iterations from 7M to 2M.
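As a rough illustration of the routing scheme described above, here is a minimal sketch of a per-timestep channel mask that grows with the timestep (task weights) and slides along the channel axis (task affinity). Names, shapes, and the window schedule are illustrative assumptions, not the authors' implementation.
```python
import torch

def dtr_channel_mask(t: int, T: int, C: int, alpha: float = 0.5) -> torch.Tensor:
    """Binary channel mask for denoising timestep t (0 <= t < T).

    Hypothetical sketch of Denoising Task Routing:
    - Task weights: higher (noisier) timesteps activate more
      task-specific channels, so the window size grows with t / T.
    - Task affinity: the window slides smoothly along the channel
      axis, so adjacent timesteps share most of their channels.
    """
    n_active = max(1, int(alpha * C * (t + 1) / T))  # window size
    start = int((C - n_active) * t / max(T - 1, 1))  # window offset
    mask = torch.zeros(C)
    mask[start:start + n_active] = 1.0
    return mask

# Adjacent timesteps activate almost the same channels:
m1 = dtr_channel_mask(500, 1000, 256)
m2 = dtr_channel_mask(501, 1000, 256)
print((m1 * m2).sum() / m1.sum())  # overlap ratio close to 1.0
```
In use, such a mask would multiply a layer's channel activations, carving a task-specific pathway without adding parameters.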
Related papers
- DINTR: Tracking via Diffusion-based Interpolation [12.130669304428565]
This work proposes a novel diffusion-based methodology to formulate the tracking task.
Our INterpolation TrackeR (DINTR) presents a promising new paradigm and achieves a superior multiplicity on seven benchmarks across five indicator representations.
arXiv Detail & Related papers (2024-10-14T00:41:58Z)
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning high-quality internal representations.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
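For context, a minimal sketch of a REPA-style alignment term, assuming a hypothetical trainable projection head and a frozen external encoder; this is not the authors' code.
```python
import torch
import torch.nn.functional as F

def repa_alignment_loss(hidden: torch.Tensor,
                        clean_feats: torch.Tensor,
                        proj: torch.nn.Module) -> torch.Tensor:
    """Align projected noisy hidden states with clean-image features.

    hidden:      (B, N, D) hidden states of the denoising network on
                 a noisy input.
    clean_feats: (B, N, D2) patch features of the clean image from a
                 frozen, pretrained visual encoder.
    proj:        small trainable head mapping D -> D2.
    """
    h = F.normalize(proj(hidden), dim=-1)
    z = F.normalize(clean_feats.detach(), dim=-1)
    # Negative cosine similarity, averaged over patches and batch.
    return -(h * z).sum(dim=-1).mean()

# Added to the usual denoising objective with some weight lam:
# loss = denoising_loss + lam * repa_alignment_loss(hidden, clean_feats, proj)
```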
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential to effectively accelerate advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
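For background, 1-bit weight quantization is typically a sign function with a straight-through gradient estimator; the sketch below shows that generic building block, not BI-DiffSR's specific SR-oriented design.
```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign-binarize a weight tensor; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w: torch.Tensor) -> torch.Tensor:
        ctx.save_for_backward(w)
        scale = w.abs().mean()        # per-tensor scale keeps magnitudes
        return torch.sign(w) * scale  # weights become {-scale, +scale}

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        (w,) = ctx.saved_tensors
        # Straight-through estimator, clipped outside [-1, 1].
        return grad_out * (w.abs() <= 1.0).float()

w = torch.randn(16, 16, requires_grad=True)
w_bin = BinarizeSTE.apply(w)  # 1-bit weights with a shared scale
```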
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
MoE has emerged as a promising solution, its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and 2.15% overall accuracy improvement across 14 public datasets.
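A toy sketch of a mixture of rank-1 experts, where each expert is an outer product u_i v_i^T mixed by a router; shapes are assumptions and the paper's intuition-aware routing is not reproduced here.
```python
import torch

class MixtureOfRank1Experts(torch.nn.Module):
    """Each expert is a rank-1 map W_i = u_i v_i^T; a router mixes them."""

    def __init__(self, d_in: int, d_out: int, n_experts: int):
        super().__init__()
        self.u = torch.nn.Parameter(torch.randn(n_experts, d_out) * 0.02)
        self.v = torch.nn.Parameter(torch.randn(n_experts, d_in) * 0.02)
        self.router = torch.nn.Linear(d_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)  # (B, E) mixing weights
        coef = gate * (x @ self.v.T)                  # (B, E): gate_i * <v_i, x>
        return coef @ self.u                          # (B, d_out)

layer = MixtureOfRank1Experts(d_in=64, d_out=64, n_experts=8)
delta = layer(torch.randn(4, 64))  # low-rank update added to a frozen layer
```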
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures [12.703947839247693]
Diffusion models, emerging as powerful deep generative tools, excel in various applications.
However, their remarkable generative performance is hindered by slow training and sampling.
This is due to the necessity of tracking extensive forward and reverse diffusion trajectories.
We present a multi-stage framework inspired by our empirical findings to tackle these challenges.
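One plausible reading of the tailored multi-decoder idea, sketched as a shared encoder with a separate decoder per timestep interval; the interval boundaries and stand-in modules are assumptions.
```python
import torch

class MultiDecoderDenoiser(torch.nn.Module):
    """Shared encoder; one decoder per timestep interval."""

    def __init__(self, dim: int = 128, boundaries=(250, 750)):
        super().__init__()
        self.boundaries = boundaries
        self.encoder = torch.nn.Linear(dim, dim)  # stand-in for a real encoder
        self.decoders = torch.nn.ModuleList(
            torch.nn.Linear(dim, dim) for _ in range(len(boundaries) + 1)
        )

    def forward(self, x: torch.Tensor, t: int) -> torch.Tensor:
        stage = sum(t >= b for b in self.boundaries)  # interval containing t
        return self.decoders[stage](self.encoder(x))

model = MultiDecoderDenoiser()
out = model(torch.randn(2, 128), t=600)  # routed to the middle decoder
```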
arXiv Detail & Related papers (2023-12-14T17:48:09Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
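As a rough illustration of cross-modal interaction, a channel-attention block in which one modality gates the other's channels; names and shapes are hypothetical and this is not the DG-SCT module itself.
```python
import torch

class CrossModalChannelGate(torch.nn.Module):
    """Audio features produce channel-wise gates for visual tokens."""

    def __init__(self, d_audio: int, d_visual: int):
        super().__init__()
        self.gate = torch.nn.Sequential(
            torch.nn.Linear(d_audio, d_visual),
            torch.nn.Sigmoid(),
        )

    def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # visual: (B, N, d_visual) patch tokens; audio: (B, d_audio) clip feature.
        g = self.gate(audio).unsqueeze(1)  # (B, 1, d_visual)
        return visual * g                  # audio-guided channel attention

block = CrossModalChannelGate(d_audio=128, d_visual=768)
v = block(torch.randn(2, 196, 768), torch.randn(2, 128))
```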
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
- Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refinement, is devoted to learning the transformation and refinement of the phase spectrum.
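The amplitude/phase decomposition the two stages operate on can be computed directly with a 2-D FFT; a minimal sketch (not MITNet's network code):
```python
import torch

def amplitude_phase(img: torch.Tensor):
    """Split an image batch (B, C, H, W) into Fourier amplitude and phase."""
    spec = torch.fft.fft2(img)
    return spec.abs(), spec.angle()

def recombine(amplitude: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
    """Rebuild the image from (possibly restored) amplitude and phase."""
    spec = torch.polar(amplitude, phase)  # amplitude * exp(i * phase)
    return torch.fft.ifft2(spec).real

x = torch.randn(1, 3, 64, 64)
amp, pha = amplitude_phase(x)
assert torch.allclose(recombine(amp, pha), x, atol=1e-4)
```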
arXiv Detail & Related papers (2023-08-14T08:23:58Z)
- Addressing Negative Transfer in Diffusion Models [25.457422246404853]
Multi-task learning (MTL) can lead to negative transfer in diffusion models.
We propose clustering the denoising tasks into small task clusters and applying MTL methods to them.
We show that interval clustering can be solved using dynamic programming, utilizing signal-to-noise ratio, timestep, and task affinity for clustering objectives.
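A small sketch of interval clustering via dynamic programming: partition timesteps into k contiguous clusters minimizing a per-interval cost. Here the cost is the variance of a per-timestep statistic (e.g. log-SNR); the exact objective in the paper may differ.
```python
def interval_clustering(stats, k):
    """Partition indices 0..n-1 into k contiguous intervals by DP.

    stats: per-timestep statistics (e.g. log-SNR values). An interval's
    cost is the sum of squared deviations of its statistics; returns
    the boundaries of the partition minimizing the total cost.
    """
    n = len(stats)

    def cost(i, j):  # sum of squared deviations of stats[i:j+1]
        seg = stats[i:j + 1]
        mu = sum(seg) / len(seg)
        return sum((s - mu) ** 2 for s in seg)

    INF = float("inf")
    dp = [[INF] * (k + 1) for _ in range(n + 1)]  # dp[j][c]: first j steps, c intervals
    back = [[0] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, n + 1):
        for c in range(1, min(j, k) + 1):
            for i in range(c - 1, j):             # last interval is [i, j-1]
                cand = dp[i][c - 1] + cost(i, j - 1)
                if cand < dp[j][c]:
                    dp[j][c], back[j][c] = cand, i
    bounds, j = [], n
    for c in range(k, 0, -1):                     # recover the intervals
        i = back[j][c]
        bounds.append((i, j - 1))
        j = i
    return bounds[::-1]

print(interval_clustering([0.1, 0.2, 0.15, 3.0, 3.1, 2.9], k=2))
# -> [(0, 2), (3, 5)]
```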
arXiv Detail & Related papers (2023-06-01T05:17:07Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning [11.076005074172516]
Reinforcement learning algorithms can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information.
We propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge.
We show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
arXiv Detail & Related papers (2022-05-22T09:48:41Z)
- Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer [29.03463312813923]
Video denoising aims to recover high-quality frames from the noisy video.
Most existing approaches adopt convolutional neural networks (CNNs) to separate the noise from the original visual content.
We propose a Dual-stage Spatial-Channel Transformer (DSCT) for coarse-to-fine video denoising.
arXiv Detail & Related papers (2022-04-30T09:01:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.