Denoising Task Routing for Diffusion Models
- URL: http://arxiv.org/abs/2310.07138v3
- Date: Wed, 21 Feb 2024 04:00:06 GMT
- Title: Denoising Task Routing for Diffusion Models
- Authors: Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim
- Abstract summary: Diffusion models generate highly realistic images by learning a multi-step denoising process.
Despite the inherent connection between diffusion models and multi-task learning (MTL), neural architectures that explicitly incorporate MTL into diffusion models remain largely unexplored.
We present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures to establish distinct information pathways.
- Score: 19.373733104929325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models generate highly realistic images by learning a multi-step
denoising process, naturally embodying the principles of multi-task learning
(MTL). Despite the inherent connection between diffusion models and MTL, there
remains an unexplored area in designing neural architectures that explicitly
incorporate MTL into the framework of diffusion models. In this paper, we
present Denoising Task Routing (DTR), a simple add-on strategy for existing
diffusion model architectures to establish distinct information pathways for
individual tasks within a single architecture by selectively activating subsets
of channels in the model. What makes DTR particularly compelling is its
seamless integration of prior knowledge of denoising tasks into the framework:
(1) Task Affinity: DTR activates similar channels for tasks at adjacent
timesteps and shifts activated channels as sliding windows through timesteps,
capitalizing on the inherent strong affinity between tasks at adjacent
timesteps. (2) Task Weights: During the early stages (higher timesteps) of the
denoising process, DTR assigns a greater number of task-specific channels,
leveraging the insight that diffusion models prioritize reconstructing global
structure and perceptually rich contents in earlier stages, and focus on simple
noise removal in later stages. Our experiments reveal that DTR not only
consistently boosts diffusion models' performance across different evaluation
protocols without adding extra parameters but also accelerates training
convergence. Finally, we show the complementarity between our architectural
approach and existing MTL optimization techniques, providing a more complete
view of MTL in the context of diffusion training. Significantly, by leveraging
this complementarity, we match the performance of DiT-XL with the smaller
DiT-L while reducing training iterations from 7M to 2M.
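As a rough illustration of the routing scheme described above, here is a minimal sketch of a per-timestep channel mask that grows with the timestep (task weights) and slides along the channel axis (task affinity). Names, shapes, and the window schedule are illustrative assumptions, not the authors' implementation.
```python
import torch

def dtr_channel_mask(t: int, T: int, C: int, alpha: float = 0.5) -> torch.Tensor:
    """Binary channel mask for denoising timestep t (0 <= t < T).

    Hypothetical sketch of Denoising Task Routing:
    - Task weights: higher (noisier) timesteps activate more
      task-specific channels, so the window size grows with t / T.
    - Task affinity: the window slides smoothly along the channel
      axis, so adjacent timesteps share most of their channels.
    """
    n_active = max(1, int(alpha * C * (t + 1) / T))  # window size
    start = int((C - n_active) * t / max(T - 1, 1))  # window offset
    mask = torch.zeros(C)
    mask[start:start + n_active] = 1.0
    return mask

# Adjacent timesteps activate almost the same channels:
m1 = dtr_channel_mask(500, 1000, 256)
m2 = dtr_channel_mask(501, 1000, 256)
print((m1 * m2).sum() / m1.sum())  # overlap ratio close to 1.0
```
In use, such a mask would multiply a layer's channel activations, carving a task-specific pathway without adding parameters.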
Related papers
- DINTR: Tracking via Diffusion-based Interpolation [12.130669304428565]
This work proposes a novel diffusion-based methodology to formulate the tracking task.
Our INterpolation TrackeR (DINTR) presents a promising new paradigm and achieves a superior multiplicity on seven benchmarks across five indicator representations.
arXiv Detail & Related papers (2024-10-14T00:41:58Z)
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning high-quality internal representations.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
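For context, a minimal sketch of a REPA-style alignment term, assuming a hypothetical trainable projection head and a frozen external encoder; this is not the authors' code.
```python
import torch
import torch.nn.functional as F

def repa_alignment_loss(hidden: torch.Tensor,
                        clean_feats: torch.Tensor,
                        proj: torch.nn.Module) -> torch.Tensor:
    """Align projected noisy hidden states with clean-image features.

    hidden:      (B, N, D) hidden states of the denoising network on
                 a noisy input.
    clean_feats: (B, N, D2) patch features of the clean image from a
                 frozen, pretrained visual encoder.
    proj:        small trainable head mapping D -> D2.
    """
    h = F.normalize(proj(hidden), dim=-1)
    z = F.normalize(clean_feats.detach(), dim=-1)
    # Negative cosine similarity, averaged over patches and batch.
    return -(h * z).sum(dim=-1).mean()

# Added to the usual denoising objective with some weight lam:
# loss = denoising_loss + lam * repa_alignment_loss(hidden, clean_feats, proj)
```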
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential to effectively accelerate advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
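For background, 1-bit weight quantization is typically a sign function with a straight-through gradient estimator; the sketch below shows that generic building block, not BI-DiffSR's specific SR-oriented design.
```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign-binarize a weight tensor; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w: torch.Tensor) -> torch.Tensor:
        ctx.save_for_backward(w)
        scale = w.abs().mean()        # per-tensor scale keeps magnitudes
        return torch.sign(w) * scale  # weights become {-scale, +scale}

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        (w,) = ctx.saved_tensors
        # Straight-through estimator, clipped outside [-1, 1].
        return grad_out * (w.abs() <= 1.0).float()

w = torch.randn(16, 16, requires_grad=True)
w_bin = BinarizeSTE.apply(w)  # 1-bit weights with a shared scale
```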
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
MoE has emerged as a promising solution, its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and 2.15% overall accuracy improvement across 14 public datasets.
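A toy sketch of a mixture of rank-1 experts, where each expert is an outer product u_i v_i^T mixed by a router; shapes are assumptions and the paper's intuition-aware routing is not reproduced here.
```python
import torch

class MixtureOfRank1Experts(torch.nn.Module):
    """Each expert is a rank-1 map W_i = u_i v_i^T; a router mixes them."""

    def __init__(self, d_in: int, d_out: int, n_experts: int):
        super().__init__()
        self.u = torch.nn.Parameter(torch.randn(n_experts, d_out) * 0.02)
        self.v = torch.nn.Parameter(torch.randn(n_experts, d_in) * 0.02)
        self.router = torch.nn.Linear(d_in, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.softmax(self.router(x), dim=-1)  # (B, E) mixing weights
        coef = gate * (x @ self.v.T)                  # (B, E): gate_i * <v_i, x>
        return coef @ self.u                          # (B, d_out)

layer = MixtureOfRank1Experts(d_in=64, d_out=64, n_experts=8)
delta = layer(torch.randn(4, 64))  # low-rank update added to a frozen layer
```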
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures [12.703947839247693]
Diffusion models, emerging as powerful deep generative tools, excel in various applications.
However, their remarkable generative performance is hindered by slow training and sampling.
This is due to the necessity of tracking extensive forward and reverse diffusion trajectories.
We present a multi-stage framework inspired by our empirical findings to tackle these challenges.
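One plausible reading of the tailored multi-decoder idea, sketched as a shared encoder with a separate decoder per timestep interval; the interval boundaries and stand-in modules are assumptions.
```python
import torch

class MultiDecoderDenoiser(torch.nn.Module):
    """Shared encoder; one decoder per timestep interval."""

    def __init__(self, dim: int = 128, boundaries=(250, 750)):
        super().__init__()
        self.boundaries = boundaries
        self.encoder = torch.nn.Linear(dim, dim)  # stand-in for a real encoder
        self.decoders = torch.nn.ModuleList(
            torch.nn.Linear(dim, dim) for _ in range(len(boundaries) + 1)
        )

    def forward(self, x: torch.Tensor, t: int) -> torch.Tensor:
        stage = sum(t >= b for b in self.boundaries)  # interval containing t
        return self.decoders[stage](self.encoder(x))

model = MultiDecoderDenoiser()
out = model(torch.randn(2, 128), t=600)  # routed to the middle decoder
```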
arXiv Detail & Related papers (2023-12-14T17:48:09Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
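As a rough illustration of cross-modal interaction, a channel-attention block in which one modality gates the other's channels; names and shapes are hypothetical and this is not the DG-SCT module itself.
```python
import torch

class CrossModalChannelGate(torch.nn.Module):
    """Audio features produce channel-wise gates for visual tokens."""

    def __init__(self, d_audio: int, d_visual: int):
        super().__init__()
        self.gate = torch.nn.Sequential(
            torch.nn.Linear(d_audio, d_visual),
            torch.nn.Sigmoid(),
        )

    def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # visual: (B, N, d_visual) patch tokens; audio: (B, d_audio) clip feature.
        g = self.gate(audio).unsqueeze(1)  # (B, 1, d_visual)
        return visual * g                  # audio-guided channel attention

block = CrossModalChannelGate(d_audio=128, d_visual=768)
v = block(torch.randn(2, 196, 768), torch.randn(2, 128))
```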
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
- Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refinement, is devoted to learning the transformation and refinement of the phase spectrum.
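The amplitude/phase decomposition the two stages operate on can be computed directly with a 2-D FFT; a minimal sketch (not MITNet's network code):
```python
import torch

def amplitude_phase(img: torch.Tensor):
    """Split an image batch (B, C, H, W) into Fourier amplitude and phase."""
    spec = torch.fft.fft2(img)
    return spec.abs(), spec.angle()

def recombine(amplitude: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
    """Rebuild the image from (possibly restored) amplitude and phase."""
    spec = torch.polar(amplitude, phase)  # amplitude * exp(i * phase)
    return torch.fft.ifft2(spec).real

x = torch.randn(1, 3, 64, 64)
amp, pha = amplitude_phase(x)
assert torch.allclose(recombine(amp, pha), x, atol=1e-4)
```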
arXiv Detail & Related papers (2023-08-14T08:23:58Z)
- Addressing Negative Transfer in Diffusion Models [25.457422246404853]
Multi-task learning (MTL) can lead to negative transfer in diffusion models.
We propose clustering the denoising tasks into small task clusters and applying MTL methods to them.
We show that interval clustering can be solved using dynamic programming, utilizing signal-to-noise ratio, timestep, and task affinity for clustering objectives.
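A small sketch of interval clustering via dynamic programming: partition timesteps into k contiguous clusters minimizing a per-interval cost. Here the cost is the variance of a per-timestep statistic (e.g. log-SNR); the exact objective in the paper may differ.
```python
def interval_clustering(stats, k):
    """Partition indices 0..n-1 into k contiguous intervals by DP.

    stats: per-timestep statistics (e.g. log-SNR values). An interval's
    cost is the sum of squared deviations of its statistics; returns
    the boundaries of the partition minimizing the total cost.
    """
    n = len(stats)

    def cost(i, j):  # sum of squared deviations of stats[i:j+1]
        seg = stats[i:j + 1]
        mu = sum(seg) / len(seg)
        return sum((s - mu) ** 2 for s in seg)

    INF = float("inf")
    dp = [[INF] * (k + 1) for _ in range(n + 1)]  # dp[j][c]: first j steps, c intervals
    back = [[0] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, n + 1):
        for c in range(1, min(j, k) + 1):
            for i in range(c - 1, j):             # last interval is [i, j-1]
                cand = dp[i][c - 1] + cost(i, j - 1)
                if cand < dp[j][c]:
                    dp[j][c], back[j][c] = cand, i
    bounds, j = [], n
    for c in range(k, 0, -1):                     # recover the intervals
        i = back[j][c]
        bounds.append((i, j - 1))
        j = i
    return bounds[::-1]

print(interval_clustering([0.1, 0.2, 0.15, 3.0, 3.1, 2.9], k=2))
# -> [(0, 2), (3, 5)]
```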
arXiv Detail & Related papers (2023-06-01T05:17:07Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning [11.076005074172516]
Reinforcement learning algorithms can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information.
We propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge.
We show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
arXiv Detail & Related papers (2022-05-22T09:48:41Z)
- Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer [29.03463312813923]
Video denoising aims to recover high-quality frames from the noisy video.
Most existing approaches adopt convolutional neural networks (CNNs) to separate the noise from the original visual content.
We propose a Dual-stage Spatial-Channel Transformer (DSCT) for coarse-to-fine video denoising.
arXiv Detail & Related papers (2022-04-30T09:01:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.