Hierarchical Koopman Diffusion: Fast Generation with Interpretable Diffusion Trajectory
- URL: http://arxiv.org/abs/2510.12220v1
- Date: Tue, 14 Oct 2025 07:17:35 GMT
- Title: Hierarchical Koopman Diffusion: Fast Generation with Interpretable Diffusion Trajectory
- Authors: Hanru Bai, Weiyang Ding, Difan Zou
- Abstract summary: Hierarchical Koopman Diffusion is a novel framework that achieves both one-step sampling and interpretable generative trajectories. Our framework bridges the gap between fast sampling and interpretability in diffusion models, paving the way for explainable image synthesis in generative modeling.
- Score: 30.327899232038863
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Diffusion models have achieved impressive success in high-fidelity image generation but suffer from slow sampling due to their inherently iterative denoising process. While recent one-step methods accelerate inference by learning direct noise-to-image mappings, they sacrifice the interpretability and fine-grained control intrinsic to diffusion dynamics, key advantages that enable applications like editable generation. To resolve this dichotomy, we introduce Hierarchical Koopman Diffusion, a novel framework that achieves both one-step sampling and interpretable generative trajectories. Grounded in Koopman operator theory, our method lifts the nonlinear diffusion dynamics into a latent space where evolution is governed by globally linear operators, enabling closed-form trajectory solutions. This formulation not only eliminates iterative sampling but also provides full access to intermediate states, allowing manual intervention during generation. To model the multi-scale nature of images, we design a hierarchical architecture that disentangles generative dynamics across spatial resolutions via scale-specific Koopman subspaces, capturing coarse-to-fine details systematically. We empirically show that Hierarchical Koopman Diffusion not only achieves competitive one-step generation performance but also provides a principled mechanism for interpreting and manipulating the generative process through spectral analysis. Our framework bridges the gap between fast sampling and interpretability in diffusion models, paving the way for explainable image synthesis in generative modeling.
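The speedup claim follows directly from the linear-latent formulation: once one diffusion step becomes multiplication by a fixed operator K in the lifted space, the state after t steps is K^t applied to the initial latent, and an eigendecomposition of K makes that closed form cheap for any t while exposing every intermediate state. Below is a minimal numpy sketch of that mechanics, with random stand-ins for the learned encoder, decoder, and Koopman operator; all names are hypothetical and not the paper's code.

```python
import numpy as np

# Illustrative stand-ins: in the paper these would be learned networks/matrices.
rng = np.random.default_rng(0)
d_obs, d_lat, T = 8, 16, 50   # observation dim, lifted dim, diffusion horizon

W_enc = rng.normal(size=(d_lat, d_obs)) / np.sqrt(d_obs)
W_dec = rng.normal(size=(d_obs, d_lat)) / np.sqrt(d_lat)

def encode(x):
    """Lift the nonlinear state into the (hypothetical) Koopman latent space."""
    return np.tanh(W_enc @ x)

def decode(z):
    """Project a latent state back to observation (image) space."""
    return W_dec @ z

# A stand-in Koopman operator, rescaled so its spectral radius is just below 1.
K = rng.normal(size=(d_lat, d_lat))
K /= 1.01 * np.max(np.abs(np.linalg.eigvals(K)))

# Spectral decomposition K = V diag(lam) V^{-1} gives K^t in closed form; this
# is also the handle for the spectral analysis the abstract mentions.
lam, V = np.linalg.eig(K)
V_inv = np.linalg.inv(V)

def propagate(z, t):
    """Closed-form t-step evolution: K^t z = V diag(lam^t) V^{-1} z."""
    return (V @ (lam**t * (V_inv @ z))).real

x_noise = rng.normal(size=d_obs)          # pure noise, end of forward diffusion
z = encode(x_noise)

x_sample = decode(propagate(z, T))        # one-step generation: jump t=0..T at once
x_halfway = decode(propagate(z, T // 2))  # any intermediate state stays accessible
print(x_sample.shape, x_halfway.shape)
```

The same `propagate` call yields any intermediate latent, which is what makes manual intervention along the trajectory possible without re-running a denoising loop.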
Related papers
- Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion [60.186310080523135]
Bifurcation of generative modeling into autoregressive approaches for discrete data (text) and diffusion approaches for continuous data (images) hinders the development of truly unified multimodal systems. We propose CoM-DAD, a novel probabilistic framework that reformulates multimodal generation as a hierarchical dual process. Our method demonstrates superior stability over standard masked modeling, establishing a new paradigm for scalable, unified text-image generation.
arXiv Detail & Related papers (2026-01-07T16:21:19Z)
- Fitting Image Diffusion Models on Video Datasets [30.688877034764474]
We propose a simple and effective training strategy that leverages the temporal inductive bias present in continuous video frames to improve diffusion training. We evaluate our method on the HandCo dataset, where hand-object interactions exhibit dense temporal coherence.
arXiv Detail & Related papers (2025-09-04T01:04:54Z)
- Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling [26.912726794632732]
Conditional Flow Matching (CFM) offers a simulation-free framework for training continuous-time generative models. We propose to accelerate CFM and introduce an interpretable representation of its dynamics by integrating Koopman operator theory.
arXiv Detail & Related papers (2025-06-27T15:16:16Z)
- Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
- Toward Theoretical Insights into Diffusion Trajectory Distillation via Operator Merging [10.315743300140966]
Diffusion trajectory distillation aims to accelerate sampling in diffusion models that produce high-quality outputs but suffer from slow sampling speeds. We propose a dynamic programming algorithm to compute the optimal merging strategy that maximally preserves signal fidelity. Our findings enhance the theoretical understanding of diffusion trajectory distillation and offer practical insights for improving distillation strategies.
arXiv Detail & Related papers (2025-05-21T21:13:02Z)
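The "optimal merging strategy" in the entry above is, structurally, an interval-partition problem: choose where to cut the sequence of per-step operators so that collapsing each segment into one operator loses the least fidelity. Here is a minimal sketch of that dynamic program, assuming an additive per-segment cost; the cost function is an arbitrary stand-in, not the paper's fidelity measure.

```python
import numpy as np

# Hypothetical setup: per-step linear(ized) denoising operators A_0..A_{T-1},
# to be merged into k distilled steps.
rng = np.random.default_rng(0)
T, k, d = 8, 3, 4
A = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) for _ in range(T)]

def merged(i, j):
    """Product A_{j-1} ... A_i: the single operator replacing steps i..j-1."""
    out = np.eye(d)
    for t in range(i, j):
        out = A[t] @ out
    return out

def cost(i, j):
    # Stand-in fidelity loss: deviation of the merged block from identity.
    return float(np.linalg.norm(merged(i, j) - np.eye(d)))

# Interval-partition DP: dp[j][i] = best cost covering steps 0..i-1 with j merges.
INF = float("inf")
dp = [[INF] * (T + 1) for _ in range(k + 1)]
parent = [[-1] * (T + 1) for _ in range(k + 1)]
dp[0][0] = 0.0
for j in range(1, k + 1):
    for i in range(1, T + 1):
        for m in range(j - 1, i):
            c = dp[j - 1][m] + cost(m, i)
            if c < dp[j][i]:
                dp[j][i], parent[j][i] = c, m

# Recover the optimal segment boundaries by walking the parent pointers.
bounds, i = [T], T
for j in range(k, 0, -1):
    i = parent[j][i]
    bounds.append(i)
print("optimal segment boundaries:", bounds[::-1], "cost:", dp[k][T])
```

The O(k T^2) triple loop is the textbook form; the substance of the paper lies in what `cost` should be so that the merged operators preserve signal fidelity.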
- One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling [26.913398550088477]
We introduce the Koopman Distillation Model (KDM), a novel offline distillation approach grounded in Koopman theory. KDM encodes noisy inputs into an embedded space where a learned linear operator propagates them forward, followed by a decoder that reconstructs clean samples. KDM achieves highly competitive performance across standard offline distillation benchmarks.
arXiv Detail & Related papers (2025-05-19T16:59:47Z)
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full- and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- A Variational Perspective on Solving Inverse Problems with Diffusion Models [101.831766524264]
Inverse tasks can be formulated as inferring a posterior distribution over data.
This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable.
We propose a variational approach that by design seeks to approximate the true posterior distribution.
arXiv Detail & Related papers (2023-05-07T23:00:47Z)
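The variational framing in the entry above can be stated in one line: the posterior over clean data $x$ given a measurement $y$ is known only up to normalization, so one fits a tractable family $\mathcal{Q}$ by minimizing a KL divergence. A generic way to write this objective (not necessarily the paper's exact formulation):

$$
p(x \mid y) \propto p(y \mid x)\, p(x),
\qquad
q^\star = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(x) \,\|\, p(x \mid y)\big),
$$

where the diffusion model supplies the prior $p(x)$ and the measurement model the likelihood $p(y \mid x)$; the nonlinear, iterative denoising chain is what makes $p(x \mid y)$ intractable to sample directly.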
- Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods [27.014858633903867]
We present a training framework for feature disentanglement of diffusion models (FDiff). We also propose two sampling methods that boost the realism of our diffusion models and enhance controllability.
arXiv Detail & Related papers (2023-02-28T07:43:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.