Transition Models: Rethinking the Generative Learning Objective
- URL: http://arxiv.org/abs/2509.04394v1
- Date: Thu, 04 Sep 2025 17:05:59 GMT
- Title: Transition Models: Rethinking the Generative Learning Objective
- Authors: Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, Lei Bai
- Abstract summary: We introduce a continuous-time dynamics equation that analytically defines state transitions across any finite time interval. This leads to a novel generative paradigm, Transition Models (TiM), which adapt to arbitrary-step transitions. TiM achieves state-of-the-art performance, surpassing leading models such as SD3.5 (8B parameters) and FLUX.1 (12B parameters) across all evaluated step counts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental dilemma in generative modeling persists: iterative diffusion models achieve outstanding fidelity, but at a significant computational cost, while efficient few-step alternatives are constrained by a hard quality ceiling. This conflict between generation steps and output quality arises from restrictive training objectives that focus exclusively on either infinitesimal dynamics (PF-ODEs) or direct endpoint prediction. We address this challenge by introducing an exact, continuous-time dynamics equation that analytically defines state transitions across any finite time interval. This leads to a novel generative paradigm, Transition Models (TiM), which adapt to arbitrary-step transitions, seamlessly traversing the generative trajectory from single leaps to fine-grained refinement with more steps. Despite having only 865M parameters, TiM achieves state-of-the-art performance, surpassing leading models such as SD3.5 (8B parameters) and FLUX.1 (12B parameters) across all evaluated step counts. Importantly, unlike previous few-step generators, TiM demonstrates monotonic quality improvement as the sampling budget increases. Additionally, when employing our native-resolution strategy, TiM delivers exceptional fidelity at resolutions up to 4096x4096.
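The abstract's core idea is a sampler whose learned map jumps between any two times in one transition, so the same model covers both a single leap and many fine-grained refinement steps. A minimal sketch of that sampling loop, where `transition` is a hypothetical placeholder for the learned network (the linear interpolation stand-in is an assumption for illustration, not TiM's actual dynamics):

```python
import numpy as np

def transition(x_t, t, s):
    # Placeholder for the learned transition f(x_t, t, s) -> x_s over the
    # finite interval [s, t]; here a simple linear shrink stands in.
    return x_t * (s / t) if t > 0 else x_t

def sample(x_T, num_steps):
    """Traverse t = 1 -> 0 in `num_steps` arbitrary-size transitions."""
    times = np.linspace(1.0, 0.0, num_steps + 1)
    x = x_T
    for t, s in zip(times[:-1], times[1:]):
        x = transition(x, t, s)  # one finite-interval jump, not an ODE micro-step
    return x

x_T = np.random.randn(4)
one_leap = sample(x_T, num_steps=1)   # single-step generation
refined = sample(x_T, num_steps=8)    # finer-grained refinement, same model
```

The point of the sketch is that `num_steps` is a free sampling-time choice: the same `transition` map serves every step budget.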
Related papers
- Formalizing the Sampling Design Space of Diffusion-Based Generative Models via Adaptive Solvers and Wasserstein-Bounded Timesteps [4.397130429878499]
Diffusion-based generative models have achieved remarkable performance across various domains, yet their practical deployment is often limited by high sampling costs. We propose SDM, a principled framework that aligns the numerical solver with the intrinsic properties of the diffusion trajectory. By analyzing the ODE dynamics, we show that efficient low-order solvers suffice in early high-noise stages, while higher-order solvers can be progressively deployed to handle the increasing non-linearity of later stages.
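The solver schedule this abstract describes can be illustrated with a hypothetical order-selection rule; the switch point and the specific orders (Euler early, third-order later) are assumptions for illustration, not the paper's values:

```python
def solver_order(t, switch_point=0.5):
    """Pick a solver order for diffusion time t in [0, 1] (t = 1 is pure noise)."""
    # Early high-noise phase: a cheap low-order (Euler) step suffices.
    # Later, more non-linear phase: spend the budget on a higher-order step.
    return 1 if t > switch_point else 3

# Sampling from t = 1.0 down to t = 0.1 in ten steps:
schedule = [solver_order(t / 10) for t in range(10, 0, -1)]
```

With this rule the first half of the trajectory uses order-1 steps and the second half order-3 steps, matching the low-order-early, high-order-late pattern the abstract argues for.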
arXiv Detail & Related papers (2026-02-13T05:02:07Z) - The Offline-Frontier Shift: Diagnosing Distributional Limits in Generative Multi-Objective Optimization [56.39938641873341]
We show that generative methods systematically underperform evolutionary alternatives with respect to other metrics, such as generational distance. We argue that overcoming this limitation requires out-of-distribution sampling in objective space. Our results position offline MOO as a distribution-shift-limited problem and provide a diagnostic lens for understanding when and why generative optimization methods fail.
arXiv Detail & Related papers (2026-02-11T18:38:40Z) - Dual-End Consistency Model [41.982957134224904]
Slow iterative sampling is a major bottleneck for the practical deployment of diffusion and flow-based generative models. We propose a Dual-End Consistency Model (DE-CM) that selects vital sub-trajectory clusters to achieve stable and effective training. Our method achieves a state-of-the-art FID score of 1.70 in one-step generation on the ImageNet 256x256 dataset, outperforming existing CM-based one-step approaches.
arXiv Detail & Related papers (2026-02-11T11:51:01Z) - Simulating Distribution Dynamics: Liquid Temporal Feature Evolution for Single-Domain Generalized Object Detection [58.25418970608328]
Single-Domain Generalized Object Detection (Single-DGOD) aims to transfer a detector trained on one source domain to multiple unknown domains. Existing methods for Single-DGOD typically rely on discrete data augmentation or static perturbation methods to expand data diversity. We propose a new method, which simulates the progressive evolution of features from the source domain to simulated latent distributions.
arXiv Detail & Related papers (2025-11-13T03:10:39Z) - Hybrid Autoregressive-Diffusion Model for Real-Time Streaming Sign Language Production [0.0]
We introduce a hybrid approach combining autoregressive and diffusion models for Sign Language Production (SLP). To capture fine-grained body movements, we design a Multi-Scale Pose Representation module that separately extracts detailed features from distinct articulators. We also introduce a Confidence-Aware Causal Attention mechanism that utilizes joint-level confidence scores to dynamically guide the pose generation process.
arXiv Detail & Related papers (2025-07-12T01:34:50Z) - Transition Matching: Scalable and Flexible Generative Modeling [36.605030979361516]
This paper introduces Transition Matching (TM), a novel discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. TM decomposes complex generation tasks into simpler Markov transitions, allowing for expressive non-deterministic probability transition kernels and arbitrary non-continuous supervision processes.
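The decomposition described above, generation as a short discrete-time Markov chain of simpler stochastic transitions, can be sketched as follows; the Gaussian contraction kernel here is an assumed stand-in for the learned non-deterministic transition kernel, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel(x, step, num_steps):
    # Stand-in for a learned stochastic transition kernel p(x' | x):
    # contract toward the mean, then inject a small amount of noise.
    shrink = 1.0 - 1.0 / (num_steps - step)
    return shrink * x + 0.1 * rng.standard_normal(x.shape)

def generate(dim, num_steps=4):
    x = rng.standard_normal(dim)        # start the chain from pure noise
    for step in range(num_steps):
        x = kernel(x, step, num_steps)  # one Markov transition per step
    return x

sample = generate(dim=8)
```

Each call to `kernel` is one Markov transition; because the kernel is stochastic rather than a deterministic ODE step, the chain can represent non-deterministic transition distributions, which is the flexibility the abstract emphasizes.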
arXiv Detail & Related papers (2025-06-30T07:51:58Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approach. We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z) - AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation [32.74923906921339]
Diffusion models achieve great success in generating diverse and high-fidelity images, yet their widespread application is hampered by their inherently slow generation speed.
We propose AdaDiff, an adaptive framework that dynamically allocates computation resources in each sampling step to improve the generation efficiency of diffusion models.
arXiv Detail & Related papers (2023-09-29T09:10:04Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme that provides a non-decreasing performance guarantee for model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.