BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models
- URL: http://arxiv.org/abs/2512.12080v1
- Date: Fri, 12 Dec 2025 23:02:02 GMT
- Title: BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models
- Authors: Ryan Po, Eric Ryan Chan, Changan Chen, Gordon Wetzstein
- Abstract summary: We introduce Backwards Aggregation (BAgger), a self-supervised scheme that constructs corrective trajectories from the model's own rollouts. Unlike prior approaches that rely on few-step distillation and distribution-matching losses, BAgger trains with standard score or flow matching objectives. We instantiate BAgger on causal diffusion transformers and evaluate on text-to-video, video extension, and multi-prompt generation.
- Score: 50.986189632485285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive video models are promising for world modeling via next-frame prediction, but they suffer from exposure bias: a mismatch between training on clean contexts and inference on self-generated frames, causing errors to compound and quality to drift over time. We introduce Backwards Aggregation (BAgger), a self-supervised scheme that constructs corrective trajectories from the model's own rollouts, teaching it to recover from its mistakes. Unlike prior approaches that rely on few-step distillation and distribution-matching losses, which can hurt quality and diversity, BAgger trains with standard score or flow matching objectives, avoiding large teachers and long-chain backpropagation through time. We instantiate BAgger on causal diffusion transformers and evaluate on text-to-video, video extension, and multi-prompt generation, observing more stable long-horizon motion and better visual consistency with reduced drift.
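To make the mechanism concrete, here is a minimal PyTorch sketch of one plausible reading of the recipe the abstract describes: roll the model out on its own frames without gradients, then apply a standard flow-matching loss that conditions on the drifted context and targets clean frames. The model class, tensor shapes, rollout split, and context/target pairing are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of BAgger-style corrective training (all interfaces assumed).
import torch
import torch.nn.functional as F

class TinyCausalModel(torch.nn.Module):
    """Stand-in for a causal video diffusion transformer (illustrative only)."""
    def __init__(self, d=32):
        super().__init__()
        self.net = torch.nn.Linear(2 * d + 1, d)

    def forward(self, x, context, t):
        # x: noisy frames (B, T, D); context: past frames (B, Tc, D); t: (B,)
        ctx = context.mean(dim=1, keepdim=True).expand(-1, x.size(1), -1)
        tt = t.view(-1, 1, 1).expand(-1, x.size(1), 1)
        return self.net(torch.cat([x, ctx, tt], dim=-1))  # predicted velocity

@torch.no_grad()
def rollout(model, context, num_new, steps=8):
    """Extend `context` (B, T0, D) with the model's own frames (Euler flow sampler)."""
    frames = context
    for _ in range(num_new):
        x = torch.randn_like(frames[:, -1:])          # next frame starts as pure noise
        for s in range(steps):
            t = torch.full((x.size(0),), s / steps, device=x.device)
            x = x + model(x, frames, t) / steps       # integrate predicted velocity
        frames = torch.cat([frames, x], dim=1)
    return frames

def bagger_loss(model, clean_clip, t0=4, k=8):
    """Corrective trajectory: condition on a drifted rollout, target clean frames."""
    B, T, _ = clean_clip.shape
    # 1) Let the model drift on its own generations (no gradients kept).
    drifted = rollout(model, clean_clip[:, :t0], num_new=k - t0)
    # 2) Supervise recovery toward the clean continuation with plain flow matching.
    target = clean_clip[:, k:]
    t = torch.rand(B, device=clean_clip.device)
    noise = torch.randn_like(target)
    t_ = t.view(B, 1, 1)
    x_t = (1 - t_) * noise + t_ * target              # linear noise-to-data path
    v_pred = model(x_t, drifted, t)
    return F.mse_loss(v_pred, target - noise)         # velocity target: data - noise

model = TinyCausalModel()
clip = torch.randn(2, 12, 32)                         # (B, T, D) latent frames
loss = bagger_loss(model, clip)
loss.backward()
```

Because the rollout runs under `torch.no_grad()`, gradients never flow through the generation chain, which matches the abstract's point about avoiding large teachers and long-chain backpropagation through time.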
Related papers
- Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals [0.0]
We provide an updated characterization of the extent and causes of goal drift. We investigate drift in state-of-the-art models within a simulated stock-trading environment. We find that drift behavior is inconsistent between prompt variations and correlates poorly with instruction-hierarchy-following behavior.
arXiv Detail & Related papers (2026-03-03T18:50:59Z)
- LIVE: Long-horizon Interactive Video World Modeling [39.52605866460851]
The Long-horizon Interactive Video world modEl (LIVE) enforces bounded error accumulation via a novel cycle-consistency objective. LIVE achieves state-of-the-art performance on long-horizon benchmarks, generating stable, high-quality videos far beyond training rollout lengths.
arXiv Detail & Related papers (2026-02-03T17:10:03Z)
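The entry above names a cycle-consistency objective but not its form. As a hedged illustration, one generic shape such an objective can take is a round trip: predict forward, reconstruct the context from the prediction, and penalize the discrepancy. The `forward_model` and `backward_model` interfaces below are hypothetical and are not LIVE's actual design.

```python
# Generic cycle-consistency sketch; not LIVE's loss, just one plausible form.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(forward_model, backward_model, clip, k=4):
    """Round-trip a prediction: k frames forward, then reconstruct the context.

    If rollouts drift, the round trip cannot recover the original frames, so
    penalizing the reconstruction error discourages unbounded accumulation.
    """
    context, future = clip[:, :-k], clip[:, -k:]
    pred_future = forward_model(context)     # hypothetical: (B, Tc, D) -> (B, k, D)
    recon = backward_model(pred_future)      # reverse-time prediction of the context
    return F.mse_loss(pred_future, future) + F.mse_loss(recon, context)
```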
- LoL: Longer than Longer, Scaling Video Generation to Hour [50.945885467651216]
This work achieves the first demonstration of real-time, streaming, and infinite-length video generation with little quality decay. As an illustration, we generate continuous videos up to 12 hours in length, which, to our knowledge, is among the longest publicly demonstrated results in streaming video generation.
arXiv Detail & Related papers (2026-01-23T17:21:35Z)
- End-to-End Training for Autoregressive Video Diffusion via Self-Resampling [63.84672807009907]
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. We introduce Resampling Forcing, a teacher-free framework that enables training autoregressive video models from scratch and at scale.
arXiv Detail & Related papers (2025-12-17T18:53:29Z)
- Stable Video Infinity: Infinite-Length Video Generation with Error Recycling [76.91310169118408]
We propose Stable Video Infinity (SVI), which is able to generate infinite-length videos with high temporal consistency, plausible scene transitions, and controllable streaming storylines. SVI incorporates Error-Recycling Fine-Tuning, a new type of efficient training that recycles the Diffusion Transformer's self-generated errors into supervisory prompts. We evaluate SVI on three benchmarks covering consistent, creative, and conditional settings, thoroughly verifying its versatility and state-of-the-art performance.
arXiv Detail & Related papers (2025-10-10T09:45:46Z)
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [67.94300151774085]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must, at inference time, generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z)
- AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion [19.98565541640125]
We introduce Auto-Regressive Diffusion (AR-Diffusion), a novel model that combines the strengths of auto-regressive and diffusion models for flexible video generation. Inspired by auto-regressive generation, we incorporate a non-decreasing constraint on the corruption timesteps of individual frames. This setup, together with temporal causal attention, enables flexible generation of videos with varying lengths while preserving temporal coherence.
arXiv Detail & Related papers (2025-03-10T15:05:59Z)
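The non-decreasing timestep constraint is concrete enough to sketch: each frame draws its own diffusion timestep, and sorting along the frame axis guarantees that earlier frames are always at least as clean as later ones. The sampling scheme and noise schedule below are illustrative assumptions, not AR-Diffusion's exact code.

```python
# Per-frame corruption with a non-decreasing timestep constraint (illustrative).
import torch

def sample_nondecreasing_timesteps(batch, num_frames, num_steps=1000):
    t = torch.randint(0, num_steps, (batch, num_frames))
    return t.sort(dim=1).values            # enforce t[:, 0] <= t[:, 1] <= ...

def corrupt(frames, t, num_steps=1000):
    """Per-frame forward diffusion with a simple linear-alpha schedule (assumed)."""
    alpha = 1.0 - t.float() / num_steps            # (B, T); t=0 -> clean frame
    alpha = alpha.view(*alpha.shape, 1, 1, 1)      # broadcast over C, H, W
    noise = torch.randn_like(frames)
    return alpha.sqrt() * frames + (1 - alpha).sqrt() * noise
```

During generation, the same ordering means earlier frames can finish denoising, and be emitted, while later frames are still noisy, which is what makes the scheme asynchronous.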
- From Slow Bidirectional to Fast Autoregressive Video Diffusion Models [48.35054927704544]
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. We address this limitation by adapting a pretrained bidirectional diffusion transformer into an autoregressive transformer that generates frames on the fly. Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models.
arXiv Detail & Related papers (2024-12-10T18:59:50Z)
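The bidirectional-to-causal adaptation in the entry above rests on a frame-causal attention pattern: tokens attend within their own frame and to earlier frames, never to future ones. The construction below is the generic version of that mask, not the paper's specific architecture.

```python
# Generic frame-causal attention mask (assumed construction, not the paper's code).
import torch

def frame_causal_mask(num_frames, tokens_per_frame):
    """Boolean (N, N) mask: True where a query token may attend to a key token."""
    frame_id = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    return frame_id[:, None] >= frame_id[None, :]  # attend iff key frame <= query frame

# e.g. pass to torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)
mask = frame_causal_mask(num_frames=4, tokens_per_frame=3)
```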
- Haar Wavelet based Block Autoregressive Flows for Trajectories [129.37479472754083]
Prediction of trajectories, such as those of pedestrians, is crucial to the performance of autonomous agents.
We introduce a novel Haar wavelet based block autoregressive model leveraging split couplings.
We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets.
arXiv Detail & Related papers (2020-09-21T13:57:10Z)
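For readers unfamiliar with the decomposition behind that last entry, here is one level of the standard orthonormal Haar transform over a trajectory: pairwise averages give a coarse, half-length trajectory, and pairwise differences carry the detail, yielding the coarse-to-fine structure a block autoregressive flow can factorize. Only the textbook transform is shown; the split couplings themselves are not.

```python
# Textbook one-level Haar decomposition of a trajectory (B, T, D), T even.
import torch

def haar_step(x):
    """Split x into a coarse half-length trajectory and its detail coefficients."""
    even, odd = x[:, 0::2], x[:, 1::2]
    coarse = (even + odd) / 2 ** 0.5       # orthonormal scaling keeps the map invertible
    detail = (even - odd) / 2 ** 0.5
    return coarse, detail

def haar_inverse(coarse, detail):
    """Exactly undo haar_step by re-interleaving the reconstructed samples."""
    even = (coarse + detail) / 2 ** 0.5
    odd = (coarse - detail) / 2 ** 0.5
    x = torch.stack([even, odd], dim=2)    # (B, T/2, 2, D)
    return x.flatten(1, 2)                 # back to (B, T, D)
```

The 1/sqrt(2) scaling makes the transform orthonormal, so the inverse is exact, a property normalizing flows rely on for tractable likelihoods.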