BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models
- URL: http://arxiv.org/abs/2512.12080v1
- Date: Fri, 12 Dec 2025 23:02:02 GMT
- Title: BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models
- Authors: Ryan Po, Eric Ryan Chan, Changan Chen, Gordon Wetzstein
- Abstract summary: We introduce Backwards Aggregation (BAgger), a self-supervised scheme that constructs corrective trajectories from the model's own rollouts. Unlike prior approaches that rely on few-step distillation and distribution-matching losses, BAgger trains with standard score or flow matching objectives. We instantiate BAgger on causal diffusion transformers and evaluate on text-to-video, video extension, and multi-prompt generation.
- Score: 50.986189632485285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive video models are promising for world modeling via next-frame prediction, but they suffer from exposure bias: a mismatch between training on clean contexts and inference on self-generated frames, causing errors to compound and quality to drift over time. We introduce Backwards Aggregation (BAgger), a self-supervised scheme that constructs corrective trajectories from the model's own rollouts, teaching it to recover from its mistakes. Unlike prior approaches that rely on few-step distillation and distribution-matching losses, which can hurt quality and diversity, BAgger trains with standard score or flow matching objectives, avoiding large teachers and long-chain backpropagation through time. We instantiate BAgger on causal diffusion transformers and evaluate on text-to-video, video extension, and multi-prompt generation, observing more stable long-horizon motion and better visual consistency with reduced drift.
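To make the mechanism concrete, here is a minimal PyTorch sketch of one plausible reading of the recipe the abstract describes: roll the model out on its own frames without gradients, then apply a standard flow-matching loss that conditions on the drifted context and targets clean frames. The model class, tensor shapes, rollout split, and context/target pairing are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of BAgger-style corrective training (all interfaces assumed).
import torch
import torch.nn.functional as F

class TinyCausalModel(torch.nn.Module):
    """Stand-in for a causal video diffusion transformer (illustrative only)."""
    def __init__(self, d=32):
        super().__init__()
        self.net = torch.nn.Linear(2 * d + 1, d)

    def forward(self, x, context, t):
        # x: noisy frames (B, T, D); context: past frames (B, Tc, D); t: (B,)
        ctx = context.mean(dim=1, keepdim=True).expand(-1, x.size(1), -1)
        tt = t.view(-1, 1, 1).expand(-1, x.size(1), 1)
        return self.net(torch.cat([x, ctx, tt], dim=-1))  # predicted velocity

@torch.no_grad()
def rollout(model, context, num_new, steps=8):
    """Extend `context` (B, T0, D) with the model's own frames (Euler flow sampler)."""
    frames = context
    for _ in range(num_new):
        x = torch.randn_like(frames[:, -1:])          # next frame starts as pure noise
        for s in range(steps):
            t = torch.full((x.size(0),), s / steps, device=x.device)
            x = x + model(x, frames, t) / steps       # integrate predicted velocity
        frames = torch.cat([frames, x], dim=1)
    return frames

def bagger_loss(model, clean_clip, t0=4, k=8):
    """Corrective trajectory: condition on a drifted rollout, target clean frames."""
    B, T, _ = clean_clip.shape
    # 1) Let the model drift on its own generations (no gradients kept).
    drifted = rollout(model, clean_clip[:, :t0], num_new=k - t0)
    # 2) Supervise recovery toward the clean continuation with plain flow matching.
    target = clean_clip[:, k:]
    t = torch.rand(B, device=clean_clip.device)
    noise = torch.randn_like(target)
    t_ = t.view(B, 1, 1)
    x_t = (1 - t_) * noise + t_ * target              # linear noise-to-data path
    v_pred = model(x_t, drifted, t)
    return F.mse_loss(v_pred, target - noise)         # velocity target: data - noise

model = TinyCausalModel()
clip = torch.randn(2, 12, 32)                         # (B, T, D) latent frames
loss = bagger_loss(model, clip)
loss.backward()
```

Because the rollout runs under `torch.no_grad()`, gradients never flow through the generation chain, which matches the abstract's point about avoiding large teachers and long-chain backpropagation through time.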
Related papers
- Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals [0.0]
We provide an updated characterization of the extent and causes of goal drift. We investigate drift in state-of-the-art models within a simulated stock-trading environment. We find that drift behavior is inconsistent between prompt variations and correlates poorly with instruction-hierarchy-following behavior.
arXiv Detail & Related papers (2026-03-03T18:50:59Z)
- LIVE: Long-horizon Interactive Video World Modeling [39.52605866460851]
The Long-horizon Interactive Video world modEl (LIVE) enforces bounded error accumulation via a novel cycle-consistency objective. LIVE achieves state-of-the-art performance on long-horizon benchmarks, generating stable, high-quality videos far beyond training rollout lengths.
arXiv Detail & Related papers (2026-02-03T17:10:03Z)
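The entry above names a cycle-consistency objective but not its form. As a hedged illustration, one generic shape such an objective can take is a round trip: predict forward, reconstruct the context from the prediction, and penalize the discrepancy. The `forward_model` and `backward_model` interfaces below are hypothetical and are not LIVE's actual design.

```python
# Generic cycle-consistency sketch; not LIVE's loss, just one plausible form.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(forward_model, backward_model, clip, k=4):
    """Round-trip a prediction: k frames forward, then reconstruct the context.

    If rollouts drift, the round trip cannot recover the original frames, so
    penalizing the reconstruction error discourages unbounded accumulation.
    """
    context, future = clip[:, :-k], clip[:, -k:]
    pred_future = forward_model(context)     # hypothetical: (B, Tc, D) -> (B, k, D)
    recon = backward_model(pred_future)      # reverse-time prediction of the context
    return F.mse_loss(pred_future, future) + F.mse_loss(recon, context)
```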
- LoL: Longer than Longer, Scaling Video Generation to Hour [50.945885467651216]
This work achieves the first demonstration of real-time, streaming, and infinite-length video generation with little quality decay. As an illustration, we generate continuous videos up to 12 hours in length, which, to our knowledge, is among the longest publicly demonstrated results in streaming video generation.
arXiv Detail & Related papers (2026-01-23T17:21:35Z)
- End-to-End Training for Autoregressive Video Diffusion via Self-Resampling [63.84672807009907]
Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. We introduce Resampling Forcing, a teacher-free framework that enables training autoregressive video models from scratch and at scale.
arXiv Detail & Related papers (2025-12-17T18:53:29Z)
- Stable Video Infinity: Infinite-Length Video Generation with Error Recycling [76.91310169118408]
We propose Stable Video Infinity (SVI), which is able to generate infinite-length videos with high temporal consistency, plausible scene transitions, and controllable streaming storylines. SVI incorporates Error-Recycling Fine-Tuning, a new type of efficient training that recycles the Diffusion Transformer's self-generated errors into supervisory prompts. We evaluate SVI on three benchmarks covering consistent, creative, and conditional settings, thoroughly verifying its versatility and state-of-the-art performance.
arXiv Detail & Related papers (2025-10-10T09:45:46Z)
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [67.94300151774085]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must, at inference time, generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z)
- AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion [19.98565541640125]
We introduce Auto-Regressive Diffusion (AR-Diffusion), a novel model that combines the strengths of auto-regressive and diffusion models for flexible video generation. Inspired by auto-regressive generation, we incorporate a non-decreasing constraint on the corruption timesteps of individual frames. This setup, together with temporal causal attention, enables flexible generation of videos with varying lengths while preserving temporal coherence.
arXiv Detail & Related papers (2025-03-10T15:05:59Z)
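The non-decreasing timestep constraint is concrete enough to sketch: each frame draws its own diffusion timestep, and sorting along the frame axis guarantees that earlier frames are always at least as clean as later ones. The sampling scheme and noise schedule below are illustrative assumptions, not AR-Diffusion's exact code.

```python
# Per-frame corruption with a non-decreasing timestep constraint (illustrative).
import torch

def sample_nondecreasing_timesteps(batch, num_frames, num_steps=1000):
    t = torch.randint(0, num_steps, (batch, num_frames))
    return t.sort(dim=1).values            # enforce t[:, 0] <= t[:, 1] <= ...

def corrupt(frames, t, num_steps=1000):
    """Per-frame forward diffusion with a simple linear-alpha schedule (assumed)."""
    alpha = 1.0 - t.float() / num_steps            # (B, T); t=0 -> clean frame
    alpha = alpha.view(*alpha.shape, 1, 1, 1)      # broadcast over C, H, W
    noise = torch.randn_like(frames)
    return alpha.sqrt() * frames + (1 - alpha).sqrt() * noise
```

During generation, the same ordering means earlier frames can finish denoising, and be emitted, while later frames are still noisy, which is what makes the scheme asynchronous.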
- From Slow Bidirectional to Fast Autoregressive Video Diffusion Models [48.35054927704544]
Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. We address this limitation by adapting a pretrained bidirectional diffusion transformer into an autoregressive transformer that generates frames on the fly. Our model achieves a total score of 84.27 on the VBench-Long benchmark, surpassing all previous video generation models.
arXiv Detail & Related papers (2024-12-10T18:59:50Z)
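The bidirectional-to-causal adaptation in the entry above rests on a frame-causal attention pattern: tokens attend within their own frame and to earlier frames, never to future ones. The construction below is the generic version of that mask, not the paper's specific architecture.

```python
# Generic frame-causal attention mask (assumed construction, not the paper's code).
import torch

def frame_causal_mask(num_frames, tokens_per_frame):
    """Boolean (N, N) mask: True where a query token may attend to a key token."""
    frame_id = torch.arange(num_frames).repeat_interleave(tokens_per_frame)
    return frame_id[:, None] >= frame_id[None, :]  # attend iff key frame <= query frame

# e.g. pass to torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask)
mask = frame_causal_mask(num_frames=4, tokens_per_frame=3)
```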
- Haar Wavelet based Block Autoregressive Flows for Trajectories [129.37479472754083]
Prediction of trajectories, such as those of pedestrians, is crucial to the performance of autonomous agents.
We introduce a novel Haar wavelet based block autoregressive model leveraging split couplings.
We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets.
arXiv Detail & Related papers (2020-09-21T13:57:10Z)
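For readers unfamiliar with the decomposition behind that last entry, here is one level of the standard orthonormal Haar transform over a trajectory: pairwise averages give a coarse, half-length trajectory, and pairwise differences carry the detail, yielding the coarse-to-fine structure a block autoregressive flow can factorize. Only the textbook transform is shown; the split couplings themselves are not.

```python
# Textbook one-level Haar decomposition of a trajectory (B, T, D), T even.
import torch

def haar_step(x):
    """Split x into a coarse half-length trajectory and its detail coefficients."""
    even, odd = x[:, 0::2], x[:, 1::2]
    coarse = (even + odd) / 2 ** 0.5       # orthonormal scaling keeps the map invertible
    detail = (even - odd) / 2 ** 0.5
    return coarse, detail

def haar_inverse(coarse, detail):
    """Exactly undo haar_step by re-interleaving the reconstructed samples."""
    even = (coarse + detail) / 2 ** 0.5
    odd = (coarse - detail) / 2 ** 0.5
    x = torch.stack([even, odd], dim=2)    # (B, T/2, 2, D)
    return x.flatten(1, 2)                 # back to (B, T, D)
```

The 1/sqrt(2) scaling makes the transform orthonormal, so the inverse is exact, a property normalizing flows rely on for tractable likelihoods.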