Rethinking Training Dynamics in Scale-wise Autoregressive Generation
- URL: http://arxiv.org/abs/2512.06421v1
- Date: Sat, 06 Dec 2025 12:41:42 GMT
- Title: Rethinking Training Dynamics in Scale-wise Autoregressive Generation
- Authors: Gengze Zhou, Chongjian Ge, Hao Tan, Feng Liu, Yicong Hong,
- Abstract summary: Next-scale prediction has emerged as a popular paradigm, where models generate images in a coarse-to-fine manner. However, scale-wise AR models suffer from exposure bias, which undermines generation quality. We propose Self-Autoregressive Refinement to address these limitations.
- Score: 22.58390823803937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in autoregressive (AR) generative models have produced increasingly powerful systems for media synthesis. Among them, next-scale prediction has emerged as a popular paradigm, where models generate images in a coarse-to-fine manner. However, scale-wise AR models suffer from exposure bias, which undermines generation quality. We identify two primary causes of this issue: (1) train-test mismatch, where the model must rely on its own imperfect predictions during inference, and (2) imbalance in scale-wise learning difficulty, where certain scales exhibit disproportionately higher optimization complexity. Through a comprehensive analysis of training dynamics, we propose Self-Autoregressive Refinement (SAR) to address these limitations. SAR introduces a Stagger-Scale Rollout (SSR) mechanism that performs lightweight autoregressive rollouts to expose the model to its own intermediate predictions, thereby aligning train-test patterns, and a complementary Contrastive Student-Forcing Loss (CSFL) that provides adequate supervision for self-generated contexts to ensure stable training. Experimental results show that applying SAR to pretrained AR models consistently improves generation quality with minimal computational overhead. For instance, SAR yields a 5.2% FID reduction on FlexVAR-d16 trained on ImageNet 256 within 10 epochs (5 hours on 32xA100 GPUs). Given its efficiency, scalability, and effectiveness, we expect SAR to serve as a reliable post-training method for visual autoregressive generation.
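The Stagger-Scale Rollout idea from the abstract can be illustrated with a minimal sketch: at selected scales the model conditions on its own previous prediction (student forcing) instead of the ground-truth map (teacher forcing), exposing it to the contexts it will see at inference. The names `model`, `gt_maps`, and `rollout_scales` are hypothetical placeholders, not the authors' implementation.

```python
def train_step_with_ssr(model, scales, gt_maps, loss_fn, rollout_scales):
    """One training step with an SSR-style mix of teacher and student forcing.

    scales: scale indices, coarse to fine (0, 1, ..., S-1).
    gt_maps: ground-truth token map per scale.
    rollout_scales: scales where the model conditions on its own previous
    prediction (student forcing) rather than the ground truth.
    """
    total_loss = 0.0
    prev_pred = None
    for s in scales:
        if s in rollout_scales and prev_pred is not None:
            context = prev_pred                      # student forcing
        else:
            context = gt_maps[s - 1] if s > 0 else None  # teacher forcing
        pred = model(context, s)
        total_loss += loss_fn(pred, gt_maps[s])
        prev_pred = pred
    return total_loss / len(scales)
```

The abstract's Contrastive Student-Forcing Loss would replace `loss_fn` on the student-forced scales; its exact form is not given here, so the sketch only shows where the rollout diverges from standard teacher forcing.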
Related papers
- Towards Scaling Laws for Symbolic Regression [45.609070591068836]
Symbolic regression aims to discover the underlying mathematical expressions that explain observed data. Deep learning-based SR has recently become competitive with genetic programming approaches. We present the first systematic investigation of scaling in SR, using a scalable end-to-end transformer pipeline.
arXiv Detail & Related papers (2025-10-30T01:36:44Z) - Deep Generative Continual Learning using Functional LoRA: FunLoRA [12.547444644243543]
A common strategy consists in retraining the generative model on its own synthetic data in order to mitigate forgetting. We propose a novel and more expressive conditioning mechanism for generative models based on low rank adaptation (LoRA). Our proposed parameter-efficient fine-tuning (PEFT) method surpasses prior state-of-the-art results based on diffusion models.
arXiv Detail & Related papers (2025-10-03T00:18:05Z) - NSARM: Next-Scale Autoregressive Modeling for Robust Real-World Image Super-Resolution [17.72407853450265]
We introduce a robust Real-ISR framework, namely Next-Scale Autoregressive Modeling (NSARM). As a pure AR model, NSARM achieves superior visual results over existing Real-ISR methods while maintaining a fast inference speed.
arXiv Detail & Related papers (2025-10-01T12:29:58Z) - The Impact of Scaling Training Data on Adversarial Robustness [28.844098517315228]
Robustness follows a logarithmic scaling law with both data volume and model size. Some self-supervised models trained on curated datasets, such as DINOv2, outperform others trained on much larger but less curated datasets. Human evaluation reveals persistent gaps between human and machine vision.
arXiv Detail & Related papers (2025-09-30T08:20:56Z) - Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models. We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z) - On the Diminishing Returns of Complex Robust RAG Training in the Era of Powerful LLMs [85.688901949146]
We investigate the question: does the benefit of complex robust training methods diminish as language models become more powerful? Our analysis reveals a consistent trend: the marginal robustness benefit of sophisticated training strategies decreases substantially as model capacity increases. Further investigation demonstrates that stronger models naturally exhibit better confidence calibration, cross-dataset generalization capability, and more effective attention patterns, even under simple training regimes.
arXiv Detail & Related papers (2025-02-17T03:34:31Z) - Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
The Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality. We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
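Approximate Entropy itself is a standard regularity statistic, ApEn(m, r) = Φ^m(r) − Φ^{m+1}(r), where Φ^m(r) averages the log-fraction of length-m windows that stay within tolerance r (Chebyshev distance) of each other window. The sketch below implements that textbook definition; how the paper above applies it to recommendation data is not reproduced here.

```python
import math

def approximate_entropy(series, m=2, r=0.2):
    """Approximate Entropy (ApEn) of a 1-D sequence.

    ApEn(m, r) = phi(m) - phi(m + 1), where phi(m) is the average
    log-fraction of length-m windows within Chebyshev distance r of
    each window (self-matches included). Lower values indicate a more
    regular, predictable sequence.
    """
    n = len(series)

    def phi(m):
        windows = [series[i:i + m] for i in range(n - m + 1)]
        total = 0.0
        for w1 in windows:
            # count windows within tolerance r of w1 (includes self-match)
            count = sum(
                1 for w2 in windows
                if max(abs(a - b) for a, b in zip(w1, w2)) <= r
            )
            total += math.log(count / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)
```

A constant sequence yields ApEn of exactly 0, while irregular sequences score higher.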
arXiv Detail & Related papers (2024-11-30T10:56:30Z) - Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression [34.35440701530876]
Our developed theory reveals the nontrivial effect of overparametrization on robustness: for adversarially trained random features models, high overparametrization can hurt robust generalization.
arXiv Detail & Related papers (2022-01-13T18:57:30Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.