Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation
- URL: http://arxiv.org/abs/2601.19488v2
- Date: Fri, 30 Jan 2026 11:29:48 GMT
- Title: Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation
- Authors: Yizhao Han, Tianxing Shi, Zhao Wang, Zifan Xu, Zhiyuan Pu, Mingxiao Li, Qian Zhang, Wei Yin, Xiao-Xiao Long,
- Abstract summary: We propose Entropy-Guided k-Guard (ENkG) sampling, a strategy that adapts sampling to token-wise dispersion. ENkG uses adaptive token candidate sizes: for low-entropy regions it employs fewer candidates to suppress redundant noise and preserve structural integrity, while for high-entropy regions it uses more candidates to mitigate error compounding. Experiments demonstrate consistent improvements in perceptual quality and structural stability compared to static top-k/top-p strategies.
- Score: 22.973340187143616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive (AR) architectures have achieved significant successes in LLMs, inspiring explorations for video generation. In LLMs, top-p/top-k sampling strategies work exceptionally well: language tokens have high semantic density and low redundancy, so a fixed size of token candidates already strikes a balance between semantic accuracy and generation diversity. In contrast, video tokens have low semantic density and high spatio-temporal redundancy. This mismatch makes static top-k/top-p strategies ineffective for video decoders: they either introduce unnecessary randomness for low-uncertainty regions (static backgrounds) or get stuck in early errors for high-uncertainty regions (foreground objects). Prediction errors will accumulate as more frames are generated and eventually severely degrade long-horizon quality. To address this, we propose Entropy-Guided k-Guard (ENkG) sampling, a simple yet effective strategy that adapts sampling to token-wise dispersion, quantified by the entropy of each token's predicted distribution. ENkG uses adaptive token candidate sizes: for low-entropy regions, it employs fewer candidates to suppress redundant noise and preserve structural integrity; for high-entropy regions, it uses more candidates to mitigate error compounding. ENkG is model-agnostic, training-free, and adds negligible overhead. Experiments demonstrate consistent improvements in perceptual quality and structural stability compared to static top-k/top-p strategies.
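As a rough illustration of the abstract's idea (a hedged sketch, not the paper's exact rule: the linear entropy-to-k schedule and the `k_min`/`k_max` bounds here are assumptions), an entropy-adaptive top-k sampler can be written as:

```python
import numpy as np

def entropy_guided_topk_sample(logits, k_min=1, k_max=64, rng=None):
    """Sample a token id with an entropy-adaptive top-k cutoff.

    The candidate-set size k grows with the normalized entropy of the
    predicted distribution: near-deterministic (low-entropy) tokens use
    few candidates; high-entropy tokens use many.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Numerically stable softmax.
    z = logits - logits.max()
    probs = np.exp(z)
    probs /= probs.sum()

    # Normalized Shannon entropy in [0, 1].
    eps = 1e-12
    h = -(probs * np.log(probs + eps)).sum()
    h_norm = h / np.log(len(probs))

    # Illustrative linear schedule mapping entropy to k in [k_min, k_max].
    k = int(round(k_min + h_norm * (k_max - k_min)))
    k = max(k_min, min(k, len(probs)))

    # Keep the k most probable tokens, renormalize, and sample.
    top = np.argsort(probs)[-k:]
    top_p = probs[top] / probs[top].sum()
    return int(rng.choice(top, p=top_p))
```

With a sharply peaked distribution (e.g. a static-background token) the normalized entropy is near zero, so k collapses to `k_min` and the sampler is effectively greedy; with a flat distribution (e.g. an uncertain foreground token) k expands toward `k_max`, matching the low-noise/high-diversity trade-off described above.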
Related papers
- TAP: A Token-Adaptive Predictor Framework for Training-Free Diffusion Acceleration [19.18455910385295]
Token-Adaptive Predictor (TAP) is a training-free, probe-driven framework that adaptively selects a predictor for each token at every sampling step. TAP incurs negligible overhead while enabling large speedups with little or no perceptual quality loss.
arXiv Detail & Related papers (2026-03-04T07:10:11Z)
- Better Matching, Less Forgetting: A Quality-Guided Matcher for Transformer-based Incremental Object Detection [37.2487040069697]
A persistent challenge is catastrophic forgetting, primarily attributed to background shift in conventional detectors. We identify a novel, distinct source of forgetting specific to DETR-like architectures: background foregrounding. This arises from the exhaustiveness constraint of the Hungarian matcher, which forcibly assigns every ground-truth target to one prediction.
arXiv Detail & Related papers (2026-03-02T06:56:14Z)
- Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA [50.494504099850325]
We introduce the Geodesic Hypothesis, positing that token sequences trace geodesics on a smooth semantic manifold and are therefore locally linear. We show this constraint improves the signal-to-noise ratio and preserves diversity by preventing collisions along the trajectory. We demonstrate that geometric priors can surpass brute-force scaling.
arXiv Detail & Related papers (2026-02-26T04:45:07Z)
- Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling [82.52485740425321]
Adversarial attacks present a critical challenge to deep neural networks' robustness. The transferability of adversarial attacks faces a dilemma between Exploitation (maximizing attack potency) and Exploration (enhancing cross-model generalization).
arXiv Detail & Related papers (2025-11-01T05:43:47Z)
- Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference [5.146388234814547]
Long videos often exceed the token budget of modern language models, leading to severe context limitations and latency issues. We introduce Efficient Video Sampling (EVS), a simple, plug-and-play method for reducing token redundancy in videos by identifying and pruning temporally static patches. EVS substantially reduces token count while maintaining semantic fidelity, enabling faster inference and longer input sequences.
arXiv Detail & Related papers (2025-10-16T12:34:38Z)
- Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation [14.086036250269613]
Adapting Vision-Language Models to new domains with few labeled samples is challenging due to severe overfitting and computational constraints. In this paper, we propose a novel Sparse Optimization (SO) framework that dynamically adjusts very few parameters. Experiments on 11 diverse datasets show that SO achieves state-of-the-art few-shot adaptation performance while reducing memory overhead.
arXiv Detail & Related papers (2025-04-16T19:10:34Z)
- Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs). We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positional information to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length. PAWN shows competitive and even better in-distribution performance than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
- Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models [27.847140934456288]
This paper proposes a new weight decay technique, Selective Projection Decay (SPD).
SPD selectively imposes a strong penalty on certain layers while allowing others to change freely.
When equipped with SPD, Adam consistently provides better in-distribution robustness and out-of-distribution performance on benchmarks.
arXiv Detail & Related papers (2024-11-03T23:36:53Z)
- LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision [58.6039004982056]
We propose a neuro-symbolic framework to enable training generators using only video captions. An alignment algorithm overcomes the challenges of weak supervision by leveraging a differentiable symbolic reasoner. We evaluate our method on three video datasets: OpenPVSG, 20BN, and MUGEN.
arXiv Detail & Related papers (2023-04-15T22:24:05Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks for discrete data (such as text) are more challenging than for continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.