Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation
- URL: http://arxiv.org/abs/2601.19488v2
- Date: Fri, 30 Jan 2026 11:29:48 GMT
- Title: Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation
- Authors: Yizhao Han, Tianxing Shi, Zhao Wang, Zifan Xu, Zhiyuan Pu, Mingxiao Li, Qian Zhang, Wei Yin, Xiao-Xiao Long,
- Abstract summary: We propose Entropy-Guided k-Guard (ENkG) sampling, a strategy that adapts sampling to token-wise dispersion. ENkG uses adaptive token candidate sizes: for low-entropy regions it employs fewer candidates to suppress redundant noise and preserve structural integrity, while for high-entropy regions it uses more candidates to mitigate error compounding. Experiments demonstrate consistent improvements in perceptual quality and structural stability compared to static top-k/top-p strategies.
- Score: 22.973340187143616
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autoregressive (AR) architectures have achieved significant successes in LLMs, inspiring explorations for video generation. In LLMs, top-p/top-k sampling strategies work exceptionally well: language tokens have high semantic density and low redundancy, so a fixed size of token candidates already strikes a balance between semantic accuracy and generation diversity. In contrast, video tokens have low semantic density and high spatio-temporal redundancy. This mismatch makes static top-k/top-p strategies ineffective for video decoders: they either introduce unnecessary randomness for low-uncertainty regions (static backgrounds) or get stuck in early errors for high-uncertainty regions (foreground objects). Prediction errors will accumulate as more frames are generated and eventually severely degrade long-horizon quality. To address this, we propose Entropy-Guided k-Guard (ENkG) sampling, a simple yet effective strategy that adapts sampling to token-wise dispersion, quantified by the entropy of each token's predicted distribution. ENkG uses adaptive token candidate sizes: for low-entropy regions, it employs fewer candidates to suppress redundant noise and preserve structural integrity; for high-entropy regions, it uses more candidates to mitigate error compounding. ENkG is model-agnostic, training-free, and adds negligible overhead. Experiments demonstrate consistent improvements in perceptual quality and structural stability compared to static top-k/top-p strategies.
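As a rough illustration of the abstract's idea (a hedged sketch, not the paper's exact rule: the linear entropy-to-k schedule and the `k_min`/`k_max` bounds here are assumptions), an entropy-adaptive top-k sampler can be written as:

```python
import numpy as np

def entropy_guided_topk_sample(logits, k_min=1, k_max=64, rng=None):
    """Sample a token id with an entropy-adaptive top-k cutoff.

    The candidate-set size k grows with the normalized entropy of the
    predicted distribution: near-deterministic (low-entropy) tokens use
    few candidates; high-entropy tokens use many.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Numerically stable softmax.
    z = logits - logits.max()
    probs = np.exp(z)
    probs /= probs.sum()

    # Normalized Shannon entropy in [0, 1].
    eps = 1e-12
    h = -(probs * np.log(probs + eps)).sum()
    h_norm = h / np.log(len(probs))

    # Illustrative linear schedule mapping entropy to k in [k_min, k_max].
    k = int(round(k_min + h_norm * (k_max - k_min)))
    k = max(k_min, min(k, len(probs)))

    # Keep the k most probable tokens, renormalize, and sample.
    top = np.argsort(probs)[-k:]
    top_p = probs[top] / probs[top].sum()
    return int(rng.choice(top, p=top_p))
```

With a sharply peaked distribution (e.g. a static-background token) the normalized entropy is near zero, so k collapses to `k_min` and the sampler is effectively greedy; with a flat distribution (e.g. an uncertain foreground token) k expands toward `k_max`, matching the low-noise/high-diversity trade-off described above.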
Related papers
- TAP: A Token-Adaptive Predictor Framework for Training-Free Diffusion Acceleration [19.18455910385295]
Token-Adaptive Predictor (TAP) is a training-free, probe-driven framework that adaptively selects a predictor for each token at every sampling step. TAP incurs negligible overhead while enabling large speedups with little or no perceptual quality loss.
arXiv Detail & Related papers (2026-03-04T07:10:11Z)
- Better Matching, Less Forgetting: A Quality-Guided Matcher for Transformer-based Incremental Object Detection [37.2487040069697]
A persistent challenge is catastrophic forgetting, primarily attributed to background shift in conventional detectors. We identify a novel, distinct source of forgetting specific to DETR-like architectures: background foregrounding. This arises from the exhaustiveness constraint of the Hungarian matcher, which forcibly assigns every ground-truth target to one prediction.
arXiv Detail & Related papers (2026-03-02T06:56:14Z)
- Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA [50.494504099850325]
We introduce the Geodesic Hypothesis, positing that token sequences trace geodesics on a smooth semantic manifold and are therefore locally linear. We show this constraint improves the signal-to-noise ratio and preserves diversity by preventing collisions along the trajectory. We demonstrate that geometric priors can surpass brute-force scaling.
arXiv Detail & Related papers (2026-02-26T04:45:07Z)
- Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling [82.52485740425321]
Adversarial attacks present a critical challenge to deep neural networks' robustness. The transferability of adversarial attacks faces a dilemma between Exploitation (maximizing attack potency) and Exploration (enhancing cross-model generalization).
arXiv Detail & Related papers (2025-11-01T05:43:47Z)
- Efficient Video Sampling: Pruning Temporally Redundant Tokens for Faster VLM Inference [5.146388234814547]
Long videos often exceed the token budget of modern language models, leading to severe context limitations and latency issues. We introduce Efficient Video Sampling (EVS), a simple, plug-and-play method for reducing token redundancy in videos by identifying and pruning temporally static patches. EVS substantially reduces token count while maintaining semantic fidelity, enabling faster inference and longer input sequences.
arXiv Detail & Related papers (2025-10-16T12:34:38Z)
- Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation [14.086036250269613]
Adapting Vision-Language Models to new domains with few labeled samples is challenging due to severe overfitting and computational constraints. In this paper, we propose a novel Sparse Optimization (SO) framework that dynamically adjusts very few parameters. Experiments on 11 diverse datasets show that SO achieves state-of-the-art few-shot adaptation performance while reducing memory overhead.
arXiv Detail & Related papers (2025-04-16T19:10:34Z)
- Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs). We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positional information to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length. PAWN shows competitive and even better in-distribution performance than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
- Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models [27.847140934456288]
This paper proposes a new weight decay technique, Selective Projection Decay (SPD).
SPD selectively imposes a strong penalty on certain layers while allowing others to change freely.
When equipped with SPD, Adam consistently provides better in-distribution robustness and out-of-distribution performance on benchmarks.
arXiv Detail & Related papers (2024-11-03T23:36:53Z)
- LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision [58.6039004982056]
We propose a neuro-symbolic framework to enable training generators using only video captions. An alignment algorithm overcomes the challenges of weak supervision by leveraging a differentiable symbolic reasoner. We evaluate our method on three video datasets: OpenPVSG, 20BN, and MUGEN.
arXiv Detail & Related papers (2023-04-15T22:24:05Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks for discrete data (such as text) are more challenging than for continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.