FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal
- URL: http://arxiv.org/abs/2512.00438v1
- Date: Sat, 29 Nov 2025 10:34:16 GMT
- Title: FR-TTS: Test-Time Scaling for NTP-based Image Generation with Effective Filling-based Reward Signal
- Authors: Hang Xu, Linjiang Huang, Feng Zhao,
- Abstract summary: Test-time scaling (TTS) has become a prevalent technique in image generation, significantly boosting output quality.<n>But applying this powerful methodology to the next-token prediction paradigm remains challenging.<n>We introduce the Filling-Based Reward (FR) to estimate the approximate future trajectory of an intermediate sample.<n>We experimentally validate the superiority of FR-TTS over multiple established benchmarks and various reward models.
- Score: 26.72622200307507
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Test-time scaling (TTS) has become a prevalent technique in image generation, significantly boosting output quality by expanding the number of parallel samples and filtering them using pre-trained reward models. However, applying this powerful methodology to the next-token prediction (NTP) paradigm remains challenging. The primary obstacle is the low correlation between the reward of an image decoded from an intermediate token sequence and the reward of the fully generated image. Consequently, these incomplete intermediate representations prove to be poor indicators for guiding the pruning direction, a limitation that stems from their inherent incompleteness in scale or semantic content. To effectively address this critical issue, we introduce the Filling-Based Reward (FR). This novel design estimates the approximate future trajectory of an intermediate sample by finding and applying a reasonable filling scheme to complete the sequence. Both the correlation coefficient between rewards of intermediate samples and final samples, as well as multiple intrinsic signals like token confidence, indicate that the FR provides an excellent and reliable metric for accurately evaluating the quality of intermediate samples. Building upon this foundation, we propose FR-TTS, a sophisticated scaling strategy. FR-TTS efficiently searches for good filling schemes and incorporates a diversity reward with a dynamic weighting schedule to achieve a balanced and comprehensive evaluation of intermediate samples. We experimentally validate the superiority of FR-TTS over multiple established benchmarks and various reward models. Code is available at \href{https://github.com/xuhang07/FR-TTS}{https://github.com/xuhang07/FR-TTS}.
Related papers
- Benchmarking Few-shot Transferability of Pre-trained Models with Improved Evaluation Protocols [123.73663884421272]
Few-shot transfer has been revolutionized by stronger pre-trained models and improved adaptation algorithms.<n>We establish FEWTRANS, a comprehensive benchmark containing 10 diverse datasets.<n>By releasing FEWTRANS, we aim to provide a rigorous "ruler" to streamline reproducible advances in few-shot transfer learning research.
arXiv Detail & Related papers (2026-02-28T05:41:57Z) - RS-Prune: Training-Free Data Pruning at High Ratios for Efficient Remote Sensing Diffusion Foundation Models [14.093802378976315]
Diffusion-based remote sensing (RS) generative foundation models rely on large amounts of globally representative data.<n>We propose a training-free, two-stage data pruning approach that quickly select a high-quality subset under high pruning ratios.<n> Experiments show that, even after pruning 85% of the training data, our method significantly improves convergence and generation quality.
arXiv Detail & Related papers (2025-12-29T06:44:06Z) - Scaling Unverifiable Rewards: A Case Study on Visual Insights [29.54766251030519]
Large Language Model (LLM) agents can increasingly automate complex reasoning through Test-Time Scaling (TTS)<n>We propose Selective TTS, a process-based refinement framework that scales inference across different stages in multi-agent pipeline.<n>Our proposed selective TTS then improves insight quality under a fixed compute budget, increasing mean scores from 61.64 to 65.86 while reducing variance.
arXiv Detail & Related papers (2025-12-27T17:01:38Z) - Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering [18.543769006014383]
Diffusion models often exhibit inconsistent sample quality due to variations inherent in their sampling trajectories.<n>We introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process.<n>We validate the effectiveness of CFG-Rejection in image generation through extensive experiments.
arXiv Detail & Related papers (2025-05-29T11:08:24Z) - Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training [36.64849664688883]
Trajectory-Distilled GFlowNet (TD-GFN) is a novel proxy-free training framework.<n>It learns dense, transition-level edge rewards from offline trajectories via inverse reinforcement learning.<n>It significantly outperforms a broad range of existing baselines in both convergence speed and final sample quality.
arXiv Detail & Related papers (2025-05-26T15:12:22Z) - Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization [67.8738082040299]
Self-Sampling Preference Optimization (SSPO) is a new alignment method for post-training reinforcement learning.<n>SSPO eliminates the need for paired data and reward models while retaining the training stability of SFT.<n>SSPO surpasses all previous approaches on the text-to-image benchmarks and demonstrates outstanding performance on the text-to-video benchmarks.
arXiv Detail & Related papers (2024-10-07T17:56:53Z) - Enhancing Test Time Adaptation with Few-shot Guidance [62.49199492255226]
Deep neural networks often encounter significant performance drops while facing with domain shifts between training (source) and test (target) data.<n>Test Time Adaptation (TTA) methods have been proposed to adapt pre-trained source model to handle out-of-distribution streaming target data.<n>We develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA.
arXiv Detail & Related papers (2024-09-02T15:50:48Z) - Training-Free Unsupervised Prompt for Vision-Language Models [27.13778811871694]
We propose Training-Free Unsupervised Prompts (TFUP) to preserve inherent representation capabilities and enhance them with a residual connection to similarity-based prediction probabilities.
TFUP achieves surprising performance, even surpassing the training-base method on multiple classification datasets.
Our TFUP-T achieves new state-of-the-art classification performance compared to unsupervised and few-shot adaptation approaches on multiple benchmarks.
arXiv Detail & Related papers (2024-04-25T05:07:50Z) - A Lightweight Parallel Framework for Blind Image Quality Assessment [7.9562077122537875]
We propose a lightweight parallel framework (LPF) for blind image quality assessment (BIQA)
First, we extract the visual features using a pre-trained feature extraction network. Furthermore, we construct a simple yet effective feature embedding network (FEN) to transform the visual features.
We present two novel self-supervised subtasks, including a sample-level category prediction task and a batch-level quality comparison task.
arXiv Detail & Related papers (2024-02-19T10:56:58Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Deep Boosting Multi-Modal Ensemble Face Recognition with Sample-Level
Weighting [11.39204323420108]
Deep convolutional neural networks have achieved remarkable success in face recognition.
The current training benchmarks exhibit an imbalanced quality distribution.
This poses issues for generalization on hard samples since they are underrepresented during training.
Inspired by the well-known AdaBoost, we propose a sample-level weighting approach to incorporate the importance of different samples into the FR loss.
arXiv Detail & Related papers (2023-08-18T01:44:54Z) - Adaptive Siamese Tracking with a Compact Latent Network [219.38172719948048]
We present an intuitive viewing to simplify the Siamese-based trackers by converting the tracking task to a classification.
Under this viewing, we perform an in-depth analysis for them through visual simulations and real tracking examples.
We apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN.
arXiv Detail & Related papers (2023-02-02T08:06:02Z) - NeRF in detail: Learning to sample for view synthesis [104.75126790300735]
Neural radiance fields (NeRF) methods have demonstrated impressive novel view synthesis.
In this work we address a clear limitation of the vanilla coarse-to-fine approach -- that it is based on a performance and not trained end-to-end for the task at hand.
We introduce a differentiable module that learns to propose samples and their importance for the fine network, and consider and compare multiple alternatives for its neural architecture.
arXiv Detail & Related papers (2021-06-09T17:59:10Z) - Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.