RTTC: Reward-Guided Collaborative Test-Time Compute
- URL: http://arxiv.org/abs/2508.10024v1
- Date: Thu, 07 Aug 2025 21:18:52 GMT
- Title: RTTC: Reward-Guided Collaborative Test-Time Compute
- Authors: J. Pablo Muñoz, Jinjie Yuan
- Abstract summary: Test-Time Compute (TTC) has emerged as a powerful paradigm for enhancing the performance of Large Language Models (LLMs) at inference. We introduce Reward-Guided Test-Time Compute (RTTC), a novel framework that adaptively selects the most effective TTC strategy for each query. RTTC operates in a distributed server-client architecture, retrieving relevant samples from a remote knowledge base and applying RAG or lightweight fine-tuning on client devices only when necessary.
- Score: 0.9208007322096533
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Test-Time Compute (TTC) has emerged as a powerful paradigm for enhancing the performance of Large Language Models (LLMs) at inference, leveraging strategies such as Test-Time Training (TTT) and Retrieval-Augmented Generation (RAG). However, the optimal adaptation strategy varies across queries, and indiscriminate application of TTC strategy incurs substantial computational overhead. In this work, we introduce Reward-Guided Test-Time Compute (RTTC), a novel framework that adaptively selects the most effective TTC strategy for each query via a pretrained reward model, maximizing downstream accuracy across diverse domains and tasks. RTTC operates in a distributed server-client architecture, retrieving relevant samples from a remote knowledge base and applying RAG or lightweight fine-tuning on client devices only when necessary. To further mitigate redundant computation, we propose Query-State Caching, which enables the efficient reuse of historical query states at both retrieval and adaptation levels. Extensive experiments across multiple LLMs and benchmarks demonstrate that RTTC consistently achieves superior accuracy compared to vanilla RAG or TTT, validating the necessity of adaptive, reward-guided TTC selection and the potential of RTTC for scalable, high-performance language model adaptation.
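The selection loop described in the abstract can be illustrated with a minimal sketch. It assumes a cheapest-first escalation (direct answer, then RAG, then TTT) that stops once a reward score clears a threshold, and it models Query-State Caching as an exact-match cache of retrieved samples only. The abstract does not specify the escalation order, the threshold, or the cache design, so all of these are assumptions, and every name below (`select_answer`, `toy_reward`, `toy_retrieve`, and so on) is hypothetical rather than taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases; none of these names come from the paper.
Strategy = Callable[[str, List[str]], str]  # (query, retrieved samples) -> answer
RewardFn = Callable[[str, str], float]      # (query, answer) -> reward score


def select_answer(
    query: str,
    strategies: List[Tuple[str, Strategy]],
    reward: RewardFn,
    retrieve: Callable[[str], List[str]],
    cache: Dict[str, List[str]],
    threshold: float = 0.5,  # assumed acceptance threshold
) -> Tuple[str, str, float]:
    """Try strategies in order; stop as soon as one clears the reward threshold.

    Returns (strategy name, answer, reward) for the best-scoring attempt.
    """
    best: Tuple[str, str, float] = ("", "", float("-inf"))
    for name, run in strategies:
        if name == "direct":
            samples: List[str] = []  # no retrieval needed for the base model
        else:
            # Simplified stand-in for Query-State Caching: reuse retrieved
            # samples for a query that has been seen before.
            if query not in cache:
                cache[query] = retrieve(query)
            samples = cache[query]
        answer = run(query, samples)
        score = reward(query, answer)
        if score > best[2]:
            best = (name, answer, score)
        if score >= threshold:
            break  # good enough; skip the more expensive strategies
    return best


# Toy stand-ins so the sketch runs end to end.
calls = {"retrieve": 0}


def toy_retrieve(query: str) -> List[str]:
    calls["retrieve"] += 1  # count remote knowledge-base lookups
    return ["Paris is the capital of France."]


def direct(query: str, samples: List[str]) -> str:
    return "unknown"


def rag(query: str, samples: List[str]) -> str:
    return samples[0] if samples else "unknown"


def ttt(query: str, samples: List[str]) -> str:
    return "Paris"  # placeholder for an answer produced after fine-tuning


def toy_reward(query: str, answer: str) -> float:
    return 1.0 if "Paris" in answer else 0.0


cache: Dict[str, List[str]] = {}
pipeline = [("direct", direct), ("rag", rag), ("ttt", ttt)]
result = select_answer("What is the capital of France?", pipeline,
                       toy_reward, toy_retrieve, cache)
print(result[0], result[2])
```

In this toy run the direct answer scores below the threshold, so the loop escalates to RAG and stops there; the TTT stand-in is never invoked. A real system would replace the stubs with an LLM, a pretrained reward model, and a remote retriever.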
Related papers
- What If We Allocate Test-Time Compute Adaptively? [2.1713977971908944]
Test-time scaling allocates inference computation uniformly, uses fixed sampling strategies, and applies verification only for reranking. We propose a verifier-guided adaptive framework treating reasoning as iterative trajectory generation and selection. Across datasets, our dynamic, PRM-guided approach consistently outperforms direct test-time scaling.
arXiv Detail & Related papers (2026-02-01T07:30:22Z)
- Test-time Correlation Alignment [2.389598109913754]
Test-Time Adaptation (TTA) adapts using only unlabeled test data. Test-time Correlation Alignment (TCA) can enhance test performances with a theoretical guarantee. LinearTCA applies a simple linear transformation to achieve both instance and correlation alignment without additional model updates. LinearTCA+ serves as a plug-and-play module that can easily boost existing TTA methods.
arXiv Detail & Related papers (2025-05-01T13:59:13Z)
- LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models [23.218237408724676]
We propose LoRA-TTT, a novel Test-Time Training (TTT) method for vision-language models (VLMs). By introducing LoRA and updating only its parameters during test time, our method offers a simple yet effective TTT approach. Our method can adapt to diverse domains by combining these two losses, without increasing memory consumption or runtime.
arXiv Detail & Related papers (2025-02-04T07:40:26Z)
- Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
- Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization [67.8738082040299]
Self-Sampling Preference Optimization (SSPO) is a new alignment method for post-training reinforcement learning. SSPO eliminates the need for paired data and reward models while retaining the training stability of SFT. SSPO surpasses all previous approaches on the text-to-image benchmarks and demonstrates outstanding performance on the text-to-video benchmarks.
arXiv Detail & Related papers (2024-10-07T17:56:53Z)
- IT$^3$: Idempotent Test-Time Training [95.78053599609044]
Deep learning models often struggle when deployed in real-world settings due to distribution shifts between training and test data. We present Idempotent Test-Time Training (IT$^3$), a novel approach that enables on-the-fly adaptation to distribution shifts using only the current test instance. Our results suggest that idempotence provides a universal principle for test-time adaptation that generalizes across domains and architectures.
arXiv Detail & Related papers (2024-10-05T15:39:51Z)
- CST: Calibration Side-Tuning for Parameter and Memory Efficient Transfer Learning [4.776619551860301]
This paper introduces a lightweight fine-tuning strategy called side tuning.
It incorporates aspects of adapter tuning and side tuning to adapt the successful techniques employed in transformers for use with ResNet.
The paper analyzes multiple fine-tuning strategies and implements them within ResNet.
arXiv Detail & Related papers (2024-02-20T06:01:31Z)
- pSTarC: Pseudo Source Guided Target Clustering for Fully Test-Time Adaptation [15.621092104244003]
Test Time Adaptation (TTA) is a pivotal concept in machine learning, enabling models to perform well in real-world scenarios.
We propose a novel approach called pseudo Source guided Target Clustering (pSTarC) addressing the relatively unexplored area of TTA under real-world domain shifts.
arXiv Detail & Related papers (2023-09-02T07:13:47Z)
- Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification [77.0114672086012]
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction.
We present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets.
arXiv Detail & Related papers (2023-07-06T16:59:53Z)
- Improved Test-Time Adaptation for Domain Generalization [48.239665441875374]
Test-time training (TTT) adapts the learned model with test data.
This work addresses two main factors: selecting an appropriate auxiliary TTT task for updating and identifying reliable parameters to update during the test phase.
We introduce additional adaptive parameters for the trained model, and we suggest only updating the adaptive parameters during the test phase.
arXiv Detail & Related papers (2023-04-10T10:12:38Z)
- Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering Regularized Self-Training [37.75537703971045]
We develop a test-time anchored clustering (TTAC) approach to enable stronger test-time feature learning.
Self-training (ST) has demonstrated great success in learning from unlabeled data.
TTAC++ consistently outperforms the state-of-the-art methods on five TTT datasets.
arXiv Detail & Related papers (2023-03-20T04:30:18Z)
- Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering [37.76664203157892]
We develop a test-time anchored clustering (TTAC) approach to enable stronger test-time feature learning.
TTAC discovers clusters in both the source and target domains and matches the target clusters to the source ones to improve generalization.
We demonstrate that under all TTT protocols TTAC consistently outperforms the state-of-the-art methods on five TTT datasets.
arXiv Detail & Related papers (2022-06-06T16:23:05Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.