Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
- URL: http://arxiv.org/abs/2511.00086v1
- Date: Wed, 29 Oct 2025 22:14:25 GMT
- Title: Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph
- Authors: Fali Wang, Jihai Chen, Shuhua Yang, Runxue Bao, Tianxiang Zhao, Zhiwei Zhang, Xianfeng Tang, Hui Liu, Qi He, Suhang Wang,
- Abstract summary: Test-Time Scaling (TTS) improves large language models (LLMs) by allocating additional computation during inference.<n>We formalize it as a multi-LLM collaboration graph, where nodes encode roles and model assignments, edges capture information flow.<n>We propose Agent-REINFORCE, an LLM-agent-augmented framework that mirrors the REINFORCE pipeline by mapping sampling-gradient-update to sampling-feedback-update.
- Score: 42.247964605609745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-Time Scaling (TTS) improves large language models (LLMs) by allocating additional computation during inference, typically through parallel, sequential, or hybrid scaling. However, prior studies often assume fixed collaboration architectures (e.g., topologies) and single-model usage, overlooking that optimal architectures and model combinations can vary across tasks. Therefore, we study the novel problem of searching for compute-optimal model combinations and architectures in TTS under a fixed budget. We formalize it as a multi-LLM collaboration graph, where nodes encode roles and LLM model assignments, and edges capture information flow. This problem is challenging because (i) the combinatorial search space is prohibitively large, and (ii) task-specific requirements demand tailored designs. To address these, we reformulate the problem as probabilistic graph optimization and, through pilot experiments, derive three empirical insights into TTS collaboration graphs. Guided by these insights, we propose Agent-REINFORCE, an LLM-agent-augmented framework that mirrors the REINFORCE pipeline by mapping sampling-gradient-update to sampling-feedback-update, where feedback serves as a textual gradient to update the probabilistic graph and efficiently search for optimal multi-LLM collaboration graphs. Experiments show that Agent-REINFORCE outperforms both traditional and LLM-based baselines in sample efficiency and search performance, and effectively identifies optimal graphs under joint objectives of accuracy and inference latency.
Related papers
- OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems.<n>Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm.<n>We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z) - GRIP: In-Parameter Graph Reasoning through Fine-Tuning Large Language Models [40.5886835740214]
Large Language Models (LLMs) have demonstrated remarkable capabilities in modeling sequential textual data.<n>We propose GRIP, a novel framework that equips LLMs with the ability to internalize complex relational information from graphs.<n>This knowledge is efficiently stored within lightweight LoRA parameters, enabling the fine-tuned LLM to perform a wide range of graph-related tasks.
arXiv Detail & Related papers (2025-11-06T21:56:58Z) - ChartM$^3$: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension [15.798942458550515]
This study proposes an automated multi-stage code-driven pipeline for generating visual reasoning datasets.<n>We construct ChartM$3$, a multi-dimensional and multi-step dataset containing 38K charts and 142K Q&A pairs for training, along with 2,871 high-quality evaluation samples.
arXiv Detail & Related papers (2025-11-04T09:45:34Z) - GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning [20.75113227786218]
Graph-based Agent Planning (GAP) is a novel framework that explicitly models inter-task dependencies through graph-based planning.<n>Our approach trains agent foundation models to decompose complex tasks into dependency-aware sub-task graphs.<n>This dependency-aware orchestration achieves substantial improvements in both execution efficiency and task accuracy.
arXiv Detail & Related papers (2025-10-29T09:35:55Z) - Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models [99.85131798240808]
We introduce a novel generative framework called textitGuided Topology Diffusion (GTD)<n>Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process.<n>At each step, the generation is steered by a lightweight proxy model that predicts multi-objective rewards.<n>Experiments show that GTD can generate highly task-adaptive, sparse, and efficient communication topologies.
arXiv Detail & Related papers (2025-10-09T05:28:28Z) - Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases [3.6423651166048874]
Flattening the database poses challenges for deep learning models.<n>We propose a novel hypergraph-based framework, that we call rel-HNN.<n>We show that rel-HNN significantly outperforms existing methods in both classification and regression tasks.
arXiv Detail & Related papers (2025-07-16T18:20:45Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality.<n>To handle these challenges, a direct solution is to generate high-confidence'' data from unsupervised downstream tasks.<n>We propose a novel approach, pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching)
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - Arch-Graph: Acyclic Architecture Relation Predictor for
Task-Transferable Neural Architecture Search [96.31315520244605]
Arch-Graph is a transferable NAS method that predicts task-specific optimal architectures.
We show Arch-Graph's transferability and high sample efficiency across numerous tasks.
It is able to find top 0.16% and 0.29% architectures on average on two search spaces under the budget of only 50 models.
arXiv Detail & Related papers (2022-04-12T16:46:06Z) - Pretrained Cost Model for Distributed Constraint Optimization Problems [37.79733538931925]
Distributed Constraint Optimization Problems (DCOPs) are an important subclass of optimization problems.
We propose a novel directed acyclic graph schema representation for DCOPs and leverage the Graph Attention Networks (GATs) to embed graph representations.
Our model, GAT-PCM, is then pretrained with optimally labelled data in an offline manner, so as to boost a broad range of DCOP algorithms.
arXiv Detail & Related papers (2021-12-08T09:24:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.