Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
- URL: http://arxiv.org/abs/2504.00762v3
- Date: Tue, 15 Apr 2025 06:58:14 GMT
- Title: Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
- Authors: Jianhao Chen, Zishuo Xun, Bocheng Zhou, Han Qi, Qiaosheng Zhang, Yang Chen, Wei Hu, Yuzhong Qu, Wanli Ouyang, Shuyue Hu
- Abstract summary: This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths.
- Score: 55.330813919992465
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths that potentially arise from diverse training data and paradigms. By using consistency as a signal, our strategy dynamically switches between models. Theoretical analysis highlights the efficiency and performance advantages of our strategy. Extensive experiments on six datasets demonstrate that our strategy not only outperforms self-consistency and state-of-the-art multi-agent debate approaches, but also significantly reduces inference costs. Additionally, ModelSwitch requires only a few comparable LLMs to achieve optimal performance and can be extended with verification methods, demonstrating the potential of leveraging multiple LLMs in the generation-verification paradigm.
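As a rough illustration of the mechanism described in the abstract, the following is a minimal sketch of repeated-sampling-then-voting with a consistency-triggered switch between models. The `generate` callable, the sample count `k`, and the 0.8 agreement threshold are placeholder assumptions for illustration, not the paper's exact procedure or hyperparameters.

```python
from collections import Counter

def consistency(answers):
    """Fraction of samples that agree with the most frequent answer."""
    return Counter(answers).most_common(1)[0][1] / len(answers)

def model_switch(generate, models, question, k=8, threshold=0.8):
    """Query models in order; stop as soon as one model's k samples are
    sufficiently self-consistent, otherwise fall back to a majority vote
    over everything sampled so far."""
    pooled = Counter()
    for model in models:
        answers = [generate(model, question) for _ in range(k)]
        pooled.update(answers)
        if consistency(answers) >= threshold:   # consistency as the switching signal
            return Counter(answers).most_common(1)[0][0]
    return pooled.most_common(1)[0][0]          # cross-model majority vote
```

For example, `model_switch(generate, [model_a, model_b], question)` would only spend samples on `model_b` when `model_a`'s answers disagree with one another, illustrating how consistency can gate additional sampling.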
Related papers
- FastMCTS: A Simple Sampling Strategy for Data Synthesis [67.60823802317141]
We introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search. FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals. Experiments on both English and Chinese reasoning datasets demonstrate that FastMCTS generates over 30% more correct reasoning paths.
arXiv Detail & Related papers (2025-02-17T06:27:57Z)
- Revisiting Robust RAG: Do We Still Need Complex Robust Training in the Era of Powerful LLMs? [69.38149239733994]
We investigate whether complex robust training strategies remain necessary as model capacity grows. We find that as models become more powerful, the performance gains brought by complex robust training methods drop off dramatically. Our findings suggest that RAG systems can benefit from simpler architectures and training strategies as models become more powerful.
arXiv Detail & Related papers (2025-02-17T03:34:31Z)
- Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation [21.20806568508201]
We show how to leverage class text information to mitigate distribution drifts encountered by vision-language models (VLMs) during test-time inference. We propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem. Experiments on multiple popular test-time adaptation benchmarks of diverse complexity empirically show the superiority of CLIP-OT.
arXiv Detail & Related papers (2024-11-26T00:15:37Z)
- RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning [33.754240030720425]
Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum. We propose an end-to-end reinforcement learning method for teaching models to leverage execution feedback in the realm of code synthesis.
arXiv Detail & Related papers (2024-10-02T23:25:17Z)
- CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
- Multimodal Classification via Modal-Aware Interactive Enhancement [6.621745547882088]
We propose a novel multimodal learning method called modal-aware interactive enhancement (MIE).
Specifically, we first utilize an optimization strategy based on sharpness aware minimization (SAM) to smooth the learning objective during the forward phase.
Then, with the help of the geometry property of SAM, we propose a gradient modification strategy to impose the influence between different modalities during the backward phase.
arXiv Detail & Related papers (2024-07-05T15:32:07Z)
- Efficient Multi-agent Reinforcement Learning by Planning [33.51282615335009]
Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks.
Most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios.
We propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search.
arXiv Detail & Related papers (2024-05-20T04:36:02Z)
- Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models [56.256069117502385]
Chain of Thought (CoT) approaches can be used to enhance the capability of Large Language Models (LLMs) on complex reasoning tasks.
However, the selection of optimal CoT demonstration examples in multi-modal reasoning remains less explored.
We introduce a novel approach that addresses this challenge by using retrieval mechanisms to automatically select demonstration examples.
arXiv Detail & Related papers (2023-12-04T08:07:21Z)
- Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts [52.844741540236285]
This paper investigates model-based methods in multi-agent reinforcement learning (MARL).
We propose a novel decentralized model-based MARL method named Adaptive Opponent-wise Rollout Policy (AORPO).
arXiv Detail & Related papers (2021-05-07T16:20:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.