Fugu-MT Paper Translation (Abstract): Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

Paper overview: Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

arxiv url: http://arxiv.org/abs/2604.23623v1
Date: Sun, 26 Apr 2026 09:33:51 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.467346
Title: Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning
Title (reference translation): Tandem: Using Large and Small Language Models Together for Efficient Reasoning
Authors: Zichuan Fu, Xian Wu, Guojing Li, Yejing Wang, Yijun Chen, Zihao Zhao, Yixuan Luo, Hanyu Yan, Yefeng Zheng, Xiangyu Zhao
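Abstract summary: The paper proposes Tandem, a framework that synergizes large and small language models (LLMs and SLMs) to achieve high-quality reasoning at significantly reduced computational cost. Experiments on mathematical reasoning and code generation benchmarks show that Tandem reduces computational cost by approximately 40% compared to standalone LLM reasoning.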
Reference score (independently computed attention score): 37.624319973066925
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, where models perform explicit step-by-step reasoning before generating final answers. While such approaches improve answer quality and interpretability, they incur substantial computational overhead due to the prolonged generation sequences. In this paper, we propose Tandem, a novel collaborative framework that synergizes large and small language models (LLMs and SLMs) to achieve high-quality reasoning with significantly reduced computational cost. Specifically, the LLM serves as a strategic coordinator, efficiently generating a compact set of critical reasoning insights. These insights are then used to guide a smaller, more efficient SLM in executing the full reasoning process and delivering the final response. To balance efficiency and reliability, Tandem introduces a cost-aware termination mechanism that adaptively determines when sufficient reasoning guidance has been accumulated, enabling early stopping of the LLM's generation. Experiments on mathematical reasoning and code generation benchmarks demonstrate that Tandem reduces computational costs by approximately 40% compared to standalone LLM reasoning, while achieving superior or competitive performance. Furthermore, the sufficiency classifier trained on one domain transfers effectively to others without retraining. The code is available at: https://github.com/Applied-Machine-Learning-Lab/ACL2026_Tandem.
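Abstract (reference translation): Recent advances in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, in which a model performs step-by-step reasoning before generating its final answer. Such approaches improve answer quality and interpretability, but the long generation sequences incur substantial computational overhead. This paper proposes Tandem, a new collaborative framework that synergizes large and small language models (LLMs and SLMs) to achieve high-quality reasoning at significantly reduced computational cost. Specifically, the LLM acts as a strategic coordinator and efficiently generates a compact set of critical reasoning insights. These insights then guide a smaller, more efficient SLM as it executes the full reasoning process and delivers the final response. To balance efficiency and reliability, Tandem introduces a cost-aware termination mechanism that adaptively determines when sufficient reasoning guidance has accumulated, allowing the LLM's generation to stop early. In experiments on mathematical reasoning and code generation benchmarks, Tandem reduces computational cost by approximately 40% compared with standalone LLM reasoning while achieving superior or competitive performance. Furthermore, a sufficiency classifier trained on one domain transfers effectively to other domains without retraining. The code is available at https://github.com/Applied-Machine-Learning-Lab/ACL2026_Tandem.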
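
The control flow described in the abstract (the LLM emits insights, a sufficiency check decides when to stop, and the SLM produces the final response) can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the abstract does not say how insights are segmented, what the sufficiency classifier looks at, or how cost enters the decision. All names here (`tandem_answer`, `toy_llm`, `toy_sufficient`, `toy_slm`, `max_insights`) are hypothetical, and the "models" are plain Python stubs.

```python
from typing import Callable, Iterable, List, Tuple


def tandem_answer(
    question: str,
    llm_insights: Iterable[str],
    is_sufficient: Callable[[str, List[str]], bool],
    slm_solve: Callable[[str, List[str]], str],
    max_insights: int = 8,  # hypothetical hard budget, not from the paper
) -> Tuple[str, int]:
    """Return (final answer, number of LLM insights consumed)."""
    insights: List[str] = []
    # The LLM is modelled as a lazy stream, so breaking out of the loop
    # means no further (expensive) LLM generation is requested.
    for insight in llm_insights:
        insights.append(insight)
        if is_sufficient(question, insights) or len(insights) >= max_insights:
            break
    # The smaller model carries out the full reasoning with the guidance.
    return slm_solve(question, insights), len(insights)


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_llm(question: str) -> Iterable[str]:
    """Pretend LLM that yields one reasoning insight at a time."""
    yield "Let x be the unknown number."
    yield "Translate the statement into 2*x + 3 = 11."
    yield "Isolate x by subtracting 3, then dividing by 2."
    yield "Check the result by substitution."


def toy_sufficient(question: str, insights: List[str]) -> bool:
    """Pretend classifier: stop once an equation has been stated."""
    return any("=" in s for s in insights)


def toy_slm(question: str, insights: List[str]) -> str:
    """Pretend SLM: reports how much guidance it received."""
    return f"answer derived from {len(insights)} insight(s)"


question = "Twice a number plus three is eleven. What is the number?"
answer, used = tandem_answer(question, toy_llm(question), toy_sufficient, toy_slm)
print(answer, used)
```

In this toy run the loop stops after the second insight, so the remaining two insights are never generated. That skipped generation is the kind of saving the early-stopping mechanism is meant to capture. A real sufficiency classifier would be a trained model rather than a string check.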


Related papers