Fugu-MT Paper Translation (Abstract): Select to Think: Unlocking SLM Potential with Local Sufficiency

Paper overview: Select to Think: Unlocking SLM Potential with Local Sufficiency

arxiv url: http://arxiv.org/abs/2604.26940v1
Date: Wed, 29 Apr 2026 17:51:39 GMT
Status: Translation complete
Last updated in system: 2026-04-30 15:59:36.530853
Title: Select to Think: Unlocking SLM Potential with Local Sufficiency
Authors: Wenxuan Ye, Yangyang Zhang, Xueli An, Georg Carle, Yunpu Ma
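Abstract summary: Small language models (SLMs) offer computational efficiency for scalable deployment, but they often lack the reasoning ability shown by larger language models (LLMs). This paper proposes SELECT TO THINK (S2T) and S2T-LOCAL, which distills the selection logic into the SLM so that it can re-rank its own candidates autonomously, without depending on an LLM at inference time.
Reference score (independently computed attention measure): 12.573615247126204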
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls introduce substantial latency and costs. Alternatively, standard distillation is often hindered by the capacity limitation, as SLMs struggle to accurately mimic the LLM's complex generative distribution. We address this dilemma by identifying local sufficiency: at divergence points, the LLM's preferred token consistently resides within the SLM's top-K next-token predictions, even when failing to emerge as the SLM top-1 choice. We therefore propose SELECT TO THINK (S2T), which reframes the LLM's role from open-ended generation to selection among the SLM's proposals, simplifying the supervision signal to discrete candidate rankings. Leveraging this, we introduce S2T-LOCAL, which distills the selection logic into the SLM, empowering it to perform autonomous re-ranking without inference-time LLM dependency. Empirically, we demonstrate that a 1.5B SLM's top-8 candidates capture the 32B LLM's choice with 95% hit rate. Translating this potential into performance, S2T-LOCAL improves greedy decoding by 24.1% on average across benchmarks, effectively matching the efficacy of 8-path self-consistency while operating with single-trajectory efficiency.
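The local sufficiency claim in the abstract (the LLM's preferred token lies within the SLM's top-K next-token predictions) can be illustrated with a small sketch. This is not the authors' implementation: the function names `top_k_indices`, `top_k_hit_rate`, and `select_from_candidates`, the toy score values, and the `selector_scores` input are all assumptions made for illustration, and the abstract does not specify how the selector is actually trained or applied.

```python
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_hit_rate(
    slm_scores: Sequence[Sequence[float]],
    llm_choices: Sequence[int],
    k: int,
) -> float:
    """Fraction of positions where the LLM's token is in the SLM's top-k."""
    if not slm_scores:
        return 0.0
    hits = sum(
        1
        for scores, choice in zip(slm_scores, llm_choices)
        if choice in top_k_indices(scores, k)
    )
    return hits / len(slm_scores)


def select_from_candidates(
    slm_scores_at_pos: Sequence[float],
    selector_scores: Sequence[float],
    k: int,
) -> int:
    """Pick the best token among the SLM's top-k proposals only.

    `selector_scores` is a hypothetical stand-in for whatever ranks the
    candidates (an LLM, or a distilled selection head).
    """
    candidates = top_k_indices(slm_scores_at_pos, k)
    return max(candidates, key=lambda tok: selector_scores[tok])


# Toy data (assumed): 4 positions, vocabulary of 5 tokens.
slm_scores = [
    [0.10, 0.50, 0.30, 0.06, 0.04],
    [0.60, 0.05, 0.20, 0.10, 0.05],
    [0.15, 0.25, 0.45, 0.10, 0.05],
    [0.40, 0.30, 0.15, 0.10, 0.05],
]
llm_choices = [2, 0, 1, 3]

print(top_k_hit_rate(slm_scores, llm_choices, k=1))  # 0.25
print(top_k_hit_rate(slm_scores, llm_choices, k=2))  # 0.75

# Selection is restricted to the SLM's proposals: token 3 has the highest
# selector score but is not in the top-2, so token 2 is chosen.
print(select_from_candidates(slm_scores[0], [0.0, 0.2, 0.9, 1.0, 0.0], k=2))  # 2
```

In this toy data the hit rate rises from 0.25 at K=1 to 0.75 at K=2, which mirrors in miniature the gap the abstract describes between the SLM's top-1 choice and its top-K candidate set.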

Related papers