Fugu-MT Paper Translation (Abstract): Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

Paper summary: Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

arxiv url: http://arxiv.org/abs/2505.17813v1
Date: Fri, 23 May 2025 12:29:06 GMT
Status: Translation complete
Last updated in system: 2025-05-26 18:08:34.062758
Title: Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
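Title (reference translation): Don't overthink it. Preferring shorter thinking chains for improved LLM reasoning
Authors: Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz
Abstract summary: Reasoning large language models (LLMs) rely on scaling test-time compute to perform complex reasoning tasks. We demonstrate that shorter reasoning chains within individual questions are significantly more likely to yield correct answers. We then observe that training on the shorter chains leads to better performance.
Reference score (independently computed attention score): 45.807019099421225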
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Reasoning large language models (LLMs) heavily rely on scaling test-time compute to perform complex reasoning tasks by generating extensive "thinking" chains. While demonstrating impressive results, this approach incurs significant computational costs and inference time. In this work, we challenge the assumption that long thinking chains result in better reasoning capabilities. We first demonstrate that shorter reasoning chains within individual questions are significantly more likely to yield correct answers - up to 34.5% more accurate than the longest chain sampled for the same question. Based on these results, we suggest short-m@k, a novel reasoning LLM inference method. Our method executes k independent generations in parallel and halts computation once the first m thinking processes are done. The final answer is chosen using majority voting among these m chains. Basic short-1@k demonstrates similar or even superior performance over standard majority voting in low-compute settings - using up to 40% fewer thinking tokens. short-3@k, while slightly less efficient than short-1@k, consistently surpasses majority voting across all compute budgets, while still being substantially faster (up to 33% wall time reduction). Inspired by our results, we finetune an LLM using short, long, and randomly selected reasoning chains. We then observe that training on the shorter ones leads to better performance. Our findings suggest rethinking current methods of test-time compute in reasoning LLMs, emphasizing that longer "thinking" does not necessarily translate to improved performance and can, counter-intuitively, lead to degraded results.
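Abstract (reference translation): Reasoning large language models (LLMs) rely heavily on scaling test-time compute to perform complex reasoning tasks by generating extensive "thinking" chains. While this yields impressive results, the approach has a large impact on computational cost and inference time. In this work, we challenge the assumption that long thinking chains lead to better reasoning capabilities. We first show that shorter reasoning chains within individual questions are significantly more likely to yield correct answers, up to 34.5% more accurate than the longest chain sampled for the same question. Based on these results, we propose short-m@k, a new inference method. The method runs k independent generations in parallel and halts computation once the first m thinking processes are complete. The final answer is chosen by majority voting among these m chains. Basic short-1@k shows similar or even superior performance to standard majority voting in low-compute settings, using up to 40% fewer thinking tokens. short-3@k is slightly less efficient than short-1@k but consistently surpasses majority voting across all compute budgets. Inspired by these results, we finetune an LLM using short, long, and randomly selected reasoning chains. We then observe that training on the shorter ones leads to better performance. These findings suggest rethinking current methods of test-time compute in reasoning LLMs, and indicate that longer "thinking" does not necessarily improve performance and can, counter-intuitively, degrade results.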
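
The selection step of short-m@k, as the abstract describes it, can be sketched offline as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `short_m_at_k`, the `(thinking_token_count, answer)` input format, and the tie-breaking rule are all hypothetical. It also assumes that all k generations decode at the same speed, so "the first m to finish" is taken to mean "the m with the fewest thinking tokens". A real system would additionally stop the remaining k - m generations early, which this sketch does not model.

```python
from collections import Counter
from typing import List, Tuple


def short_m_at_k(samples: List[Tuple[int, str]], m: int) -> str:
    """Pick an answer from k sampled generations in the style of short-m@k.

    samples: hypothetical (thinking_token_count, final_answer) pairs,
             one per generation, so k = len(samples).
    m:       number of earliest-finishing chains to keep for the vote.
    """
    if not 1 <= m <= len(samples):
        raise ValueError("m must be between 1 and k")

    # Assumption: equal decoding speed, so the first m chains to finish
    # are the m chains with the fewest thinking tokens.
    finished = sorted(samples, key=lambda s: s[0])[:m]

    # Majority vote among the m kept chains.
    votes = Counter(answer for _, answer in finished)
    top = max(votes.values())

    # Assumption: a tie goes to the tied answer whose chain finished first.
    return next(answer for _, answer in finished if votes[answer] == top)


if __name__ == "__main__":
    # Made-up example data: five generations for one question.
    samples = [(1200, "42"), (300, "17"), (450, "42"), (900, "42"), (350, "17")]
    for m in (1, 3, 5):
        print(f"short-{m}@{len(samples)} ->", short_m_at_k(samples, m))
```

Setting m equal to k keeps every chain, which reduces the sketch to plain majority voting over all k generations.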

Related papers