Fugu-MT Paper Translation (Abstract): Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting

Paper overview: Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting

arxiv url: http://arxiv.org/abs/2505.19716v1
Date: Mon, 26 May 2025 09:04:44 GMT
Status: Translation complete
Last updated in system: 2025-05-27 16:58:43.30794
Title: Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
Authors: Yifan Wu, Jingze Shi, Bingheng Wu, Jiayi Zhang, Xiaotian Lin, Nan Tang, Yuyu Luo
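Abstract summary: This paper proposes difficulty-aware prompting (DAP), a method that dynamically shortens reasoning traces without performance loss. In experiments, a student model fine-tuned on just 100K difficulty-pruned CoT samples outperforms a model distilled on 800K Long CoT samples. Across 11 diverse benchmarks, the shorter difficulty-aware CoTs also achieve equal or better accuracy than long chains while using far fewer tokens.
Reference score (Fugu-MT's own attention score): 28.537281448659634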
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing chain-of-thought (CoT) distillation methods can effectively transfer reasoning abilities to base models but suffer from two major limitations: excessive verbosity of reasoning traces and inadequate adaptability to problem difficulty. Long reasoning traces significantly increase inference costs, and uniform-length solutions prevent base models from learning adaptive reasoning strategies. To address these issues, we propose a difficulty-aware prompting (DAP) method to dynamically shorten reasoning traces without performance loss. In our approach, a large teacher model first judges each problem's difficulty and then rewrites its reasoning traces to an appropriate shorter length, yielding concise yet complete reasoning traces. Leveraging the DAP pipeline, we curate a distilled dataset called LiteCoT consisting of 100K concise reasoning examples, with solutions averaging only 720 tokens (an order of magnitude shorter than typical CoTs). Using LiteCoT, we distilled a new family of reasoning models called Liter (1.5B, 7B, and 32B) based on the Qwen2.5 architecture. Experiments show that a student model fine-tuned on just 100K of these difficulty-pruned CoT samples outperforms a model distilled on 800K original Long CoT samples, while significantly reducing training and inference costs. Our method also generalizes well: across 11 diverse benchmarks, the shorter difficulty-aware CoTs achieve equal or better accuracy than Long chains, using far fewer tokens. For example, on the challenging AIME24 exam, our approach reaches 74.2% Pass@1 using only about 5K inference tokens, surpassing other methods that consume many more tokens. Our code and data are available at https://github.com/Evanwu1125/LiteCoT.
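The abstract describes DAP as two teacher-model steps per problem: judge the problem's difficulty, then rewrite the long reasoning trace to a shorter length suited to that difficulty. The sketch below shows only that control flow, assuming the teacher is exposed as two plain callables. The names `dap_prune`, `judge`, and `rewrite`, the three difficulty labels, and the token budgets in `BUDGETS` are all hypothetical illustrations, not the authors' prompts, code, or settings (those are in the LiteCoT repository linked above).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical difficulty labels and target lengths (in tokens/words);
# the abstract does not state the levels or budgets DAP actually uses.
BUDGETS: Dict[str, int] = {"easy": 300, "medium": 800, "hard": 2000}


@dataclass
class Example:
    problem: str
    long_trace: str  # original long CoT from the teacher


@dataclass
class ConciseExample:
    problem: str
    difficulty: str
    trace: str  # rewritten, shorter CoT


def dap_prune(
    examples: List[Example],
    judge: Callable[[str], str],              # teacher call: problem -> difficulty label
    rewrite: Callable[[str, str, int], str],  # teacher call: (problem, trace, budget) -> shorter trace
    budgets: Dict[str, int] = BUDGETS,
) -> List[ConciseExample]:
    """Judge each problem's difficulty, then rewrite its trace to that level's budget."""
    pruned = []
    for ex in examples:
        level = judge(ex.problem)      # step 1: difficulty judgment
        budget = budgets[level]        # map difficulty to a target length
        trace = rewrite(ex.problem, ex.long_trace, budget)  # step 2: length-aware rewrite
        pruned.append(ConciseExample(ex.problem, level, trace))
    return pruned


if __name__ == "__main__":
    # Stand-ins for teacher-model calls: a length heuristic for difficulty
    # and plain word truncation instead of an actual rewrite.
    def toy_judge(problem: str) -> str:
        return "easy" if len(problem) < 20 else "hard"

    def toy_rewrite(problem: str, trace: str, budget: int) -> str:
        return " ".join(trace.split()[:budget])

    data = [Example("1+1?", "step " * 1000),
            Example("Prove the inequality for all n >= 1.", "step " * 5000)]
    for item in dap_prune(data, toy_judge, toy_rewrite):
        print(item.difficulty, len(item.trace.split()))
```

In a real pipeline both callables would be prompts to a large teacher model; passing them in as parameters keeps the difficulty-to-budget logic separate from any particular model API. For scale, at the reported average of 720 tokens per solution, 100K LiteCoT examples come to roughly 72 million solution tokens.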


Related papers