Fugu-MT Paper Translation (Abstract): Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models

arxiv url: http://arxiv.org/abs/2507.02663v1
Date: Thu, 03 Jul 2025 14:24:26 GMT
Status: Translation complete
Last updated in system: 2025-07-04 15:37:16.390659
Title: Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models
Title (reference translation): How to Think: Mitigating Overthinking through Autonomous Difficulty Cognition in Large Reasoning Models
Authors: Yongjiang Liu, Haoxi Li, Xiaosong Ma, Jie Zhang, Song Guo
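Abstract summary: This paper proposes Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively inspires LRMs' difficulty cognition and redundancy cognition. TH2T significantly reduces inference costs (more than 70% on easy tasks and 40% on hard tasks) while maintaining performance stability.
Reference score (attention score computed independently by this site): 12.618562275265704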
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent Long Reasoning Models (LRMs) have demonstrated remarkable capabilities in handling complex reasoning tasks, but are hindered by excessive overthinking. To explore its essence, our empirical analysis reveals that LRMs are primarily limited to recognizing task properties (i.e., difficulty levels) like humans before solving the problem, leading to a one-size-fits-all reasoning process. Inspired by this, a pressing and natural question emerges: Can we bootstrap such ability to further alleviate the overthinking phenomenon in LRMs? In this paper, we propose Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively inspires LRMs' difficulty cognition and redundancy cognition. First, we introduce difficulty-hypnosis in the prefixes of model outputs to intervene in the internal reasoning trajectory. Combined with a heterogeneous short and long reasoning dataset, the trained model enhances its sensitivity to task difficulty, enabling native, differentiated reasoning strategies across various tasks. Second, we further extend redundancy-hypnosis to the internal reasoning process, guiding the model to identify redundant structures within the reasoning steps and generate more concise reasoning outputs. Experiments on 7B/14B/32B models demonstrate that TH2T significantly reduces inference costs (more than 70% on easy tasks and 40% on hard tasks) while maintaining performance stability. The resulting outputs exhibit clear difficulty-aware capabilities and reduced redundancy (e.g., reflection).
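Abstract (reference translation): Recent Long Reasoning Models (LRMs) have demonstrated remarkable capabilities in handling complex reasoning tasks, but they are hindered by excessive overthinking. To explore the essence of this problem, our empirical analysis reveals that LRMs are largely limited in recognizing task properties (i.e., difficulty levels) before solving a problem, as humans do, which leads to a one-size-fits-all reasoning process. Inspired by this, a pressing and natural question emerges: can this ability be bootstrapped to further alleviate overthinking in LRMs? This paper proposes Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively inspires LRMs' difficulty cognition and redundancy cognition. First, difficulty-hypnosis is introduced in the prefixes of model outputs to intervene in the internal reasoning trajectory. Combined with a heterogeneous short and long reasoning dataset, the trained model becomes more sensitive to task difficulty, enabling native, differentiated reasoning strategies across various tasks. Second, redundancy-hypnosis is extended to the internal reasoning process, guiding the model to identify redundant structures within the reasoning steps and generate more concise reasoning outputs. Experiments on 7B/14B/32B models show that TH2T significantly reduces inference costs (more than 70% on easy tasks and 40% on hard tasks) while maintaining performance stability. The resulting outputs exhibit clear difficulty-aware capabilities and reduced redundancy (e.g., reflection).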
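The first stage described in the abstract (difficulty-hypnosis prefixes combined with a heterogeneous short and long reasoning dataset) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prefix wording, the `Sample` fields, the `build_target` and `build_dataset` helpers, and the prompt/completion record layout are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical prefix strings: the abstract does not give the actual wording.
DIFFICULTY_PREFIX: Dict[str, str] = {
    "easy": "This looks like an easy problem, so a short solution should suffice.",
    "hard": "This looks like a hard problem, so careful step-by-step reasoning is needed.",
}


@dataclass
class Sample:
    """One training question with both a short and a long reasoning trace (assumed layout)."""
    question: str
    short_reasoning: str
    long_reasoning: str
    difficulty: str  # "easy" or "hard"; how this label is obtained is outside this sketch


def build_target(sample: Sample) -> str:
    """Prepend the difficulty prefix and pick the trace that matches the label."""
    prefix = DIFFICULTY_PREFIX[sample.difficulty]
    if sample.difficulty == "easy":
        reasoning = sample.short_reasoning
    else:
        reasoning = sample.long_reasoning
    return f"{prefix}\n{reasoning}"


def build_dataset(samples: List[Sample]) -> List[Dict[str, str]]:
    """Turn samples into prompt/completion records for supervised fine-tuning."""
    return [{"prompt": s.question, "completion": build_target(s)} for s in samples]
```

Under these assumptions, easy questions are paired with short traces and hard questions with long traces, so the fine-tuned model sees the difficulty statement at the start of every output before the reasoning that follows it.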

Related papers

Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey [8.736170026262279]
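Large reasoning models (LRMs) such as OpenAI o1 and DeepSeek R1 have demonstrated impressive performance on complex reasoning tasks. However, these models also face a major challenge: they generate unnecessarily long and redundant reasoning chains.
Paper reference translation (metadata) (2025-07-13T14:51:59Z)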
Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
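Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference costs. Two lightweight methods are proposed to improve LRM efficiency. First, Efficiency Steering, a training-free activation steering technique, is introduced. Second, Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity, is developed.
Paper reference translation (metadata) (2025-06-18T17:18:12Z)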
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity [16.266145641151375]
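Large reasoning models generate detailed thinking processes before providing answers. We show that LRMs face a complete accuracy collapse beyond certain complexities. We also examine the reasoning traces in more depth and study the patterns of explored solutions.
Paper reference translation (metadata) (2025-06-07T22:42:29Z)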
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation [33.008513399946914]
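OThink-R1 is a method that prunes redundant reasoning steps while preserving logical validity. Experiments across mathematical and question-answering tasks show that OThink-R1 reduces reasoning redundancy by about 23% on average.
Paper reference translation (metadata) (2025-06-03T03:31:30Z)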
Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt [74.35891434097053]
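Reasoning Large Language Models (RLLMs) have demonstrated impressive performance on complex tasks. They often overthink, performing unnecessary reasoning steps even after reaching the correct answer. This paper quantitatively analyzes overthinking from the perspective of self-doubt. It proposes a simple and effective prompting method to reduce the model's over-reliance on the input question.
Paper reference translation (metadata) (2025-05-29T14:30:02Z)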
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models [56.40065909544213]
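Large language models (LLMs) benefit from increased test-time compute, known as test-time scaling. However, reasoning-optimized models often overthink even simple problems, producing excessively verbose outputs and reducing token efficiency. Two causes are identified: (1) reinforcement learning reduces the information density of forward reasoning, and (2) backward chain-of-thought training encourages redundant and often unnecessary verification steps.
Paper reference translation (metadata) (2025-05-28T06:24:45Z)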
When Can Large Reasoning Models Save Thinking? Mechanistic Analysis of Behavioral Divergence in Reasoning [19.329523111916682]
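Large reasoning models (LRMs) have significantly advanced performance on complex tasks but tend to introduce inefficiencies. This study investigates the internal mechanisms of LRMs trained with reinforcement learning (RL).
Paper reference translation (metadata) (2025-05-21T08:55:35Z)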
Let LLMs Break Free from Overthinking via Self-Braking Tuning [60.08396797526657]
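Large reasoning models (LRMs) have significantly improved their reasoning capabilities by generating long chains of thought. This performance gain comes at the cost of a substantial increase in redundant reasoning during generation. This paper proposes Self-Braking Tuning (SBT), a novel framework that tackles overthinking from the perspective of allowing the model to regulate its own reasoning process.
Paper reference translation (metadata) (2025-05-20T16:53:40Z)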
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities [101.77467538102924]
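Recent advances in Large Reasoning Models (LRMs) have shown remarkable performance on specialized reasoning tasks. We show that acquiring deliberative reasoning capabilities significantly degrades the foundational capabilities of LRMs. We also show that adaptive reasoning (Zero-Thinking, Less-Thinking, Summary-Thinking) can effectively mitigate these drawbacks.
Paper reference translation (metadata) (2025-03-23T08:18:51Z)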
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks [96.27754404942364]
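Large reasoning models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs. Three patterns are observed: Analysis Paralysis, Rogue Actions, and Premature Disengagement.
Paper reference translation (metadata) (2025-02-12T09:23:26Z)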
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs [86.79757571440082]
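Large language models (LLMs) such as OpenAI's o1 have demonstrated remarkable abilities in complex reasoning tasks. We identify a phenomenon in which o1-like LLMs frequently switch between different reasoning thoughts. This paper proposes a decoding strategy with a thought-switching penalty, TIP, that discourages premature transitions between thoughts.
Paper reference translation (metadata) (2025-01-30T18:58:18Z)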

The related papers list is generated automatically from the titles and abstracts of papers on this site.

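This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).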