Fugu-MT Paper Translation (Summary): Reasoning Topology Matters: Network-of-Thought for Complex Reasoning Tasks

Paper Overview: Reasoning Topology Matters: Network-of-Thought for Complex Reasoning Tasks

arxiv url: http://arxiv.org/abs/2603.20730v1
Date: Sat, 21 Mar 2026 09:32:28 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.060759
Title: Reasoning Topology Matters: Network-of-Thought for Complex Reasoning Tasks
Title (reference translation): Reasoning Topology Matters: Network-of-Thought for Complex Reasoning Tasks
Authors: Fan Huang
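Abstract summary: CoT (Chain-of-Thought) produces linear traces, while ToT (Tree-of-Thought) performs branching search. We propose Network-of-Thought (NoT), a framework that models reasoning as a directed graph with typed nodes and edges.
Reference score (independently computed notability score): 5.523132953818281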
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing prompting paradigms structure LLM reasoning in limited topologies: Chain-of-Thought (CoT) produces linear traces, while Tree-of-Thought (ToT) performs branching search. Yet complex reasoning often requires merging intermediate results, revisiting hypotheses, and integrating evidence from multiple sources. We propose Network-of-Thought (NoT), a framework that models reasoning as a directed graph with typed nodes and edges, guided by a heuristic-based controller policy. Across four benchmarks (GSM8K, Game of 24, HotpotQA, ProofWriter) and three models (GPT-4o-mini, Llama-3.3-70B-Instruct, Qwen2.5-72B-Instruct), we investigate when network topology outperforms chain or tree structures, whether LLM-generated heuristics can guide graph-based reasoning search, and the computation-accuracy tradeoff across topologies, evaluating each method on accuracy, topology simplicity, and token efficiency. Our results show that CoT remains effective for sequential tasks with GPT-4o-mini (89.5% on GSM8K), while NoT surpasses ToT on multi-hop reasoning (91.0% vs. 88.0% on HotpotQA with LLM-as-Judge). With 72B open-source models, NoT achieves the highest accuracy on GSM8K (91.5%), and Qwen2.5-72B achieves the best multi-hop QA result overall (91.7% on HotpotQA). Self-generated controller heuristics outperform fixed and random strategies on logical reasoning, with uncertainty-only weighting achieving 57.0% on ProofWriter. We also find that evaluation methodology significantly impacts method rankings: string-match underestimates all methods on open-ended QA, with the largest gap for NoT, a pattern consistent across all three models (14-18 percentage point gap on HotpotQA).
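The abstract describes NoT only at a high level: reasoning is a directed graph with typed nodes and edges, and a heuristic controller decides what to expand next. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. The node and edge types, the `uncertainty` field, and the linear `controller_score` heuristic (with its `w_uncertainty` and `w_fan_in` weights) are all hypothetical. It illustrates the structural property that separates a network from a tree: a merge node can have several parents.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    # Hypothetical node types: the abstract says only that nodes are "typed".
    QUESTION = "question"
    HYPOTHESIS = "hypothesis"
    EVIDENCE = "evidence"
    MERGE = "merge"


class EdgeType(Enum):
    # Hypothetical edge types, chosen for illustration.
    DERIVES = "derives"
    SUPPORTS = "supports"
    REVISITS = "revisits"


@dataclass
class Node:
    node_id: int
    kind: NodeType
    text: str
    uncertainty: float  # assumed in [0, 1]; higher means less confident


@dataclass
class ThoughtGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (src_id, dst_id, EdgeType)

    def add_node(self, kind, text, uncertainty, parents=(), edge=EdgeType.DERIVES):
        """Add a node and connect it to every parent with the given edge type."""
        node = Node(len(self.nodes), kind, text, uncertainty)
        self.nodes[node.node_id] = node
        for parent_id in parents:
            self.edges.append((parent_id, node.node_id, edge))
        return node

    def parents(self, node_id):
        """Ids of all nodes with an edge into node_id (more than one means not a tree)."""
        return [src for src, dst, _ in self.edges if dst == node_id]


def controller_score(graph, node, w_uncertainty=1.0, w_fan_in=0.0):
    # Hypothetical linear heuristic; w_fan_in=0 gives "uncertainty-only" weighting.
    return w_uncertainty * node.uncertainty + w_fan_in * len(graph.parents(node.node_id))


def pick_next(graph, expanded, **weights):
    """Return the unexpanded node with the highest controller score, or None."""
    candidates = [n for n in graph.nodes.values() if n.node_id not in expanded]
    if not candidates:
        return None
    return max(candidates, key=lambda n: controller_score(graph, n, **weights))


# Usage: two evidence nodes merged into one node, as in multi-hop QA.
g = ThoughtGraph()
q = g.add_node(NodeType.QUESTION, "Multi-hop question", 0.9)
e1 = g.add_node(NodeType.EVIDENCE, "Fact from passage A", 0.4, parents=[q.node_id])
e2 = g.add_node(NodeType.EVIDENCE, "Fact from passage B", 0.6, parents=[q.node_id])
m = g.add_node(NodeType.MERGE, "Combine A and B", 0.2,
               parents=[e1.node_id, e2.node_id], edge=EdgeType.SUPPORTS)
print(len(g.parents(m.node_id)))  # 2
print(pick_next(g, expanded={q.node_id}).text)  # Fact from passage B
```

Raising `w_fan_in` above zero would favor nodes that integrate several inputs; the weights the paper's controller actually uses are not given in this summary.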
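Abstract (reference translation): CoT (Chain-of-Thought) produces linear traces, while ToT (Tree-of-Thought) performs branching search. However, complex reasoning often requires merging intermediate results, revisiting hypotheses, and integrating evidence from multiple sources. We propose Network-of-Thought (NoT), a framework that models reasoning as a directed graph with typed nodes and edges. Across four benchmarks (GSM8K, Game of 24, HotpotQA, ProofWriter) and three models (GPT-4o-mini, Llama-3.3-70B-Instruct, Qwen2.5-72B-Instruct), we examine when network topology outperforms chain or tree structures, whether LLM-generated heuristics can guide graph-based reasoning search, and the computation-accuracy tradeoff across topologies, evaluating each method on accuracy, topology simplicity, and token efficiency. The results show that CoT is effective for sequential tasks with GPT-4o-mini (89.5% on GSM8K), while NoT outperforms ToT on multi-hop reasoning (91.0% vs. 88.0% on HotpotQA with LLM-as-Judge). With 72B open-source models, NoT achieves the highest accuracy on GSM8K (91.5%), and Qwen2.5-72B achieves the best multi-hop QA result overall (91.7% on HotpotQA). Self-generated controller heuristics outperform fixed and random strategies on logical reasoning, with uncertainty-only weighting reaching 57.0% on ProofWriter. String matching underestimates all methods on open-ended QA, with the largest gap for NoT, a pattern consistent across all three models (14-18 percentage point gap on HotpotQA).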
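The point about evaluation methodology can be illustrated with a normalized exact-match metric in the style of the SQuAD/HotpotQA evaluation scripts. This is an assumption about what "string match" means here; the summary does not specify the paper's matcher, and the example strings are made up. A correct but verbose answer fails exact match and earns only partial token F1, which is one way string matching can underestimate open-ended answers.

```python
import re
import string
from collections import Counter


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(gold)


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


gold = "Paris"
verbose = "The capital in question is Paris."
print(exact_match("Paris.", gold))        # True
print(exact_match(verbose, gold))         # False
print(round(token_f1(verbose, gold), 3))  # 0.333
```

A judge model that reads answers semantically would typically accept the verbose string, which is the kind of disagreement between evaluation methods that the abstract reports.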


Related Papers