Fugu-MT Paper Translation (Abstract): CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

Paper abstract: CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision

arxiv url: http://arxiv.org/abs/2505.15927v1
Date: Wed, 21 May 2025 18:28:54 GMT
Status: Translation complete
Last updated in system: 2025-05-23 17:12:47.863691
Title: CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision
Authors: Awni Altabaa, Omar Montasser, John Lafferty
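Abstract summary: Chain-of-thought (CoT) supervision provides intermediate reasoning steps together with the final output. This paper develops a statistical theory of learning under CoT supervision.
Reference score (independently computed attention score): 10.29575414214269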
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the recent progress in the reasoning capabilities of large language models. This paper develops a statistical theory of learning under CoT supervision. A key characteristic of the CoT setting, in contrast to standard supervision, is the mismatch between the training objective (CoT risk) and the test objective (end-to-end risk). A central part of our analysis, distinguished from prior work, is explicitly linking those two types of risk to achieve sharper sample complexity bounds. This is achieved via the *CoT information measure* $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$, which quantifies the additional discriminative power gained from observing the reasoning process. The main theoretical results demonstrate how CoT supervision can yield significantly faster learning rates compared to standard E2E supervision. Specifically, it is shown that the sample complexity required to achieve a target E2E error $\epsilon$ scales as $d/\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$, where $d$ is a measure of hypothesis class complexity, which can be much faster than standard $d/\epsilon$ rates. Information-theoretic lower bounds in terms of the CoT information are also obtained. Together, these results suggest that CoT information is a fundamental measure of statistical complexity for learning under chain-of-thought supervision.
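To make the two rates in the abstract concrete, the following is a minimal sketch that compares the leading-order sample sizes $d/\epsilon$ (standard E2E supervision) and $d/\mathcal{I}$ (CoT supervision). It is an illustration only: constants and logarithmic factors are dropped, the function names are hypothetical, and the numeric values for $d$, $\epsilon$, and the CoT information are made-up inputs rather than quantities taken from the paper.

```python
def e2e_sample_size(d: float, epsilon: float) -> float:
    """Leading-order sample size d / epsilon for standard E2E supervision.

    Constants and log factors are omitted (assumption for illustration).
    """
    if d <= 0 or epsilon <= 0:
        raise ValueError("d and epsilon must be positive")
    return d / epsilon


def cot_sample_size(d: float, cot_information: float) -> float:
    """Leading-order sample size d / I under CoT supervision.

    `cot_information` stands in for the CoT information measure evaluated
    at the target error epsilon; here it is just a number supplied by the
    caller, not something this sketch computes.
    """
    if d <= 0 or cot_information <= 0:
        raise ValueError("d and cot_information must be positive")
    return d / cot_information


# Hypothetical inputs chosen only to show the arithmetic.
d = 32                # complexity of the hypothesis class
epsilon = 1 / 16      # target end-to-end error
cot_information = 0.5 # assumed value of the CoT information at epsilon

n_e2e = e2e_sample_size(d, epsilon)          # 32 / (1/16) = 512
n_cot = cot_sample_size(d, cot_information)  # 32 / 0.5    = 64
speedup = n_e2e / n_cot                      # 8x fewer samples in this toy case

print(f"E2E: {n_e2e:.0f} samples, CoT: {n_cot:.0f} samples, ratio: {speedup:.0f}x")
```

In this toy setting the ratio of the two sample sizes is simply the CoT information divided by $\epsilon$, so the advantage of CoT supervision grows as the CoT information becomes large relative to the target error.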

Related papers

Reinforced Latent Reasoning for LLM-based Recommendation [83.18146814163308]
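Large language models (LLMs) have demonstrated impressive reasoning capabilities in complex problem-solving tasks. Existing methods typically rely on fine-tuning with explicit chain-of-thought (CoT) data. This work explores an alternative approach that shifts from explicit CoT reasoning to compact, information-dense latent reasoning.
Paper reference translation (metadata) (2025-05-25T11:03:45Z)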
When More is Less: Understanding Chain-of-Thought Length in LLMs [51.631483479081645]
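Large language models (LLMs) use chain-of-thought (CoT) reasoning to decompose complex problems. Although longer CoTs are often assumed to be better, this paper argues that longer CoTs are not always better.
Paper reference translation (metadata) (2025-02-11T05:28:59Z)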
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits [49.96531901205305]
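We analyze $f$-divergence-regularized offline policy learning. For the reverse Kullback-Leibler (KL) divergence, we give the first $\tilde{O}(\epsilon^{-1})$ sample complexity under single-policy concentrability. We believe these results take a significant step toward a comprehensive understanding of $f$-divergence-regularized policy learning.
Paper reference translation (metadata) (2025-02-09T22:14:45Z)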
Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization [9.191236388401226]
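Integrating explicit chain-of-thought (CoT) into the training of large language models improves their reasoning capabilities, but the mechanisms by which CoT enhances generalization remain poorly understood. This work investigates (1) how CoT training reshapes internal model representations and (2) how it generalizes to both in-distribution (ID) and out-of-distribution (OOD) settings.
Paper reference translation (metadata) (2025-02-07T05:21:13Z)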
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency [17.612497960364916]
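Chain-of-thought (CoT) significantly improves the reasoning performance of large language models (LLMs). We show that CoT can substantially improve sample efficiency even when representation power is sufficient. We show that CoT simplifies the learning process by introducing sparse dependencies among input tokens, leading to sparse and interpretable attention.
Paper reference translation (metadata) (2024-10-07T19:45:09Z)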
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis [82.51626700527837]
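Chain-of-Thought (CoT) is an efficient method that enables the reasoning capabilities of large language models by augmenting the query with examples containing multiple intermediate steps. Despite the theoretical success of CoT, we show that accurate generalization cannot be obtained even when CoT holds.
Paper reference translation (metadata) (2024-10-03T03:12:51Z)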
Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
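Chain-of-thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems. We analyze CoT prompting from the perspective of statistical estimation and comprehensively characterize its complexity.
Paper reference translation (metadata) (2024-08-25T04:07:18Z)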
On the Provable Advantage of Unsupervised Pretraining [26.065736182939222]
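Unsupervised pretraining is a critical component of modern large-scale machine learning systems. This paper studies a generic framework in which the unsupervised representation learning task is specified by an abstract class of latent variable models. Under a mild "informative" condition, it achieves an excess risk of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_\Phi/m} + \sqrt{\mathcal{C}_\Psi/n})$ for downstream tasks.
Paper reference translation (metadata) (2023-03-02T20:42:05Z)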
A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability [57.502573663108535]
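We study the problem of learning an adversarially robust predictor against test-time attacks in the semi-supervised PAC model. We show that semi-supervised robust learning offers significant benefits even in the worst-case distribution-free model.
Paper reference translation (metadata) (2022-02-11T03:01:45Z)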
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
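We study the exploration-exploitation tradeoff at the core of reinforcement learning. In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function. Our regret bounds are independent of the number of episodes.
Paper reference translation (metadata) (2020-11-09T18:32:22Z)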

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
