Fugu-MT Paper Translation (Abstract): Causal Direction from Convergence Time: Faster Training in the True Causal Direction

Paper overview: Causal Direction from Convergence Time: Faster Training in the True Causal Direction

arxiv url: http://arxiv.org/abs/2602.22254v1
Date: Tue, 24 Feb 2026 21:34:57 GMT
Status: Translation complete
Last updated in system: 2026-02-27 18:41:22.328617
Title: Causal Direction from Convergence Time: Faster Training in the True Causal Direction
Authors: Abdulrahman Tamim
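Abstract summary: We introduce Causal Computational Asymmetry (CCA), a principle for causal direction identification based on optimization dynamics. CCA operates in optimization-time space, which distinguishes it from methods such as RESIT, IGCI, and SkewScore. We further embed CCA into a framework termed Causal Compression Learning (CCL), which integrates graph structure learning, causal information compression, and policy optimization.
Reference score (site-computed attention measure): 0.0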
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Causal Computational Asymmetry (CCA), a principle for causal direction identification based on optimization dynamics in which one neural network is trained to predict $Y$ from $X$ and another to predict $X$ from $Y$, and the direction that converges faster is inferred to be causal. Under the additive noise model $Y = f(X) + \varepsilon$ with $\varepsilon \perp X$ and $f$ nonlinear and injective, we establish a formal asymmetry: in the reverse direction, residuals remain statistically dependent on the input regardless of approximation quality, inducing a strictly higher irreducible loss floor and non-separable gradient noise in the optimization dynamics, so that the reverse model requires strictly more gradient steps in expectation to reach any fixed loss threshold; consequently, the forward (causal) direction converges in fewer expected optimization steps. CCA operates in optimization-time space, distinguishing it from methods such as RESIT, IGCI, and SkewScore that rely on statistical independence or distributional asymmetries, and proper z-scoring of both variables is required for valid comparison of convergence rates. On synthetic benchmarks, CCA achieves 26/30 correct causal identifications across six neural architectures, including 30/30 on sine and exponential data-generating processes. We further embed CCA into a broader framework termed Causal Compression Learning (CCL), which integrates graph structure learning, causal information compression, and policy optimization, with all theoretical guarantees formally proved and empirically validated on synthetic datasets.
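The procedure described in the abstract (z-score both variables, train one network per direction, compare the number of steps needed to reach a fixed loss threshold) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the network size, learning rate, loss threshold, toy data-generating process, and the names `zscore`, `steps_to_threshold`, and `cca_direction` are all hypothetical choices made here for illustration.

```python
import math
import random


def zscore(values):
    """Standardize a sequence to zero mean and unit (population) std."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def steps_to_threshold(inputs, targets, threshold, hidden=8, lr=0.05,
                       max_steps=500, seed=0):
    """Train a one-hidden-layer tanh network with full-batch gradient descent.

    Returns the number of updates performed before the mean squared error
    first drops to <= threshold, or max_steps if that never happens.
    """
    rng = random.Random(seed)
    w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden                                # hidden biases
    w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0
    n = len(inputs)
    for step in range(max_steps):
        gw1, gb1, gw2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, t in zip(inputs, targets):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - t
            loss += err * err / n
            g = 2.0 * err / n  # d(loss)/d(prediction)
            gb2 += g
            for j in range(hidden):
                gw2[j] += g * h[j]
                gh = g * w2[j] * (1.0 - h[j] * h[j])  # backprop through tanh
                gw1[j] += gh * x
                gb1[j] += gh
        if loss <= threshold:
            return step
        for j in range(hidden):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2
    return max_steps


def cca_direction(x, y, threshold=0.3, **train_kwargs):
    """Z-score both variables, then compare convergence steps per direction."""
    xz, yz = zscore(x), zscore(y)
    forward = steps_to_threshold(xz, yz, threshold, **train_kwargs)
    reverse = steps_to_threshold(yz, xz, threshold, **train_kwargs)
    if forward < reverse:
        verdict = "X->Y"
    elif reverse < forward:
        verdict = "Y->X"
    else:
        verdict = "undecided"
    return verdict, forward, reverse


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(-2.0, 2.0) for _ in range(100)]
    # Toy additive noise model Y = f(X) + eps with f nonlinear and injective.
    y = [xi ** 3 + xi + rng.gauss(0.0, 0.5) for xi in x]
    print(cca_direction(x, y))
```

A single run with one seed and one threshold is only a rough indicator. The abstract's claim concerns expected step counts, so in practice one would presumably average over several seeds and thresholds before drawing a conclusion.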

Related papers

Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
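Push-based decentralized communication enables optimization over communication networks in which information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
Reference translation (metadata) (2026-02-24T05:32:03Z)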
Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration [56.074760766965085]
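PRISM is a dynamics-aware framework that arbitrates data according to its degree of cognitive conflict with the model's existing knowledge. The results suggest that disentangling data on the basis of internal optimization regimes is essential for scalable and robust agent alignment.
Reference translation (metadata) (2026-01-12T05:43:20Z)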
Unifying Learning Dynamics and Generalization in Transformers Scaling Law [1.5229257192293202]
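Scaling laws, a cornerstone of large language model (LLM) development, predict that model performance improves as computational resources increase. This work formalizes the learning dynamics of transformer-based language models as a system of ordinary differential equations (ODEs). The analysis characterizes the convergence of the generalization error to the irreducible risk as computational resources scale with data.
Reference translation (metadata) (2025-12-26T17:20:09Z)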
Outlier-aware Tensor Robust Principal Component Analysis with Self-guided Data Augmentation [21.981038455329013]
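We propose a self-guided data augmentation approach with adaptive weighting. We demonstrate improvements in both accuracy and computational efficiency over state-of-the-art methods.
Reference translation (metadata) (2025-04-25T13:03:35Z)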
Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
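Decentralized federated learning (DFL) eliminates reliance on the client-server architecture. Nonsmooth regularization is often incorporated into machine learning tasks. This paper proposes a novel DNCFL algorithm to address these issues.
Reference translation (metadata) (2025-04-17T08:32:25Z)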
Implicit Bias and Fast Convergence Rates for Self-attention [26.766649949420746]
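This paper studies the fundamental optimization principles of self-attention, the defining mechanism of transformers. We analyze the implicit bias of gradient-based training for a self-attention layer with a decoder in a linear classification setting.
Reference translation (metadata) (2024-02-08T15:15:09Z)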
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
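This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape, and we show how linear interpolation can help by leveraging the theory of nonexpansive operators.
Reference translation (metadata) (2023-10-20T12:45:12Z)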
PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates [17.777466668123886]
PROMISE (Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates) is a suite of sketching-based preconditioned gradient algorithms. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha.
Reference translation (metadata) (2023-09-05T07:49:10Z)
Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
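We optimize information-theoretic generalization bounds by manipulating the noise structure in SGLD. We prove that, under a constraint that guarantees low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
Reference translation (metadata) (2021-10-26T15:02:27Z)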
Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
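We study the optimization of the area under the precision-recall curve (AUPRC), which is widely used for imbalanced tasks. We develop novel momentum methods with an improved iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution. We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
Reference translation (metadata) (2021-07-02T16:21:52Z)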
A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
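This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data. For a class of statistical models, we provide an exact analysis of the generalization error of boosting. We also explicitly characterize the relationship between the boosting test error and the optimal Bayes error.
Reference translation (metadata) (2020-02-05T00:24:53Z)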

The related papers list is generated automatically from the titles and abstracts of papers on this site.
