Fugu-MT paper translation (summary): Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group

arxiv url: http://arxiv.org/abs/2606.03003v1
Date: Tue, 02 Jun 2026 01:20:24 GMT
Status: translation complete
Last updated in system: 2026-06-03 22:00:04.674378
Title: Exact equivariance, kept through training, buys zero-shot generalisation across the symmetry group
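Authors: Hongbo Wang
Abstract summary: A latent world model built from an equivariant encoder $E$ and an equivariant predictor $f$ inherits a provable symmetry of its training loss. We verify this end-to-end at laptop scale (CPU/MPS, fully seeded).
Reference score (attention score computed by this site): 6.230579198456525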
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A latent world model built from an equivariant encoder $E$ and an equivariant predictor $f$ inherits a provable symmetry of its training loss: when the world's dynamics genuinely carries a group $G$ acting on latents by an orthogonal representation $ρ(g)$, the one-step prediction relMSE is exactly invariant across the whole group, so fitting the dynamics on a restricted slice of orientations mathematically determines it on the entire orbit (jǔ yī fǎn sān). We verify this end-to-end at laptop scale (CPU/MPS, fully seeded). [A] The symmetry survives a real Muon/AdamW + EMA + VICReg run -- composed encode-then-predict residual $\sim 10^{-6}$ after optimisation, not just at initialisation, and under any optimiser. [B] One-step error is flat to five digits across the group, while a same-hypothesis-class non-equivariant baseline fits the slice but breaks out-of-distribution (VN $\times 1.00$ vs baseline $\times 13.8$ in 2D, $\times 17.2$ in 3D, $\times 157$ over the full $\mathrm{SE}(3)$ ladder), with the equivariant model $4.5$-$7.4\times$ smaller. [C] The same isometry argument lifts to closed loop: under a matching equivariant planner the control trajectory at orientation $g$ is exactly $ρ(g)$ applied to the seen one, so closed-loop error is invariant across the group -- float-floor-exact in 2D/$\mathrm{SO}(2)$ on real PushT and statistically flat in 3D/$\mathrm{SE}(3)$ (disjoint 95% CIs). We stress-test the prior against Sutton's Bitter Lesson: augmentation, brute-force scale, and soft-equivariance each close at most the across-group task metric, never the float-floor exactness. Because equivariance is closed under composition, the $H$-fold rollout stays flat ($\times 1.00$, $\le 2\times 10^{-7}$) at every horizon, while the baseline's residual compounds with $H$. Out of scope: task-success sweeps, planner-free invariance, and scaling.
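The invariance claim in [B] and the $H$-fold rollout claim rest on one line of algebra. If the predictor satisfies $f(\rho(g)z) = \rho(g)f(z)$, the true one-step map $F$ does as well, and $\rho(g)$ is orthogonal, then $\|f(\rho(g)z) - F(\rho(g)z)\| = \|\rho(g)(f(z) - F(z))\| = \|f(z) - F(z)\|$ and $\|F(\rho(g)z)\| = \|F(z)\|$, so relMSE cannot change along the orbit; composing equivariant steps keeps this true at every horizon. The sketch below checks this for $\mathrm{SO}(2)$ acting on a 2D latent. It is a toy under stated assumptions, not the paper's code: `world_step`, `equivariant_step` and the anisotropic `baseline_step` are hypothetical 2x2 linear maps chosen for illustration.

```python
import math

def rot(theta):
    """rho(g) for g = theta in SO(2): an orthogonal 2x2 matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def apply(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

WORLD = rot(0.25)  # hypothetical latent dynamics; a rotation commutes with every rho(g)

def world_step(z):
    return apply(WORLD, z)

def equivariant_step(z, a=0.9, b=0.3):
    # [[a, -b], [b, a]] commutes with every 2D rotation, so this predictor is SO(2)-equivariant
    return (a * z[0] - b * z[1], b * z[0] + a * z[1])

def baseline_step(z):
    # Also a 2x2 linear map, but anisotropic: it does not commute with rotations
    return (0.9 * z[0], 0.5 * z[1])

def rollout(step, z, horizon):
    for _ in range(horizon):
        z = step(z)
    return z

def rel_mse(step, z, horizon=1):
    """Squared error of the H-step prediction, relative to the squared norm of the true state."""
    pred = rollout(step, z, horizon)
    target = rollout(world_step, z, horizon)
    err = sum((p - t) ** 2 for p, t in zip(pred, target))
    return err / sum(t * t for t in target)

def spread_over_group(step, z, horizon=1, n_angles=64):
    """max/min relMSE over the orbit {rho(g) z}; 1.0 means the error is flat across the group."""
    errs = [rel_mse(step, apply(rot(2.0 * math.pi * k / n_angles), z), horizon)
            for k in range(n_angles)]
    return max(errs) / min(errs)

z0 = (1.0, 0.0)
for h in (1, 5, 20):
    print(h, spread_over_group(equivariant_step, z0, h), spread_over_group(baseline_step, z0, h))
```

The equivariant model's spread stays at 1 up to float rounding for every horizon, even though it does not fit the world dynamics exactly; the baseline's one-step spread is well above 1.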

Related papers

Polymorphism Is Rotation: Operational Mechanistic Interpretability from a Two-Layer Transformer to Pythia-70m [0.0]
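Independently trained transformers compute the same function in residual-stream bases that differ by a uniform random rotation. We call this phenomenon polymorphism: the same function, expressed in mutually unintelligible internal coordinates. The phenomenon is invisible to standard SAE metrics.
Paper reference translation (metadata) (2026-05-23T13:37:59Z)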
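The polymorphism claim above can be illustrated with a toy residual network. The ReLU acts on MLP neurons rather than on residual-stream coordinates, so rotating the residual basis by an orthogonal $Q$ and compensating in the write matrices ($W \to QW$) and read matrices ($W \to WQ^\top$) leaves the input-output function unchanged while every internal residual coordinate changes. This is a minimal sketch under simplifying assumptions (no attention, no LayerNorm, a 2D residual stream with a plane rotation as $Q$); all helper names are hypothetical and not taken from the paper.

```python
import math
import random

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def forward(params, x):
    """Toy residual network: embed, one ReLU MLP block with a residual add, linear readout."""
    w_in, w_up, w_down, w_out = params
    h = matvec(w_in, x)                                      # write into the residual stream
    act = [max(0.0, u) for u in matvec(w_up, h)]             # ReLU on MLP neurons, not residual coords
    h = [hi + di for hi, di in zip(h, matvec(w_down, act))]  # residual update
    return matvec(w_out, h), h

def rotate_residual_basis(params, q):
    """Same function, with the residual stream re-expressed as h' = q h for an orthogonal q."""
    w_in, w_up, w_down, w_out = params
    qt = transpose(q)
    return matmul(q, w_in), matmul(w_up, qt), matmul(q, w_down), matmul(w_out, qt)

rng = random.Random(0)
params = (random_matrix(2, 3, rng), random_matrix(4, 2, rng),
          random_matrix(2, 4, rng), random_matrix(2, 2, rng))
q = rotation(rng.uniform(0.0, 2.0 * math.pi))  # uniform angle = Haar-random element of SO(2)
x = [rng.gauss(0.0, 1.0) for _ in range(3)]
out_a, h_a = forward(params, x)
out_b, h_b = forward(rotate_residual_basis(params, q), x)
print(out_a, out_b)  # same outputs up to float rounding
print(h_a, h_b)      # different residual-stream coordinates
```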
Hidden-State Privacy Has an Empty Middle [51.56484100374058]
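Every full-rank Gaussian release with $O(1)$ Fisher utility admits a direction along which the Mahalanobis signal grows linearly with hidden width. A split-memory transformer trained from scratch reaches $G_{\mathrm{Mah}} \in [20, 33]$ at 90M and, at a fixed language-loss penalty, maintains an advantage of 6 to 24 over same-budget GPT baselines from 30M to 1B.
Paper reference translation (metadata) (2026-05-21T20:12:09Z)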
Optimal Scalar Quantization for Matrix Multiplication: Closed-Form Density and Phase Transition [50.36362492608702]
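We study entrywise scalar quantization of two matrices before they are multiplied. We derive a closed-form optimal point density, proportional to $\exp(-u^2/6)$ times a second $u$-dependent factor in a normalized coordinate $u$, and prove a correlation-driven phase transition.
Paper reference translation (metadata) (2026-03-20T01:53:44Z)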
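The $\exp(-u^2/6)$ factor in the quantization summary above has a familiar single-variable counterpart. In the classical high-resolution (Panter-Dite) analysis, an $N$-level quantizer with point density $\lambda$ has MSE of about $\frac{1}{12N^2}\int p/\lambda^2$, which is minimized by $\lambda \propto p^{1/3}$; for a standard normal $p$ this is exactly $\exp(-u^2/6)$. The sketch below checks numerically that $\alpha = 1/3$ beats neighbouring exponents for $\lambda \propto p^\alpha$. It covers only a single Gaussian scalar and does not model the additional factor in the paper's matrix-product setting; the function names are hypothetical.

```python
import math

def gaussian_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def trapezoid(fn, lo, hi, n=20000):
    step = (hi - lo) / n
    total = 0.5 * (fn(lo) + fn(hi)) + sum(fn(lo + k * step) for k in range(1, n))
    return total * step

def distortion_integral(alpha, half_width=30.0):
    """J(alpha) = integral of p / lambda^2 for lambda = p**alpha / Z_alpha (needs alpha < 1/2).

    The high-resolution MSE of an N-level quantizer is about J / (12 N^2).
    """
    z_alpha = trapezoid(lambda u: gaussian_pdf(u) ** alpha, -half_width, half_width)
    # p / lambda**2 = Z_alpha**2 * p**(1 - 2*alpha), which avoids dividing by tiny tail densities
    tail = trapezoid(lambda u: gaussian_pdf(u) ** (1.0 - 2.0 * alpha), -half_width, half_width)
    return z_alpha ** 2 * tail

for alpha in (0.25, 1.0 / 3.0, 0.4):
    print(alpha, distortion_integral(alpha))
print(6.0 * math.sqrt(3.0) * math.pi)  # closed form of (integral of p**(1/3))**3
```

For $\alpha = 1/3$ the integral equals $(\int p^{1/3})^3 = 6\sqrt{3}\pi$, giving an MSE of about $\frac{\sqrt{3}\pi}{2N^2}$, the Panter-Dite constant.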
Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction [57.93371273485736]
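We consider centralized distributed optimization in which workers have access only to unbiased stochastic gradients, and show that, even in the homogeneous (i.e., i.i.d.) case where all workers access the same distribution, the attainable complexity improves on a bound involving $L\Delta/\varepsilon^2$ by at most a poly-logarithmic factor.
Paper reference translation (metadata) (2025-06-30T13:27:39Z)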
Leading and beyond leading-order spectral form factor in chaotic quantum many-body systems across all Dyson symmetry classes [8.105213101498085]
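We show that random matrix theory (RMT) spectral correlations emerge in the chaotic phase of periodically kicked interacting many-body systems. We analytically compute the spectral form factor (SFF), $K(t)$. Our derivation assumes the random phase approximation only in order to implement the ensemble average.
Paper reference translation (metadata) (2025-02-06T15:37:18Z)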
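The spectral form factor quoted above is commonly defined, for a Floquet operator $U$ with eigenphases $\theta_n$, as $K(t) = \langle |\mathrm{Tr}\,U^t|^2 \rangle = \langle |\sum_n e^{-i\theta_n t}|^2 \rangle$. The sketch below computes this estimator for the uncorrelated (Poisson) reference, where independent uniform phases give $K(0) = N^2$ and $K(t) \approx N$ for integer $t \geq 1$; RMT correlations instead produce a ramp (for the circular unitary ensemble, $K(t) = t$ for $1 \le t \le N$). The function names and the uniform-phase ensemble are illustrative assumptions, not the paper's kicked model.

```python
import cmath
import random

def spectral_form_factor(spectra, t):
    """K(t) = < |sum_n exp(-i * theta_n * t)|^2 >, averaged over an ensemble of eigenphase lists."""
    total = 0.0
    for phases in spectra:
        trace = sum(cmath.exp(-1j * theta * t) for theta in phases)  # Tr U^t from eigenphases
        total += abs(trace) ** 2
    return total / len(spectra)

def poisson_spectra(n_levels, n_samples, rng):
    """Uncorrelated eigenphases, uniform on [0, 2*pi): the no-level-repulsion reference."""
    return [[rng.uniform(0.0, 2.0 * cmath.pi) for _ in range(n_levels)]
            for _ in range(n_samples)]

rng = random.Random(1)
spectra = poisson_spectra(20, 2000, rng)
for t in (0, 1, 5, 10):
    print(t, spectral_form_factor(spectra, t))
```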
A General Framework for Robust G-Invariance in G-Equivariant Networks [5.227502964814928]
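We propose a general method for achieving robust group invariance in group-equivariant convolutional neural networks ($G$-CNNs). The completeness of the triple correlation yields a $G$-TC layer with strong robustness. We demonstrate the benefits of the method on both commutative and non-commutative groups.
Paper reference translation (metadata) (2023-10-28T02:27:34Z)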
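For a cyclic group $\mathbb{Z}_n$ (a commutative example), the triple correlation of a real signal $f$ is $T(a,b) = \sum_g f(g)\,f(g+a)\,f(g+b)$. Substituting $g \to g + s$ shows that $T$ is unchanged when $f$ is cyclically shifted, which is the invariance a $G$-TC layer relies on; completeness (generic signals are determined by $T$ up to a shift) is what separates it from lossy poolings such as max pooling. A minimal sketch of the invariance for $\mathbb{Z}_n$ only, with hypothetical helper names; the paper's layer also covers non-commutative groups.

```python
def triple_correlation(f):
    """T(a, b) = sum_g f(g) * f(g + a) * f(g + b) over Z_n, for a real-valued signal f."""
    n = len(f)
    return [[sum(f[g] * f[(g + a) % n] * f[(g + b) % n] for g in range(n)) for b in range(n)]
            for a in range(n)]

def shift(f, s):
    """Cyclic shift by s: (shift f)(g) = f(g - s)."""
    n = len(f)
    return [f[(g - s) % n] for g in range(n)]

signal = [3, 1, 4, 1, 5, 9, 2, 6]
reference = triple_correlation(signal)
print(all(triple_correlation(shift(signal, s)) == reference for s in range(len(signal))))
```

Integer inputs keep the comparison exact, so the check tests equality rather than closeness.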
Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
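We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov decision processes (MDPs). For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{i=1}^{K}\sigma_i^2})$ regret bound, where $d$ is the feature dimension, $K$ is the number of rounds, and $\sigma_i^2$ is the reward variance at round $i$. For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{K})$ regret bound.
Paper reference translation (metadata) (2021-01-29T18:57:52Z)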
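To see what the variance-aware bound buys, compare its variance-dependent factor $\sqrt{1 + \sum_{i=1}^{K}\sigma_i^2}$ with the $\sqrt{K}$ rate, dropping the shared $\mathrm{poly}(d)$ and logarithmic factors. The sketch is plain arithmetic on hypothetical per-round variances; it does not implement the confidence sets themselves.

```python
import math

def variance_aware_factor(variances):
    """sqrt(1 + sum_i sigma_i^2): the variance-dependent part of the linear-bandit bound."""
    return math.sqrt(1.0 + sum(variances))

def variance_free_factor(n_rounds):
    """sqrt(K): the variance-independent rate over K rounds."""
    return math.sqrt(n_rounds)

k = 10_000
print(variance_aware_factor([0.01] * k), variance_free_factor(k))  # low-noise rounds
print(variance_aware_factor([1.0] * k), variance_free_factor(k))   # unit-variance rounds
```

With $K = 10{,}000$ rounds and $\sigma_i^2 = 0.01$, the variance-aware factor is $\sqrt{101} \approx 10.05$ against $\sqrt{K} = 100$; with $\sigma_i^2 = 1$ the two essentially coincide.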
Sparse sketches with small inversion bias [79.77110958547695]
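Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance. We develop a framework for analyzing inversion bias, based on the notion of an $(\epsilon,\delta)$-unbiased estimator for random matrices. When the sketching matrix $S$ is dense, i.e., has sub-Gaussian entries, the sketched estimate is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size m = O(d + √d/
Paper reference translation (metadata) (2020-11-21T01:33:15Z)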
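Inversion bias is easy to see in the simplest case. If $S$ has i.i.d. $N(0, 1/m)$ entries, then $\mathbb{E}[A^\top S^\top S A] = A^\top A$ exactly, yet the inverse-Wishart mean gives $\mathbb{E}[(A^\top S^\top S A)^{-1}] = \frac{m}{m-d-1}(A^\top A)^{-1}$ for $m > d + 1$, so averaging inverted sketches overshoots. The sketch below takes $d = 1$ (a single column $a$), where the factor is $m/(m-2)$. It covers only exactly Gaussian sketches; the paper's framework addresses sub-Gaussian and other sketches beyond this calculation. The helper names are hypothetical.

```python
import random

def sketched_inverse(a, m, rng):
    """One Gaussian-sketch estimate of 1/(a^T a) for a single column a (the d = 1 case)."""
    # S is m x len(a) with i.i.d. N(0, 1/m) entries, so E[a^T S^T S a] = a^T a exactly
    sa = [sum(rng.gauss(0.0, m ** -0.5) * ai for ai in a) for _ in range(m)]
    return 1.0 / sum(v * v for v in sa)

def average_inverse(a, m, n_trials, rng):
    """Naive average of inverted sketches: biased upward by about m / (m - 2) when d = 1."""
    return sum(sketched_inverse(a, m, rng) for _ in range(n_trials)) / n_trials

rng = random.Random(0)
a = [1.0, 2.0, 2.0, 0.0]  # a^T a = 9
m = 20
ratio = average_inverse(a, m, 10_000, rng) * 9.0
print(ratio, m / (m - 2))  # empirical bias vs the inverse-Wishart prediction
```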
How isotropic kernels perform on simple invariants [0.5729426778193397]
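We study how the training curves of isotropic kernel methods depend on the symmetry of the task to be learned. For the stripe model at large bandwidth, we show that the learning-curve exponent is $\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi \in (0,2)$ is the exponent characterizing the singularity of the kernel at the origin.
Paper reference translation (metadata) (2020-06-17T09:59:18Z)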
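Plugging numbers into the exponent quoted above shows how weakly it depends on the kernel once $d$ is large: $\beta \to 1/3$ as $d \to \infty$ for any $\xi \in (0,2)$. The sketch uses exact fractions; taking $\xi = 1$ (e.g. a Laplace kernel) in the examples is an illustrative assumption, and `stripe_beta` is a hypothetical helper name.

```python
from fractions import Fraction

def stripe_beta(d, xi):
    """beta = (d - 1 + xi) / (3d - 3 + xi), valid for 0 < xi < 2."""
    if not 0 < xi < 2:
        raise ValueError("xi must lie in (0, 2)")
    return Fraction(d - 1 + xi) / Fraction(3 * d - 3 + xi)

for d in (2, 10, 100, 1000):
    print(d, stripe_beta(d, Fraction(1)), float(stripe_beta(d, Fraction(1))))
```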
Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
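We show that extending smoothing-based certification to other attack models is difficult. We present experimental results on CIFAR that validate our theory.
Paper reference translation (metadata) (2020-02-08T22:02:14Z)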
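The summary above does not spell out where dimension enters. One elementary way to see it, which is not the paper's own argument: the standard Gaussian-smoothing certificate is an $\ell_2$ radius $R_2 = \sigma\,\Phi^{-1}(p_A)$, with $p_A$ a lower bound on the top-class probability (the simplified form from Cohen et al., 2019). Since an $\ell_\infty$ ball of radius $r$ lies inside the $\ell_2$ ball of radius $r\sqrt{d}$, that certificate only guarantees $\ell_\infty$ robustness up to $R_2/\sqrt{d}$; for CIFAR images ($d = 32 \times 32 \times 3 = 3072$) this shrinks the radius by a factor of about 55. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def l2_certified_radius(p_a, sigma):
    """sigma * Phi^{-1}(p_A); no certificate (radius 0) unless p_A > 1/2."""
    if not 0.5 < p_a < 1.0:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a)

def linf_radius_from_l2(r2, d):
    """An l_inf ball of radius r sits inside the l2 ball of radius r * sqrt(d)."""
    return r2 / math.sqrt(d)

r2 = l2_certified_radius(0.99, sigma=0.25)
print(r2, linf_radius_from_l2(r2, 32 * 32 * 3))
```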

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

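Information about the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.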